Abstract
Frequent Subgraph Mining is an essential operation for graph analytics and knowledge extraction. Due to its high computational cost, parallel solutions are necessary. Existing approaches suffer from either load imbalance or high communication and synchronization overheads. In this paper we propose ScaleMine, a novel parallel frequent subgraph mining system for a single large graph. ScaleMine introduces a novel two-phase approach. The first phase is approximate; it quickly identifies subgraphs that are frequent with high probability, while collecting various statistics. The second phase computes the exact solution by employing the results of the approximation to achieve good load balance, prune the search space, generate efficient execution plans, and guide intra-task parallelism. Our experiments show that ScaleMine scales to 8,192 cores on a Cray XC40 (12× more than competitors), supports graphs with one billion edges (10× larger than competitors), and is at least an order of magnitude faster than existing solutions.
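The two-phase idea in the abstract can be illustrated with a deliberately tiny sketch. This is not ScaleMine's algorithm: it assumes that a "subgraph" is just a single labeled edge, that support is a plain occurrence count, and that workers are simulated sequentially. All function names (`approximate_phase`, `balance_tasks`, `exact_counts`, `mine`) are hypothetical. It only shows how cheap estimates from a sampled first phase might drive task assignment for an exact second phase.

```python
import random
from collections import Counter


def edge_pattern(labels, u, v):
    """Canonical label pair for an undirected edge (a one-edge 'subgraph')."""
    return tuple(sorted((labels[u], labels[v])))


def approximate_phase(edges, labels, sample_rate, seed=0):
    """Phase 1 (toy): estimate pattern counts from a random sample of edges."""
    rng = random.Random(seed)
    sample = [e for e in edges if rng.random() < sample_rate]
    counts = Counter(edge_pattern(labels, u, v) for u, v in sample)
    # Scale sampled counts up to an estimate for the whole graph.
    return {p: c / sample_rate for p, c in counts.items()}


def balance_tasks(estimates, n_workers):
    """Greedy assignment: largest estimated task goes to the least-loaded worker."""
    loads = [0.0] * n_workers
    plan = [[] for _ in range(n_workers)]
    for pattern, cost in sorted(estimates.items(), key=lambda kv: -kv[1]):
        w = loads.index(min(loads))
        plan[w].append(pattern)
        loads[w] += cost
    return plan


def exact_counts(edges, labels, assigned):
    """Phase 2 (toy): exact occurrence counts for one worker's assigned patterns."""
    counts = Counter()
    for u, v in edges:
        p = edge_pattern(labels, u, v)
        if p in assigned:
            counts[p] += 1
    return counts


def mine(edges, labels, threshold, n_workers, sample_rate, seed=0):
    """Run both phases; workers are simulated one after another.

    Assumption of this toy: only patterns seen in the sample are verified,
    so with sample_rate < 1 a rare pattern could be missed.
    """
    estimates = approximate_phase(edges, labels, sample_rate, seed)
    frequent = {}
    for assigned in balance_tasks(estimates, n_workers):
        counts = exact_counts(edges, labels, set(assigned))
        frequent.update({p: c for p, c in counts.items() if c >= threshold})
    return frequent
```

With `sample_rate=1.0` every edge is sampled, so the estimates equal the exact counts and the sketch becomes fully deterministic; lower rates trade accuracy of the first phase for speed.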
Original language | English (US) |
---|---|
Title of host publication | SC16: International Conference for High Performance Computing, Networking, Storage and Analysis |
Subtitle of host publication | The International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | IEEE Computer Society |
Pages | 716-727 |
Number of pages | 12 |
ISBN (Electronic) | 9781467388153 |
ISBN (Print) | 9781467388153 |
State | Published - Mar 13 2017 |
Event | 2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016 - Salt Lake City, United States. Duration: Nov 13 2016 → Nov 18 2016 |
Publication series
Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
---|---|
ISSN (Print) | 2167-4329 |
ISSN (Electronic) | 2167-4337 |
Other
Other | 2016 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2016 |
---|---|
Country/Territory | United States |
City | Salt Lake City |
Period | 11/13/16 → 11/18/16 |
Bibliographical note
KAUST Repository Item: Exported on 2021-08-20. Acknowledgements: For computer time, this research used the resources of the Supercomputing Laboratory at King Abdullah University of Science & Technology (KAUST) in Thuwal, Saudi Arabia.
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Hardware and Architecture
- Software