The tremendous increase in semantic data is driving the demand for efficient query engines. RDF data is being generated at an unprecedented rate, which introduces storage, indexing, and querying challenges. Due to the size of the data and the federated nature of the semantic web, it is in many cases impractical to assume a central repository, and more attention is being given to distributed RDF stores. This work is motivated by two major drawbacks of current solutions: 1) the pre-processing phase is very expensive and takes a prohibitively long time for large datasets, and 2) current distributed systems assume that a static partitioning of the data should perform well for all kinds of queries, and do not consider fluctuations in the query load.
In this paper we propose PHD-Store, an in-memory SPARQL engine for distributed RDF repositories. Our system does not assume any particular initial placement of the data and does not require pre-processing before running the first query. It analyzes incoming queries and adjusts data placement dynamically in such a way that communication among repositories is minimized for future queries. To achieve this flexibility, frequent query patterns are detected, and data are redistributed through a Propagating Hash Distribution (PHD) algorithm to ensure optimal placement for frequent query patterns. Our experiments with large RDF graphs verify that PHD-Store scales well and executes complex queries more efficiently than existing systems.
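The general idea of detecting frequent query patterns and then co-locating the data they touch can be illustrated with a minimal sketch. This is not the PHD algorithm itself, whose details are not given here; it swaps in plain subject-hash partitioning as a stand-in. All names (`PatternMonitor`, `normalize`, `redistribute`, `stable_hash`), the frequency threshold, and the sample triples are hypothetical, and the normalization deliberately ignores join structure between variables.

```python
import hashlib
from collections import Counter


def stable_hash(term, num_workers):
    """Map a term to a worker id with a hash that is stable across runs."""
    digest = hashlib.md5(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_workers


def normalize(query):
    """Canonical form of a query: variables become '?', pattern order ignored.

    Simplifying assumption: which variables are shared between patterns
    is not tracked, so join structure is lost.
    """
    return tuple(sorted(
        tuple("?" if term.startswith("?") else term for term in pattern)
        for pattern in query
    ))


class PatternMonitor:
    """Counts normalized queries and flags one when it reaches a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def observe(self, query):
        key = normalize(query)
        self.counts[key] += 1
        # True exactly once: at the moment the pattern becomes frequent.
        return self.counts[key] == self.threshold


def redistribute(triples, predicates, num_workers):
    """Place triples with the given predicates by hashing their subject.

    Triples sharing a subject land on the same worker, so a star-shaped
    join on that subject needs no communication between workers.
    """
    placement = {worker: [] for worker in range(num_workers)}
    for s, p, o in triples:
        if p in predicates:
            placement[stable_hash(s, num_workers)].append((s, p, o))
    return placement


# Hypothetical data and query, for illustration only.
triples = [
    ("alice", "worksAt", "kaust"),
    ("alice", "advises", "bob"),
    ("carol", "worksAt", "mit"),
    ("carol", "advises", "dave"),
]
query = [("?x", "worksAt", "?u"), ("?x", "advises", "?y")]

monitor = PatternMonitor(threshold=2)
monitor.observe(query)
if monitor.observe(query):  # second occurrence crosses the threshold
    predicates = {p for _, p, _ in query}
    placement = redistribute(triples, predicates, num_workers=3)
```

In this sketch, redistribution is triggered only after a pattern has been seen often enough, so rarely issued queries never cause data movement. A real system would also have to weigh the cost of moving data against the expected savings.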
|Date of Award||Jul 2012|
|Original language||English (US)|
- Computer, Electrical and Mathematical Sciences and Engineering
|Supervisor||Panos Kalnis (Supervisor)|