TY - CHAP
T1 - Querying and Mining Strings Made Easy
AU - Sahli, Majed
AU - Mansour, Essam
AU - Kalnis, Panos
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2017/10/14
Y1 - 2017/10/14
N2 - With the advent of large string datasets in several scientific and business applications, there is a growing need to perform ad-hoc analysis on strings. Currently, strings are stored, managed, and queried using procedural codes. This limits users to certain operations supported by existing procedural applications and requires manual query planning with limited tuning opportunities. This paper presents StarQL, a generic and declarative query language for strings. StarQL is based on a native string data model that allows StarQL to support a large variety of string operations and provide semantic-based query optimization. String analytic queries are too intricate to be solved on one machine. Therefore, we propose a scalable and efficient data structure that allows StarQL implementations to handle large sets of strings and utilize large computing infrastructures. Our evaluation shows that StarQL is able to express workloads of application-specific tools, such as BLAST and KAT in bioinformatics, and to mine Wikipedia text for interesting patterns using declarative queries. Furthermore, the StarQL query optimizer shows an order of magnitude reduction in query execution time.
AB - With the advent of large string datasets in several scientific and business applications, there is a growing need to perform ad-hoc analysis on strings. Currently, strings are stored, managed, and queried using procedural codes. This limits users to certain operations supported by existing procedural applications and requires manual query planning with limited tuning opportunities. This paper presents StarQL, a generic and declarative query language for strings. StarQL is based on a native string data model that allows StarQL to support a large variety of string operations and provide semantic-based query optimization. String analytic queries are too intricate to be solved on one machine. Therefore, we propose a scalable and efficient data structure that allows StarQL implementations to handle large sets of strings and utilize large computing infrastructures. Our evaluation shows that StarQL is able to express workloads of application-specific tools, such as BLAST and KAT in bioinformatics, and to mine Wikipedia text for interesting patterns using declarative queries. Furthermore, the StarQL query optimizer shows an order of magnitude reduction in query execution time.
UR - http://hdl.handle.net/10754/626127
UR - https://link.springer.com/chapter/10.1007%2F978-3-319-69179-4_1
UR - http://www.scopus.com/inward/record.url?scp=85033668417&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-69179-4_1
DO - 10.1007/978-3-319-69179-4_1
M3 - Chapter
SN - 9783319691787
SP - 3
EP - 17
BT - Advanced Data Mining and Applications
PB - Springer Nature
ER -