Abstract
Recently, techniques for supporting efficient similarity search over huge transaction datasets have emerged as an important research area. Several indexing schemes have been proposed towards this direction. Typically, these schemes provide a tradeoff between searching efficiency and indexing overhead in terms of space. In this paper, we propose a novel indexing scheme for similarity search on transaction data. Based on well-studied clustering techniques, we develop a construction algorithm for the proposed index and a branch-and-bound searching strategy for answering similarity search. Unlike previous techniques, our indexing scheme exhibits high search efficiency and low space requirements by trading-off the pre-computation time. This behavior is ideal for applications with low update but high read volume (e.g., data warehousing, collaborative filtering, etc.). Moreover, our experimental results illustrate that our method is robust to the varying characteristics of the datasets.
Original language | English (US) |
---|---|
Pages | 314-323 |
Number of pages | 10 |
State | Published - 2004 |
Externally published | Yes |
Event | CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management - Washington, DC, United States Duration: Nov 8 2004 → Nov 13 2004 |
Other
Other | CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management |
---|---|
Country/Territory | United States |
City | Washington, DC |
Period | 11/8/04 → 11/13/04 |
Keywords
- Data Mining
- Indexing
- Similarity Search
- Transaction Data
ASJC Scopus subject areas
- General Decision Sciences
- General Business, Management and Accounting