TY - GEN
T1 - TopPPR
AU - Wei, Zhewei
AU - He, Xiaodong
AU - Xiao, Xiaokui
AU - Wang, Sibo
AU - Shang, Shuo
AU - Wen, Ji-Rong
N1 - KAUST Repository Item: Exported on 2021-02-19
Acknowledgements: This research was supported in part by the National Natural Science Foundation of China (No. 61502503), by the National Key Basic Research Program (973 Program) of China (No. 2014CB340403), by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China under Grant 18XNLG21, by MOE, Singapore under grant MOE2015-T2-2-069, and by NUS, Singapore under an SUG.
PY - 2018/5/25
Y1 - 2018/5/25
N2 - Personalized PageRank (PPR) is a classic metric that measures the relevance of graph nodes with respect to a source node. Given a graph G, a source node s, and a parameter k, a top-k PPR query returns a set of k nodes with the highest PPR values with respect to s. This type of query serves as an important building block for numerous applications in web search and social networks, such as Twitter's Who-To-Follow recommendation service. Existing techniques for top-k PPR, however, suffer from two major deficiencies. First, they either incur prohibitive space and time overheads on large graphs, or fail to provide any guarantee on the precision of top-k results (i.e., the results returned might miss a number of actual top-k answers). Second, most of them require significant pre-computation on the input graph G, which renders them unsuitable for graphs with frequent updates (e.g., Twitter's social graph). To address the deficiencies of existing solutions, we propose TopPPR, an algorithm for top-k PPR queries that ensures at least ρ precision (i.e., at least a ρ fraction of the actual top-k results are returned) with probability at least 1 - 1/n, where ρ ∈ (0, 1] is a user-specified parameter and n is the number of nodes in G. In addition, TopPPR offers non-trivial guarantees on query time in terms of ρ, and it can easily handle dynamic graphs as it does not require any preprocessing. We experimentally evaluate TopPPR using a variety of benchmark datasets, and demonstrate that TopPPR outperforms the state-of-the-art solutions in terms of both efficiency and precision, even when we set ρ = 1 (i.e., when TopPPR returns the exact top-k results). Notably, on a billion-edge Twitter graph, TopPPR only requires 15 seconds to answer a top-500 PPR query with ρ = 1.
UR - http://hdl.handle.net/10754/628404
UR - https://dl.acm.org/citation.cfm?doid=3183713.3196920
UR - http://www.scopus.com/inward/record.url?scp=85048753947&partnerID=8YFLogxK
U2 - 10.1145/3183713.3196920
DO - 10.1145/3183713.3196920
M3 - Conference contribution
AN - SCOPUS:85048753947
SN - 9781450347037
SP - 441
EP - 456
BT - Proceedings of the 2018 International Conference on Management of Data - SIGMOD '18
PB - Association for Computing Machinery (ACM)
ER -