Abstracting massive data for lightweight intrusion detection in computer networks

Wei Wang, Jiqiang Liu, Georgios Pitsilis, Xiangliang Zhang

Research output: Contribution to journal › Article › peer-review

56 Scopus citations


Anomaly intrusion detection in big data environments calls for lightweight models that can achieve real-time performance during detection. Abstracting audit data provides a way to improve the efficiency of data processing in intrusion detection. Data abstraction refers to abstracting or extracting the most relevant information from a massive dataset. In this work, we propose three strategies of data abstraction, namely exemplar extraction, attribute selection and attribute abstraction. We first propose an effective method called exemplar extraction to extract representative subsets from the original massive data prior to building the detection models. Two clustering algorithms, Affinity Propagation (AP) and traditional k-means, are employed to find the exemplars in the audit data. k-Nearest Neighbor (k-NN), Principal Component Analysis (PCA) and one-class Support Vector Machine (SVM) are used for detection. We then employ the other two strategies, attribute selection and attribute abstraction, to abstract audit data for anomaly intrusion detection. Two HTTP streams collected from a real computing environment, as well as the KDD'99 benchmark data set, are used to validate these three strategies of data abstraction. The comprehensive experimental results show that while all three strategies improve detection efficiency, AP-based exemplar extraction achieves the best data-abstraction performance.
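The exemplar-extraction idea can be illustrated with a small plain-Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it shows only the k-means variant (Affinity Propagation, PCA and one-class SVM are not shown), takes the real record nearest each k-means centroid as the exemplar, scores a record by its mean distance to its k nearest exemplars, and sets the alarm threshold at a quantile of the training scores. The function names `kmeans_exemplars`, `knn_score`, `fit_threshold` and `is_anomaly`, the `quantile` parameter, and the synthetic two-blob data in the demo are all hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans_exemplars(data, k, iters=100, seed=0):
    """Cluster `data` with plain k-means, then return the real record nearest
    each final centroid as that cluster's exemplar (hypothetical rule)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        # Assignment step: attach each record to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in data:
            j = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    # Snap centroids to actual records so every exemplar is genuine audit data;
    # duplicates are dropped, so fewer than k exemplars may be returned.
    exemplars = []
    for c in centroids:
        best = min(data, key=lambda p: dist(p, c))
        if best not in exemplars:
            exemplars.append(best)
    return exemplars

def knn_score(x, exemplars, k=1):
    """Assumed k-NN anomaly score: mean distance from x to its k nearest exemplars."""
    d = sorted(dist(x, e) for e in exemplars)
    k = min(k, len(d))
    return sum(d[:k]) / k

def fit_threshold(train, exemplars, k=1, quantile=0.99):
    """Assumed alarm threshold: the given quantile of scores on normal training data."""
    scores = sorted(knn_score(x, exemplars, k) for x in train)
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]

def is_anomaly(x, exemplars, threshold, k=1):
    """Flag a record whose score exceeds the threshold."""
    return knn_score(x, exemplars, k) > threshold

if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic "normal" records: two Gaussian blobs in a 2-D feature space.
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    normal += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(200)]
    exemplars = kmeans_exemplars(normal, k=10)
    threshold = fit_threshold(normal, exemplars, k=1, quantile=0.99)
    print(len(exemplars), "exemplars replace", len(normal), "training records")
    for x in [(0.2, -0.1), (4.0, -15.0)]:
        print(x, "anomalous" if is_anomaly(x, exemplars, threshold, k=1) else "normal")
```

Because each incoming record is compared against the m exemplars rather than all n training records, a brute-force k-NN detector performs m distance computations per record instead of n, which is the kind of efficiency gain that data abstraction aims for.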
Original language: English (US)
Pages (from-to): 417-430
Number of pages: 14
Journal: Information Sciences
State: Published - Oct 15 2016

Bibliographical note

Acknowledgements: Ministry of Education of the People's Republic of China [K14C300020]


