TY - JOUR
T1 - Abstracting massive data for lightweight intrusion detection in computer networks
AU - Wang, Wei
AU - Liu, Jiqiang
AU - Pitsilis, Georgios
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: Ministry of Education of the People's Republic of China [K14C300020]
PY - 2016/10/15
Y1 - 2016/10/15
N2 - Anomaly intrusion detection in big data environments calls for lightweight models that are able to achieve real-time performance during detection. Abstracting audit data provides a solution to improve the efficiency of data processing in intrusion detection. Data abstraction refers to abstracting or extracting the most relevant information from the massive dataset. In this work, we propose three strategies of data abstraction, namely, exemplar extraction, attribute selection and attribute abstraction. We first propose an effective method called exemplar extraction to extract representative subsets from the original massive data prior to building the detection models. Two clustering algorithms, Affinity Propagation (AP) and traditional k-means, are employed to find the exemplars from the audit data. k-Nearest Neighbor (k-NN), Principal Component Analysis (PCA) and one-class Support Vector Machine (SVM) are used for the detection. We then employ another two strategies, attribute selection and attribute extraction, to abstract audit data for anomaly intrusion detection. Two HTTP streams collected from a real computing environment as well as the KDD'99 benchmark data set are used to validate these three strategies of data abstraction. The comprehensive experimental results show that while all three strategies improve the detection efficiency, the AP-based exemplar extraction achieves the best performance of data abstraction.
AB - Anomaly intrusion detection in big data environments calls for lightweight models that are able to achieve real-time performance during detection. Abstracting audit data provides a solution to improve the efficiency of data processing in intrusion detection. Data abstraction refers to abstracting or extracting the most relevant information from the massive dataset. In this work, we propose three strategies of data abstraction, namely, exemplar extraction, attribute selection and attribute abstraction. We first propose an effective method called exemplar extraction to extract representative subsets from the original massive data prior to building the detection models. Two clustering algorithms, Affinity Propagation (AP) and traditional k-means, are employed to find the exemplars from the audit data. k-Nearest Neighbor (k-NN), Principal Component Analysis (PCA) and one-class Support Vector Machine (SVM) are used for the detection. We then employ another two strategies, attribute selection and attribute extraction, to abstract audit data for anomaly intrusion detection. Two HTTP streams collected from a real computing environment as well as the KDD'99 benchmark data set are used to validate these three strategies of data abstraction. The comprehensive experimental results show that while all three strategies improve the detection efficiency, the AP-based exemplar extraction achieves the best performance of data abstraction.
UR - http://hdl.handle.net/10754/622270
UR - https://linkinghub.elsevier.com/retrieve/pii/S0020025516312385
UR - http://www.scopus.com/inward/record.url?scp=85005949713&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2016.10.023
DO - 10.1016/j.ins.2016.10.023
M3 - Article
SN - 0020-0255
VL - 433-434
SP - 417
EP - 430
JO - Information Sciences
JF - Information Sciences
ER -