Abstract
Recently, many applications, such as network monitoring, traffic management, and environmental studies, generate huge amounts of data that cannot fit in computer memory. Data from such applications arrive continuously in the form of streams. The main challenges in mining data streams are the high speed and the large volume of the arriving data. A typical solution to these problems is to learn a model that fits in memory. However, the underlying distributions of streaming data change over time in unpredictable ways. Hence, the learned models should be updated continuously and should rely more on the most recent data in the stream.
In this chapter, we present an online density estimator that builds a model, called KDE-Track, to characterize the dynamic density of data streams. KDE-Track summarizes the distribution of a data stream by estimating the stream's Probability Density Function (PDF) at a set of resampling points. Compared with existing density estimation techniques, KDE-Track is shown to be more accurate (smaller error values) and more computationally efficient (shorter running time). We demonstrate the usefulness of KDE-Track for visualizing the dynamic density of data streams and for change detection.
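The abstract does not give KDE-Track's update rules. As a rough illustration of the general idea of summarizing a stream by PDF values stored at resampling points, the sketch below keeps an exponentially weighted Gaussian kernel density estimate on a fixed, evenly spaced grid and interpolates linearly between grid points. The class name `OnlineGridKDE`, the fixed grid, the bandwidth `h`, and the forgetting weight `alpha` are hypothetical choices made for this example; it is not the chapter's algorithm.

```python
import math
import random
from bisect import bisect_right


class OnlineGridKDE:
    """Exponentially weighted Gaussian KDE stored at fixed resampling points.

    Hypothetical illustration only: the evenly spaced grid, bandwidth `h`,
    and forgetting weight `alpha` are assumptions, not KDE-Track's design.
    """

    def __init__(self, lo, hi, num_points, h, alpha):
        step = (hi - lo) / (num_points - 1)
        self.points = [lo + i * step for i in range(num_points)]
        self.h = h
        self.alpha = alpha  # weight given to the newest sample
        self.density = [0.0] * num_points
        self.started = False

    @staticmethod
    def _kernel(u):
        # Standard Gaussian kernel
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    def update(self, x):
        # Old estimate decays by (1 - w); the new sample adds w * K_h(p - x).
        # The first sample gets full weight so the estimate starts normalized.
        w = self.alpha if self.started else 1.0
        self.started = True
        for i, p in enumerate(self.points):
            k = self._kernel((p - x) / self.h) / self.h
            self.density[i] = (1.0 - w) * self.density[i] + w * k

    def pdf(self, x):
        # Linear interpolation between neighbouring resampling points; 0 outside the grid.
        pts = self.points
        if x < pts[0] or x > pts[-1]:
            return 0.0
        j = bisect_right(pts, x)
        if j == len(pts):
            return self.density[-1]
        i = j - 1
        t = (x - pts[i]) / (pts[j] - pts[i])
        return (1.0 - t) * self.density[i] + t * self.density[j]


# Usage: a stream whose mean shifts from 0 to 3 halfway through.
random.seed(0)
kde = OnlineGridKDE(lo=-5.0, hi=8.0, num_points=131, h=0.3, alpha=0.01)
for _ in range(2000):
    kde.update(random.gauss(0.0, 1.0))
before = (kde.pdf(0.0), kde.pdf(3.0))
for _ in range(2000):
    kde.update(random.gauss(3.0, 1.0))
after = (kde.pdf(0.0), kde.pdf(3.0))
print("density at 0 and 3 before shift:", before)
print("density at 0 and 3 after shift:", after)
```

Memory use is proportional to the number of resampling points rather than to the length of the stream, and the forgetting weight makes the estimate follow the most recent data: after the shift, the density mass moves from around 0 to around 3. Comparing successive estimates in this way is one simple route to the kind of change detection the abstract mentions.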
Original language | English (US) |
---|---|
Title of host publication | Learning from Data Streams in Evolving Environments |
Publisher | Springer Nature |
Pages | 247-278 |
Number of pages | 32 |
ISBN (Print) | 9783319898025 |
DOIs | |
State | Published - Jul 29 2018 |