Abstract
Since the development of sensor network and Internet of Things, the volume of data is rapidly increasing and the streaming data has attracted much attention recently. To efficiently process and explore data streams, the compact data representation is playing an important role, since the data approximations other than the original data items are usually applied in many stream mining tasks, such as clustering, classification, and correlation analysis. In this chapter, we focus on the maximum error-bounded approximation of data stream, which represents the streaming data with constrained approximation error on each data point. There are two criteria for the approximation solution: self-adaption over time for varied error bound and real-time processing. We reviewed the existing data approximation techniques and summarized some essential theories such as optimization guarantee. Two optimal linear-time algorithms are introduced to construct error-bounded piecewise linear representation for data stream. One generates the line segments by data convex analysis, and the other one is based on the transformed space, which can be extended to a general model. We theoretically analyzed and compared these two different spaces, and proved the theoretical equivalence between them, as well as the two algorithms.
Original language | English (US) |
---|---|
Title of host publication | Learning from Data Streams in Evolving Environments |
Publisher | Springer Nature |
Pages | 93-122 |
Number of pages | 30 |
ISBN (Print) | 9783319898025 |
DOIs | |
State | Published - Jul 29 2018 |
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01Acknowledgements: This work is partially supported by Natural Science Foundation of China (Grant No. 61602353), Natural Science Foundation of Hubei Province (Grant No. 2017CFB505), and the Fundamental Research Funds for the Central Universities (Grant No. WUT:2017IVA053 and WUT:2017IVB028).