In many domains, data now arrive faster than we are able to mine them. To avoid wasting these data, we must switch from the traditional “one-shot” data mining approach to systems that are able to mine continuous, high-volume, open-ended data streams as they arrive. In this article we identify some desiderata for such systems, and outline our framework for realizing them. A key property of our approach is that it minimizes the time required to build a model on a stream while guaranteeing (as long as the data are i.i.d.) that the model learned is effectively indistinguishable from the one that would be obtained using infinite data. Using this framework, we have successfully adapted several learning algorithms to massive data streams, including decision tree induction, Bayesian network learning, k-means clustering, and the EM algorithm for mixtures of Gaussians. These algorithms are able to process on the order of billions of examples per day using off-the-shelf hardware. Building on this, we are currently developing software primitives for scaling arbitrary learning algorithms to massive data streams with minimal effort.
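The abstract does not spell out how the guarantee is obtained, but the keywords point to Hoeffding bounds. As a rough illustration only, and not the authors' implementation, the sketch below uses the standard Hoeffding inequality: after n independent observations of a variable with range R, the sample mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta. The function names and the example parameters are assumptions made for this illustration.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Half-width of the Hoeffding confidence interval after n observations.

    value_range is the range R of the bounded variable, and delta is the
    allowed probability that the true mean lies outside the interval.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def examples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the Hoeffding half-width is at most epsilon."""
    return math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Hypothetical setting: a statistic bounded in [0, 1], estimated to
    # within 0.01 with failure probability one in a million.
    n = examples_needed(value_range=1.0, delta=1e-6, epsilon=0.01)
    print(n, hoeffding_epsilon(1.0, 1e-6, n))
```

The required number of examples depends only on the range, the tolerance, and the confidence level, not on the length of the stream. That is one way a learner could stop reading examples once a decision is statistically safe.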
Journal of Computational and Graphical Statistics – Taylor & Francis
Published: Dec 1, 2003
Keywords: Data mining; Hoeffding bounds; Machine learning; Scalability; Subsampling