Got a new data mining algorithm? Wrote a new data mining package? Try it out on these publicly
available datasets.
- UCI KDD Database Repository -- the most popular site for datasets used for research in machine learning and knowledge discovery.
- Delve, Data for Evaluating Learning in Valid Experiments
- FEDSTATS, a comprehensive source of US statistics and more
- FIMI repository for frequent itemset mining, implementations and datasets.
- Financial Data Finder at OSU, a large catalog of financial data sets
- Grain Market Research, financial data including stocks, futures, etc.
- Investor Links, includes financial data
- Microsoft's TerraServer, aerial photographs and satellite images you can view and purchase.
- MIT Cancer Genomics gene expression datasets and publications, from MIT Whitehead Center for Genome Research.
- MLnet (European Machine Learning Network) list of Datasets.
- National Space Science Data Center (NSSDC), NASA data sets from planetary exploration, space and solar physics, life sciences, astrophysics, and more.
- PubGene(TM) Gene Database and Tools, genomic-related publications database
- SMD: Stanford Microarray Database, stores raw and normalized data from micro-array experiments.
- Clustering Dataset. Also, check out their Time Series Data.
- STATLOG project datasets. This project did comparative studies of different machine learning, neural and statistical classification algorithms. About 20 different algorithms were evaluated on more than 20 different datasets.
- STATOO Datasets part 1 and part 2
- UCR Time Series Data Mining Archive, offering datasets, papers, links, and code.
- United States Census Bureau
- Dataset Generator
- KDD Sisyphus, a large, un-preprocessed, multi-relational, and partially documented database extract.
- Subscribers of a wireless phone company, as a tab-delimited text file, for use in survival analysis exercises
- Catalog responders, as a tab-delimited text file
- Traces available in the Internet Traffic Archive
- Microsoft anonymous web data