Data Monetization: A Road Paved On Top Of Data Sets

The road to efficient data monetization is paved on top of effective data sets. No single source of data is comprehensive enough to be an all-encompassing source of transformational insights. It is only through the fusion of orthogonal data sets (independent subject areas) that true insights into those things we don’t know we don’t know (level three knowledge) can be revealed. While we have access to data of interest (ERPs, IT, etc.), where can we find other sources to aid in this third-level knowledge spelunking?
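To make "fusion" a little more concrete, here is a minimal Python sketch of joining an internal data set with an independent external one on a shared key. Everything in it is an assumption for illustration: the two inline CSV tables, their column names, and the helper functions `read_keyed` and `fuse` are made up, and a real project would more likely pull the external series from one of the sources listed below.

```python
import csv
import io

# Hypothetical internal data (e.g. an ERP export): revenue by region and year.
SALES_CSV = """region,year,revenue
North,2011,120
North,2012,150
South,2011,90
South,2012,80
"""

# Hypothetical external data (e.g. a public economic series) with the same keys.
UNEMPLOYMENT_CSV = """region,year,unemployment_rate
North,2011,6.1
North,2012,5.4
South,2011,7.9
South,2012,8.6
"""


def read_keyed(text, key_fields):
    """Read CSV text into a dict keyed by the tuple of key_fields values."""
    rows = csv.DictReader(io.StringIO(text))
    return {tuple(row[k] for k in key_fields): row for row in rows}


def fuse(left, right):
    """Inner join: keep only the keys that appear in both data sets."""
    shared = sorted(left.keys() & right.keys())
    return [{**left[key], **right[key]} for key in shared]


KEYS = ("region", "year")
fused = fuse(read_keyed(SALES_CSV, KEYS), read_keyed(UNEMPLOYMENT_CSV, KEYS))

for row in fused:
    print(row["region"], row["year"], row["revenue"], row["unemployment_rate"])
```

The join itself is trivial; the hard part in practice is finding an external data set that shares a usable key (region, date, product code) with the data you already own, which is where a source list comes in.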

While data is everywhere, useful data sets are not. A Google search on terms like “open data sets” or “data sets in R” reveals thousands of sources. Over the years as a CTO and Data Scientist, I have collected a few hundred myself. In 2011, however, I came across the work of RevoJoe of Revolution Analytics, which more or less got me organized in this area. So here are a few data sets from the list that I maintain today:

Commercial Sources
Data MarketPlace: http://www.infochimps.com/marketplace

Economics
UMD: http://inforumweb.umd.edu/econdata/econdata.html
World Bank: http://data.worldbank.org/indicator

Finance
CBOE Futures Exchange: http://cfe.cboe.com/Data/
Gapminder: http://www.gapminder.org
Google Finance: http://finance.yahoo.com/ (R)
Google Trends: http://www.google.com/trends?q=google&ctab=0&geo=all&date=all&sort=0
St Louis Fed: http://research.stlouisfed.org/fred2/ (R)
NASDAQ: https://data.nasdaq.com/
OANDA: http://www.oanda.com/ (R)
Yahoo Finance: http://finance.yahoo.com/ (R)

Government
Archived national government statistics: http://www.archive-it.org/
Australia: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3301.02009?OpenDocument
Canada: http://www.data.gc.ca/default.asp?lang=En&n=5BCD274E-1
Civic Commons: http://wiki.civiccommons.org/Initiatives
DataMarket: http://datamarket.com/
Datamob: http://datamob.org
Fed Stats: http://www.fedstats.gov/cgi-bin/A2Z.cgi
Guardian world governments: http://www.guardian.co.uk/world-government-data
List of cities/states by Simply Statistics: http://simplystatistics.org/2012/01/02/list-of-cities-states-with-open-data-help-me-find/
London, U.K. data: http://data.london.gov.uk/catalogue
New Zealand: http://www.stats.govt.nz/tools_and_services/tools/TableBuilder/tables-by…
NYC data: http://nycplatform.socrata.com/
Open Government Data (Hub): http://opengovernmentdata.org
Open Government Data – United States of America: http://www.data.gov
Open Government Data – United Kingdom: http://data.gov.uk
Open Government – France: http://www.data.gouv.fr
OECD: http://www.oecd.org/document/0,3746,en_2649_201185_46462759_1_1_1_1,00.html
San Francisco Data sets: http://datasf.org/
U.K. Government Data: http://data.gov.uk/data
United Nations: http://data.un.org/
U.S. Federal Government Agencies: http://www.data.gov/metric
US CDC Public Health datasets: http://www.cdc.gov/nchs/data_access/ftp_data.htm
The World Bank: http://wdronline.worldbank.org/

Machine Learning
Causality Workbench: http://www.causality.inf.ethz.ch/repository.php
Kaggle competition data: http://www.kaggle.com/
KDNuggets competition site: http://www.kdnuggets.com/datasets/
UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
Machine Learning Data Set Repository: http://mldata.org/
Microsoft Research: http://research.microsoft.com/apps/dp/dl/downloads.aspx
Million songs: http://blog.echonest.com/post/3639160982/million-song-dataset
Social Networking: http://www.cs.cmu.edu/~jelsas/data/ancestry.com/
The Koblenz Network Collection: http://konect.uni-koblenz.de/

Miscellaneous
Datasets: http://www.reddit.com/r/datasets
Datasets: http://www.reddit.com/r/opendata/
Hilary Mason’s research data (Chief Data Scientist at Bit.ly): http://bitly.com/bundles/hmason/1
Kaggle Contests: http://www.kaggle.com/
R Datasets: http://vincentarelbundock.github.com/Rdatasets/datasets.html

Public Domain Collections
Data360: http://www.data360.org/index.aspx
Datamob.org: http://datamob.org/datasets
Factual: http://www.factual.com/topics/browse
Freebase: http://www.freebase.com/
Google: http://www.google.com/publicdata/directory
infochimps: http://www.infochimps.com/
numbrary: http://numbrary.com/
Sample R data sets: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html (R)
SourceForge Research Data: http://www.nd.edu/~oss/Data/data.html
UFO Reports: http://www.nuforc.org/webreports.html
Wikileaks 911 pager intercepts: http://911.wikileaks.org/files/index.html
Stats4Stem.org: R data sets: http://www.stats4stem.org/data-sets.html (R)
The Washington Post List: http://www.washingtonpost.com/wp-srv/metro/data/datapost.html

Science
Agricultural Experiments: http://www.inside-r.org/packages/cran/agridat/docs/agridat (R)
Climate data: http://www.cru.uea.ac.uk/cru/data/temperature/#datter and ftp://ftp.cmdl.noaa.gov/
Gene Expression Omnibus: http://www.ncbi.nlm.nih.gov/geo/
Geo Spatial Data: http://geodacenter.asu.edu/datalist/
Human Microbiome Project: http://www.hmpdacc.org/reference_genomes/reference_genomes.php
KDnuggets Datasets: http://www.kdnuggets.com/datasets/index.html
MIT Cancer Genomics Data: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
NASA: http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html
NIH Microarray data: ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/ (R)
Protein structure: http://www.infobiotic.net/PSPbenchmarks/
Public Gene Data: http://www.pubgene.org/
Stanford Microarray Data: http://smd.stanford.edu//

Social Sciences
Analyze Survey Data for Free: http://www.asdfree.com/
General Social Survey: http://www3.norc.org/GSS+Website/
ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/index.jsp
UCLA Social Sciences Archive: http://dataarchives.ss.ucla.edu/Home.DataPortals.htm
Upjohn Institute: http://www.upjohn.org/erdc/erdc.html

Time Series
Time Series Data Library: http://robjhyndman.com/TSDL/

Universities
Carnegie Mellon University Enron email: http://www.cs.cmu.edu/~enron/
Carnegie Mellon University StatLib: http://lib.stat.cmu.edu/datasets/
Carnegie Mellon University JASA data archive: http://lib.stat.cmu.edu/jasadata/
Ohio State University Financial data: http://fisher.osu.edu/fin/osudata.htm
Stanford Large Network Data: http://snap.stanford.edu/data/
UC Berkeley: http://ucdata.berkeley.edu/
UCI Machine Learning: http://archive.ics.uci.edu/ml/
UCLA: http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data
UC Riverside Time Series: http://www.cs.ucr.edu/~eamonn/time_series_data/
University of Toronto: http://www.cs.toronto.edu/~delve/data/datasets.html


