John Tukey (1915-2000) was an American mathematician who has been called the father of modern exploratory data analysis and data visualization. Tukey wrote a great deal on these subjects, so I thought I’d share three of my favorite, and also more popular, quotes:
The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
To statisticians, hubris should mean the kind of pride that fosters an inflated idea of one’s powers and thereby keeps one from being more than marginally helpful to others. … The feeling of “Give me (or more likely even, give my assistant) the data, and I will tell you what the real answer is!” is one we must all fight against again and again, and yet again.
Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
So, if you like these quotes and are looking for a great data science read, then check out Tukey’s text, Exploratory Data Analysis.
I thought I’d share an observation I made in an email earlier today about how Big Data seems to be splitting into these four orthogonal fields:
1. Data (the intrinsic 1/0 property of big data), which can be broken down into subjective areas like interaction vs. transaction data; structured vs. unstructured; realtime/streaming vs. batch/static; etc.
2. MapReduce platforms – AKA divide and conquer – virtual integration capabilities that enable aggregation and management of multiple name-spaced data sources (Hadoop, InfoSphere Streams, Pneuron, etc.)
3. Data Exploration, Data Mining, and Intelligence Platforms – technical capabilities that enable one to derive insights from data (Pentaho, IBM InfoSphere, ListenLogic, MATLAB, Mathematica, Statistica, etc.).
4. Knowledge Worker platform (AKA the human component) – The two most important capabilities come from data scientists (who navigate through data) and behavioral scientists (who navigate through human behavior, which most of the important things seem to connect back to).
In essence, Big Data has data, an ability to find it and use it, and an ability to explore and learn from it.
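To make the "divide and conquer" idea in item 2 a bit more concrete, here is a minimal sketch in plain Python rather than on an actual platform like Hadoop. The source names (`web.interactions`, `store.transactions`) and their records are made up purely for illustration: each name-spaced source is summarized on its own (the map step), and the partial results are then merged (the reduce step).

```python
from collections import Counter
from functools import reduce

# Hypothetical name-spaced data sources; names and records are illustrative only.
sources = {
    "web.interactions": ["click", "view", "click"],
    "store.transactions": ["purchase", "refund", "purchase"],
}

def map_source(namespace, records):
    """Map step: count events within a single source, keyed by (namespace, event)."""
    return Counter((namespace, event) for event in records)

def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right

# Divide: summarize each source independently. Conquer: merge the partial results.
partials = [map_source(ns, recs) for ns, recs in sources.items()]
totals = reduce(merge_counts, partials, Counter())

for (namespace, event), count in sorted(totals.items()):
    print(f"{namespace}\t{event}\t{count}")
```

On a real platform the map step would run in parallel across machines, but the shape of the computation is the same.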
Does this seem right? Missing anything? Please post or email me.