Coopetition – A Comparison of Data Analytics & Data Sciences

There is a lot of discussion around how data sciences and data analytics differ, from the tools that are used to the methodologies that are employed. Two useful perspectives are to look at the differences (what separates them) and then at the commonalities (what brings them together). The “tale of the tape” (below) provides nine common measures used to differentiate these two “data fighters.” The most notable for this discussion is the first: Philosophy. Data analytics tends to focus its mental energy on confirming (quantifying and qualifying) things we know we want to know. Data sciences, on the other hand, is about revelation: the discovery of something new in a previously unknown area.


Another lens through which we can look at the differences question is that of the Knowledge Model (used above). This model divides our understanding (or not) of the world around us into four groups: you know what you know, you know what you don’t know, you don’t know what you know, and you don’t know what you don’t know. Simple examples of the first two are that you know your age but don’t know mine. The third is a bit trickier, in that it is about recall and recognition. A possible example is recalling an event from earlier in your life when you smell a particular scent or hear a specific song (ah, those were the days). There are an infinite number of examples in the last category, but one I use a lot is that you probably don’t know much about pebble-bed nuclear reactors, and you did not know you didn’t know it until you read those words. By the way, we hardly ever spend time looking into things we don’t know we know (recall), since many assessment-oriented events highlight them during discovery, which results in more knowing what you know. On with data sciences.

The Knowledge Model is very useful when thinking through data analytics and data sciences. Data analytics is fundamentally about providing clarity around those things we know we know. For example, what is my product inventory throughout my global supply chain? Data sciences, on the other hand, explores those things we don’t know we don’t know, with the goal of producing actionable insights. An example is finding undiscovered ways of limiting product leakage throughout a global supply chain. In the middle, the connective layer, is where data analytics and data sciences often come together. For example, trying to better understand why there are different levels of inventory throughout our supply chain, or discovering events that will impact them.


While there are differences and commonalities between data analytics and data sciences, both are equally important. Without analytics, we would not be able to operate our factories or even pay our employees. Data analytics powers the economic engine of society. On the other hand, without data sciences we would be stuck doing the same thing over and over, and our businesses would be incapable of real strategic growth. Data sciences is a catalyst that moves our society through stagnation. The two are very different, but interconnected. A perfect example of coopetition.


FIELD NOTE: Answering One of the Most Asked Questions of Data Scientists

Gregory Piatetsky and Shashank Iyer have begun to answer one of the most asked questions facing data science practitioners: Which tools work with which other tools? If I had a dollar for every time this question was asked of me, well, let’s just say I’d already be retired! So, when I saw that Gregory and Shashank had taken a crack at this question, I was intrigued.

In their recent article Which Big Data, Data Mining, and Data Science Tools go together?, the authors use a version of the Apriori algorithm to analyze the results of the 2015 KDnuggets Data Mining Software Poll. This work is an excellent example of how simple techniques, added together, can result in very useful insights.

For example, the graph below visualizes the correlation between the top 20 most popular tools. The nodes are colored by type: red for free/open source tools, green for commercial tools, and fuchsia for Hadoop/Big Data tools. The node sizes vary based on the percentage of votes each tool received. The edge thickness shows the weight of each association, with thicker edges showing a high association and thinner ones a low association.


For those who, like me, predominantly work in the R world, here is a list of the tools that are most often associated with R.


While this work is just the start and covers only a limited user population, the authors provide the data set for those who want to explore the survey further or continue collecting additional tool data in order to extend these insights.
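For readers who want to try this kind of analysis themselves, here is a minimal sketch of the idea behind Apriori-style pair mining. It is not the authors' code: the `votes` list is made up for illustration (it is not the KDnuggets poll data), and the `frequent_pairs` function and its `min_support` threshold are hypothetical names chosen for this example. It counts how often two tools are reported together, skips pairs that contain an infrequent tool (the Apriori pruning step), and reports support and lift for the pairs that remain.

```python
from collections import Counter
from itertools import combinations

# Hypothetical poll responses: each set holds the tools one voter reported using.
# These values are invented for illustration only.
votes = [
    {"R", "Python", "SQL"},
    {"R", "SQL"},
    {"Python", "Spark", "Hadoop"},
    {"R", "Python"},
    {"Spark", "Hadoop"},
    {"R", "SQL", "Excel"},
]


def frequent_pairs(votes, min_support=0.3):
    """Return {(tool_a, tool_b): (support, lift)} for pairs meeting min_support."""
    n = len(votes)

    # Pass 1: count single tools and keep only the frequent ones.
    # Apriori pruning: a pair cannot be frequent if either tool is infrequent.
    item_counts = Counter(tool for vote in votes for tool in vote)
    frequent = {t for t, c in item_counts.items() if c / n >= min_support}

    # Pass 2: count only the pairs built from frequent tools.
    pair_counts = Counter()
    for vote in votes:
        kept = sorted(vote & frequent)
        pair_counts.update(combinations(kept, 2))

    results = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Lift > 1 means the tools co-occur more often than chance would suggest.
            lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
            results[(a, b)] = (support, lift)
    return results


if __name__ == "__main__":
    for pair, (support, lift) in sorted(frequent_pairs(votes).items()):
        print(pair, f"support={support:.2f}", f"lift={lift:.2f}")
```

A lift well above 1 is the kind of signal a thick edge in the graph would represent. With real poll data, the threshold would need tuning to the number of voters and tools.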

Darkness, A Flashlight, and the Data Scientist

Data sciences and data analytics not only use different techniques, which are often highly dependent on the distribution characteristics of the data, but also produce categorically different types of insights. These insights range from a better understanding of things you know you know (data analytics) to discoveries in areas where you don’t know what you don’t know (data sciences). However, this knowledge metaphor can be a bit confusing, so I often use the “Darkness, A Flashlight, and the Data Scientist” parable.


In your mind, picture a darkened room where you are standing, but you do not know where in the room you are. In your hand is a large flashlight. You raise it slowly, pointing it in a direction. You turn it on and white light radiates forward.

The light of the flashlight shines brightly on a distant wall, where you see several items. These are the things you know that you know. As your eyes begin to scan outward, the wall turns to deep, dark black where the light does not reach. In this darkness are the things you don’t know you don’t know. You begin to look back toward the center of the light. In that grey transitionary boundary between the light of what we know and the darkness of what we don’t know are all the things we know we don’t know.


Data analytics is largely about understanding those things we know we know, that is, quantifying the light. This is the world of descriptive and diagnostic analytics. Data sciences, on the other hand, help us understand the darkest parts of our world, where we look to predict temporal and spatial relationships and prescribe means for achieving desired outcomes. Data analytics and data sciences are different in their own ways, and each is very important in its own right.

However, in the case of the data scientist, the metaphorical role is to pull the flashlight back so that more areas of the wall are illuminated. As the flashlight is pulled back linearly, the lit area grows faster than linearly (roughly with the square of the distance to the wall), so the data scientist enables a disproportionate increase in our knowledge. In essence, the data scientist works in the dark so that others can benefit from the light. Think about it!