FIELD NOTE: Answering One of the Most Asked Question of Data Scientists

NewImageGregory Piatetsky and Shashank Lyer have begun to answer one of the most asked question facing data science practitioners: Which tools work with which other tools? If I had a dollar for every time this question was asked of me, well, let’s just say I’d already be retired! So, when I saw Gregory and Shashank took a crack at this question, I was intrigued. 

In their recent article Which Big Data, Data Mining, and Data Science Tools go together?, both authors use a version of Apriori algorithm to analyze the results of a 2105 KDnuggets Data Mining Software Poll. This work is an excellent example of how simple techniques, added together, can result in very useful insights.

For example, the graph below visualizes the correlation between the top 20 most popular tools. For the nodes, Red: Free/Open Source tools, Green: Commercial tools, Fuchsia: Hadoop/Big Data tools. The node sizes  vary based on the percentage of votes each tool received. The segmentation shows the weights of each edge, the thicker ones showing a high association and the latter a low association.

NewImage

For those, like me, who predominately work in the R world, here are a list of tools that are most often associated with R

2015 06 17 11 06 57

While this work is just the start and only covers a limited user population, the authors provide the data set for those that want to further explore the survey or continue collecting additional tools data in order to extend their insights.



Categories: Data Science, Field Note, Tools

Tags: , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: