This is the first in a series of screencasts designed to demonstrate practical aspects of data science. In this episode, I will show you how to integrate R, that awe-inspiring statistical processing environment, with Hadoop, the master of distributed data storage and processing. Once that is done, we will use the RHadoop environment to count the words in that massive classic, “Moby Dick.”
In this screencast, we are going to set up a Hadoop environment on Mac OS X; download, install, and configure Hadoop; download and install R and RStudio; download and load the RHadoop packages; configure R; and finally, create and execute a test MapReduce problem. Let me show you exactly how all this works.
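For readers who have not seen the word count problem before, here is a minimal sketch of the MapReduce pattern it relies on. It is plain Python run on a single machine, not the RHadoop code used in the screencast. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made up for illustration, and it assumes the usual per-word tally.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum the 1s collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not text taken from the book.
    sample = ["the whale", "the white whale"]
    print(word_count(sample))
```

On a real cluster, Hadoop distributes the map and reduce steps across machines and performs the shuffle itself; the sketch only shows how the three stages fit together. Summing the values of the returned dictionary gives the total number of words.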
The scripts for this screencast will be posted over the next couple of days.