Data science is much more than a single computational process. Today, the term covers the ability to derive actionable insights from disparate data through mathematical and statistical processes, scientifically orchestrated by data scientists and functional behavioral analysts and supported by technology capable of scaling linearly to meet the exponential growth of data. One such set of technologies can be found in the Enterprise Intelligence Hub (EIH), a composite of disparate information sources, harvesters, Hadoop (HDFS and MapReduce), enterprise R statistical processing, metadata management (business and technical), enterprise integration, and insights visualization, all wrapped in a deep learning framework. However, while this technical stuff is cool, Enterprise Intelligence Capabilities (EIC) are an even more important characteristic, one that drives the successful realization of the enterprise solution.
In enterprise architecture language, capabilities are “the ability to perform or achieve certain actions or outcomes through a set of controllable and measurable faculties, features, functions, processes, or services.”(1) In essence, they describe the what of the activity, but not necessarily the how. For a data science-driven approach to deriving insights, these are the collective sets of abilities that find and manage data, transform data into features capable of being exploited through modeling, model the structural and dynamic characteristics of phenomena, visualize the results, and learn from the complete round-trip process. The end-to-end process can be divided into four stages: Data, Information, Knowledge, and Intelligence.
Each of these atomic capabilities can be used by four different key resources to produce concrete intermediate and final intelligence products. The Platform Engineer (PE) is responsible for harvesting and maintaining raw data and for ensuring well-formed metadata. For example, they would write Python scripts used by Flume to ingest Reddit dialogue into the Hadoop ecosystem. The MapReduce Engineer (MR) produces features based on imported data sets. One common function is extracting topics from document sets through natural language processing programmed in MapReduce. The Data Scientist (DS) performs statistical analyses and develops machine learning algorithms. Time series analysis, for example, is often used by the data scientist as a basis for identifying anomalies in data sets. Taken all together, Enterprise Intelligence Capabilities can transform generic text sources (observations) into actionable intelligence through the intermediate production of metadata-tagged signals and contextualized events.
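To make the feature-production step more concrete, here is a minimal sketch of the map and reduce pattern applied to a small document set. It runs in plain Python rather than on Hadoop, and it uses simple term counting as a stand-in for real topic extraction. The stop-word list, the function names, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce
import re

# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on"}


def map_document(text):
    """Map step: turn one document into a partial term count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def reduce_counts(left, right):
    """Reduce step: merge two partial term counts into one."""
    return left + right


def top_terms(documents, n=3):
    """Run map then reduce and return the n most frequent terms."""
    mapped = map(map_document, documents)
    totals = reduce(reduce_counts, mapped, Counter())
    return totals.most_common(n)


# Made-up sample documents standing in for harvested text.
documents = [
    "The cluster is slow and the cluster is down",
    "Restarting the cluster fixed the slow jobs",
    "Jobs on the cluster are slow again",
]

print(top_terms(documents, n=2))  # [('cluster', 4), ('slow', 3)]
```

Because the map step works on one document at a time and the reduce step only merges partial counts, the same logic could in principle be distributed across many nodes, which is the property that makes the pattern scale.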
Regardless of how data science is being used to derive insights, at the desktop or throughout the enterprise, capabilities become the building blocks for effective solution development. Independent of actual implementation (e.g., there are many different ways to perform anomaly detection), they are the scalable units that transform raw data into the intelligence needed to realize truly actionable insights.
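The separation between a capability and its implementation can be sketched in a few lines of Python. In this hypothetical example, "detect anomalies" is the capability, and two interchangeable detectors implement it: a plain z-score test and a median-based test that uses the conventional modified z-score constants (0.6745 and a threshold of 3.5). The function names, thresholds, and sample series are assumptions chosen for illustration, not part of any EIH component.

```python
from statistics import mean, median, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Flag indices whose distance from the mean exceeds threshold std devs."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers under this test
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def mad_anomalies(values, threshold=3.5):
    """Flag indices using the modified z-score (median absolute deviation)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # more than half the points equal the median
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


def detect_anomalies(values, detector=zscore_anomalies):
    """The capability: callers ask for anomalies, not for a specific method."""
    return detector(values)


# Made-up series with one obvious spike at index 6.
series = [10, 11, 9, 10, 12, 10, 50, 11, 9, 10]

print(detect_anomalies(series))                          # [6]
print(detect_anomalies(series, detector=mad_anomalies))  # [6]
```

Callers depend only on detect_anomalies, so a different method can be swapped in without changing the code that consumes the result.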