Over the last several months I have been involved with developing uniques data science capabilities for the intelligence community, ones specifically based on exploiting insights derived from the open source intelligence (OSINT) found in the deep web. The deep web is World Wide Web (WWW) content that is not part of the Surface Web, which is indexed by standard search engines. It is usually inaccessible through traditional search engines because of the dynamic characteristics of the content and in persistent natural of its URLs. Spanning over 7,500 terabytes of data, it is the richest source of raw material that can be used to build out value.
One of the more important aspects of intelligence is being able to connect multiple seemingly unrelated events together during a time frame amenable for making actionable decisions. This capability is the optimal blend of man and machine, enabling customers to know more and know sooner. It is only in these low signal that are found in the deep web that one can use behavioral sciences (psychology and sociology) to extract outcome-oriented value.
Data in the web is mostly composed of noise, which can be unique but is often of low value. Unfortunately, the index engines of the world (Google, Bing, Yahoo) add marginal value to very few data streams that are important to any valuation process. Real value comes from correlating event networks (people performing actions) through deep web signal, which are not the purview of traditional search engines.
These deep web intelligence capabilities can be achieved in part through the use of machine learning enabled, data science driven, and hadoop-oriented enterprise information hubs. The platform support the 5 plus essential capabilities for actionable intelligence operations:
1. Scalable Infrastructure – Industry standard hardware supported through cloud-based infrastructure providers that is scales linearly with analytical demands.
2. Hadoop – Allows for computation to occur next to data storage and enables storage schema on read – stores data in native raw format.
3. Enterprise Data Science – Scalable exploratory methods, predictive algorithms, and prescriptive and machine learning.
4. Elastic Data Collection – In addition to pulling data from third party sources through APIs, bespoke data collection through scraping web services enables data analyses not capable within traditional enterprise analytics groups.
5. Temporal/Geospatial/Contextual Analyst – The ability to regionalize events, to a specific context, during a specified time (past, present, future).
6. Visualization – Effective visualization that tailors actionable results to individual needs.
The Plus – data, Data, DATA. Without data, lots of disparate data, data science platforms are of no value.
Today’s executive, inundated with TOO MUCH DATA, has limited ability to synthesize trends and actionable insights driving competitive advantage. Traditional research tools, internet and social harvesters do not correlate or predict trends. They look at hindsight or, at best, exist at the surface of things. A newer approach based on combining the behavioral analyses achievable through people and the machine learning found in scalable computational system can bridge this capability gap.