The Most Important Question for a Data Scientist starts with “HOW”

I am taking part in a “Terrorism and Terrorist Threat” course being offered by Dr. LaFree through Coursera and the University of Maryland (UMD). If you don’t know, Dr. LaFree heads the Study of Terrorism and Responses to Terrorism (START) center at UMD. It is a great program, through which one can gain access to hundreds of thousands of terrorism-related records covering events all over the world. This is an awesome resource for a data scientist.

Anyway, part of the course requires forum participation, and one of those forums was on “Why Do We Study Terrorism – Surprise Myths.” Dr. LaFree covers nine particularly interesting myths often associated with terrorism, one being that most terrorist attacks use sophisticated weapons, when in fact they do not.

However, in doing so I made this observation:

Dr. LaFree stated that most terrorist attacks “rely on non-sophisticated, readily available” weapons. This sounds logical, but is misleading. Sophistication is a characteristic of availability and not a search attribute by the terrorist. The fact that most weapons used in terrorist attacks are not sophisticated does not mean terrorists do not want sophisticated weapons, which is implied in the original statement. In fact, most terrorist attacks use “readily available weapons,” which just so happen to be relatively non-sophisticated by their nature. For example, nuclear weapons are not readily available and thus not used by terrorists. However, if a terrorist group could procure a nuclear weapon, does anybody think they would not use it because it was “sophisticated?”

One of the other class participants then noted that:

Dr. Smith, my reading of Dr. LaFree’s statement did not make the assumption that terrorist groups do not actively try to procure sophisticated weapons. They do all the time. However, partly because they are much more difficult to acquire, it is the case that terrorists are forced to rely on less-sophisticated weapons. His claim solely asserts that terrorist attacks occur predominantly with non-sophisticated, readily accessible weapons like automatic weapons, grenades, dynamite, etc.

My point is more from an application perspective. Yes, terrorists use non-sophisticated weapons; this is the “descriptive analysis” of information. The problem is that these descriptive statements often end up in “prescriptive policies.” For example, the premise of a newly proposed policy would read something like, “because terrorists mostly use non-sophisticated weapons, we will only do (fill in the blank).”

Descriptive analyses are an important part of any pathway to identifying actionable insights. However, they are often used to support weak prescriptive arguments. We need to move beyond these kinds of initial observations by forcing strong prescriptive arguments, ones supported by causally linked predictive analyses. Let us use the non-sophisticated weapons example and the data science framework to address it in a bit more detail:

Descriptive Analysis (What is happening) – At this level of the analysis taxonomy, and using historical data, we can make the statements:

  • Most terrorists are resource constrained.
  • A significant number of states have changed regimes.
  • Resource-constrained people use capabilities that are readily available and inexpensive.
  • There are large amounts of inexpensive, readily available weapons.
  • The more sophisticated a weapon, the more damage and destruction it causes.
  • Most inexpensive and readily available weapons are unsophisticated; that is, they have a low Weapon Sophistication/Price ratio.

Diagnostic Analysis (Why is it happening) – Using numerous diagnostic analysis techniques on our descriptive data set, one being differential diagnostics, we could make the statements:

  • Most weapons that were readily available in the past were unsophisticated and low priced, but this has changed significantly over time.
  • Because of regime changes at the state level, a significant number of more sophisticated weapons are now available on the open market and some are relatively inexpensive.
  • There appears to be a statistically significant larger increase in sophistication versus price; that is, the percentage change in sophistication is greater than the percentage change in price.
  • The Weapon Sophistication/Price ratio of readily available weapons appears to be increasing over time.

Predictive Analysis (When will it happen) – Based on diagnostic analysis and the application of predictive analytical tools, we could speculate the following:

  • Using regime change as just one of many independent variables and the Weapon Sophistication/Price ratio as the dependent variable, one could predict a statistically significant increase in the Weapon Sophistication/Price ratio as a function of time (future).
  • This means that for a constant and known amount of terrorism resources there could be an increasing, some could argue exponentially increasing, level of predicted availability of sophisticated weapons, which seems like a bad thing (my opinion).
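
As a rough illustration of the predictive step, the trend in the Weapon Sophistication/Price ratio could be estimated with a simple least-squares line. This is only a sketch: the yearly figures, the `observations` table, and the helper names are invented for illustration and are not START data, and a real analysis would include regime change and other independent variables and test for statistical significance.

```python
from statistics import mean

# Hypothetical yearly observations: (year, sophistication score, price index).
# These numbers are made up for illustration; they are not real terrorism data.
observations = [
    (2010, 2.0, 1.00),
    (2011, 2.3, 1.05),
    (2012, 2.7, 1.10),
    (2013, 3.2, 1.15),
    (2014, 3.8, 1.20),
]


def ratio_series(rows):
    """Return (year, sophistication / price) pairs."""
    return [(year, soph / price) for year, soph, price in rows]


def fit_line(points):
    """Ordinary least squares fit y = intercept + slope * x for (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


ratios = ratio_series(observations)
slope, intercept = fit_line(ratios)

# Extrapolate the fitted line two years past the last observation.
forecast_2016 = intercept + slope * 2016
print(f"slope per year: {slope:.3f}, projected 2016 ratio: {forecast_2016:.2f}")
```

A positive slope corresponds to the "increasing over time" statement in the diagnostic step; whether the growth is linear or exponential would have to be checked against real data.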

Prescriptive Analysis (How do we change something) – How questions are the most important questions of all because, by their nature, they result in change. In this case, we could address one kind of how question, such as: How do we limit the damage done by terrorism? This seems like an interesting and compelling question that most of us want to answer. So, using the framework, what would we want to do?

Image: A militant operates a TOW anti-tank rocket launcher in Syria (file photo), from the news story “Syria terrorists receive 2nd batch of US anti-tank missiles.”

While I will leave that up to anyone reading this, this is the kind of actionable insight that results from applying a data-science-driven framework. While this was mostly a made-up example, there is ample research supporting the spirit of the statements and, with a bit of time and energy, one could fully qualify this logic path in more significant detail.




Global Terrorism Database (GTDb)

I have built a prototype database of terrorists and their known associates, inferred associates, cutouts, and ghosts (AKA Global Terrorism Database). Using the Deep Web Intelligence Platform and many of the critical capabilities found in my enterprise data science framework, I have managed to pull together an initial repository of bad guys and people associated with them that will globally scale. While having a composite database of known bad guys is important, what is really interesting is the list of previously unknown people that are associated with them, some of whom I would have never guessed.

I want to know if this is important and why. If you have any thoughts, please let me know (



Deep Learning Intelligence Platform – Addressing the KYC AML Counter Terrorism Financing Challenge

Terrorism impacts our lives each and every day, whether directly through acts of violence by terrorists, reduced liberties from new anti-terrorism laws, or increased taxes to support counter-terrorism activities. A vital component of terrorism is the means through which these activities are financed, through both legal and illicit financial activities. Recognizing the necessity to limit these financial activities in order to reduce terrorism, many nation states have agreed to a framework of global regulations, some of which have been realized through regulatory programs such as the Bank Secrecy Act (BSA).

As part of the BSA (and other similar regulations), governed financial services institutions are required to determine if the financial transactions of a person or entity are related to financing terrorism. This is a specific report requirement found in Response 30, of Section 2, in the FinCEN Suspicious Activity Report (SAR). For every financial transaction moving through a given banking system, the institution needs to determine if it is suspicious and, if so, whether it is part of a larger terrorist activity. In the event that it is, the financial services institution is required to immediately file a SAR and call FinCEN.

The process of determining if a financial transaction is terrorism related is not merely a compliance issue, but a national security imperative. No solution exists today that adequately addresses this requirement. As such, I was asked to speak on the issue as a data scientist practicing in the private intelligence community. These are some of the relevant points from that discussion.


Determining if a transaction is terrorism related requires more than analyzing the anomalous nature of the activity; it requires correlating seemingly unrelated signals (profiles, transactions, interactions, etc.) through behavioral analyses. Data (enterprise, IT, open source) is the historical debris of human activity. While any single data record is associated with one person, two physically independent events can be linked through the causal behavioral analysis of data chains.

Know Your Customer (KYC) is a common means through which one can learn about the structures and behaviors of each individual in a community (e.g., commercial banking, insurance, etc.). It is the governing program through which customer due diligence is performed as part of compliance activities associated with onboarding and ongoing monitoring activities.


Over the years, through ongoing regulatory additions and changes, KYC has grown in complexity and, as a result, has become a significant multifaceted challenge to institutional employees. In addition to knowing about the customer, there is now a need to know more about the customer’s customers (KYCC). There are significant deficiencies associated with determining propensity (probability), intelligence, and monitoring activities, even though most organizations are adequately dealing with a few of the ingestion, processing, and reporting activities.


There are six major components to an effective Know Your Customer program. Terrorism Financing Monitoring is one of the least mature and the hardest technically to solve. Traditional approaches encode simple transactional behaviors found through manual investigations into rules engines and event monitoring systems, an approach that does not scale as fast as the terrorism financing activities it is designed to defeat.
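
To make the scaling problem concrete, here is a minimal sketch of the kind of hand-written rule such systems encode. Everything in it is hypothetical: the `Transaction` type, the threshold, the margin, and the sample accounts are invented for illustration and do not describe any particular vendor's rules engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    account: str
    amount: float
    kind: str  # e.g. "cash_deposit" or "wire"


# Hypothetical rule parameters, chosen only for illustration.
THRESHOLD = 10_000.0
NEAR_MARGIN = 0.10  # "just under" means within 10% below the threshold
MIN_HITS = 3


def flag_structuring(transactions):
    """Return accounts with MIN_HITS or more cash deposits just under THRESHOLD."""
    hits = {}
    for tx in transactions:
        just_under = THRESHOLD * (1 - NEAR_MARGIN) <= tx.amount < THRESHOLD
        if tx.kind == "cash_deposit" and just_under:
            hits[tx.account] = hits.get(tx.account, 0) + 1
    return sorted(acct for acct, n in hits.items() if n >= MIN_HITS)


sample = [
    Transaction("A-1", 9_500, "cash_deposit"),
    Transaction("A-1", 9_800, "cash_deposit"),
    Transaction("A-1", 9_200, "cash_deposit"),
    # B-2 moves a similar pattern at smaller amounts and slips past the rule.
    Transaction("B-2", 4_000, "cash_deposit"),
    Transaction("B-2", 4_100, "cash_deposit"),
    Transaction("B-2", 4_050, "cash_deposit"),
]
print(flag_structuring(sample))
```

The rule only catches the behavior it was written for. An actor who changes amounts or channels is invisible until an investigator finds the new pattern and someone encodes another rule.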

Money laundering (ML), as defined by the United Nations, is the process through which the proceeds of criminal activities are disguised to conceal their origins. Fundamentally, money laundering is about financial structure (where) and behavior (how). The Financial Action Task Force (FATF) has established international standards for ML monitoring and reporting.


While the means through which money is laundered are beyond the scope of this presentation, there are several concrete examples that have been discovered as part of an ongoing money laundering ontology. The High Invoicing Scheme is often used to launder illicit funds through commercial business enterprises by exchanging low-value goods for high-value illicit funds.


Terrorist Financing (TF) involves the solicitation, collection, or provision of funds with the intention that they may be used to support terrorist acts or organizations. In addition to understanding the structure and behavior of financial sources, understanding their intended use is also necessary. This “intent” is one of the characteristics that make identifying terrorism financing so difficult.


Terrorism financing and money laundering are interrelated. In money laundering, funds are always illicit in their origin, whereas funds for terrorism financing can come from both legal and illicit sources. Because of the dual funding source and the intended use of the funds, it is extremely difficult to identify whether financial activities are related to terrorism financing.

Below is a set of real account, transactional, and international profiles. Are they normal? Are they an example of money laundering? What about terrorism financing? In addition to answering these questions, would traditional ML and TF monitoring systems identify each activity or tie them together? The answers are at the bottom of this article.


A wide variety of Anti-Money Laundering products are available today. At a baseline level, AML systems automate mandatory legal and regulatory compliance requirements and support the necessary enhanced due diligence and Know Your Customer policies.


Use cases in Risk center on connecting all business and financial information systems to meet enterprise regulatory, monitoring, and reporting requirements and to support better risk decision making. The goal is to identify fraudulent behavior before it happens, with proactive intelligence and investigation tools that are capable of operating across multiple channels and nations.


Data and intelligence analysts, as well as KYC, AML, and TF specialists, face an exponentially increasing challenge to thoroughly identify new customers and monitor all customer behaviors on an ongoing basis.

What is the new TF intelligence paradigm given the global regulatory requirements, the maturation of terrorists, the complexity of financial services information technology systems, and the national security imperative to find, fix, finish (exploit, analyze, and disseminate) terrorism actions pre-boom? It starts with the recognition that traditional enterprise (ERP, CRM, etc.) and IT (transactional logs, click-through, etc.) data sources are insufficient. Additional deep web and open source data need to be integrated into the analyses as a means to identify networked behaviors.


In addition to new data sources, man and machine need to be integrated into a deep learning enabled ecosystem. Modeling the behaviors of bad guys is often counterproductive, given their speed of adaptation. A more viable approach is to model good guys and remove them from the target population under investigation. Machines automate this process of removing good behaviors from the system through black list aggregation and human-guided machine learning algorithms. Intelligence experts perform enhanced investigations through Human, Physical, and Cyber Intel programs. All of these activities are wrapped in deep learning machines that learn from those highly utilized behaviors, driving the search for new data sources and intelligence procedures.
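
The idea of modeling good behavior and removing it from the target population can be sketched very simply. The example below is an assumption-laden toy, not the platform's method: the account names, the single feature (monthly wire-out totals), and the z-score cutoff are all invented for illustration, and a real system would learn from many features with human review in the loop.

```python
from statistics import mean, stdev

# Hypothetical monthly wire-out totals for accounts already vetted as benign.
vetted_good = [1200.0, 1350.0, 1100.0, 1280.0, 1420.0, 1190.0]

# Hypothetical accounts still under review.
under_review = {
    "acct-01": 1250.0,
    "acct-02": 1310.0,
    "acct-03": 9800.0,
    "acct-04": 1150.0,
}


def fit_good_profile(values):
    """Summarize known-good behavior as (mean, sample standard deviation)."""
    return mean(values), stdev(values)


def split_population(accounts, profile, max_z=3.0):
    """Clear accounts that look like the good profile; keep the rest for analysts."""
    mu, sigma = profile
    cleared, remaining = [], []
    for name, value in sorted(accounts.items()):
        z = abs(value - mu) / sigma
        (cleared if z <= max_z else remaining).append(name)
    return cleared, remaining


profile = fit_good_profile(vetted_good)
cleared, remaining = split_population(under_review, profile)
print("cleared:", cleared)
print("still under investigation:", remaining)
```

The design choice is that the model never has to describe what a bad actor looks like. It only has to recognize ordinary behavior well enough to shrink the population that human analysts investigate.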


The new enterprise solution delivers (outside the box) the identity of bad people and organizations, behavioral activities, FinCEN SAR filings, and XML integration into the banking enterprise. In order to achieve these outcomes, banking enterprise and IT data, third-party black lists, and deep web and open source data are consumed. Bank AML and TF experts work in conjunction with Data Science, Behavioral, and Intelligence teams. As part of an enterprise learning system, the intelligence results are fed back into the platform as a means through which knowledge is grown.


In enterprise architecture language, capabilities are “the ability to perform or achieve certain actions or outcomes through a set of controllable and measurable faculties, features, functions, processes, or services.”(1) In essence, they describe the what of the activity, but not necessarily the how. For a data science-driven approach to deriving insights, these are the collective sets of abilities that find and manage data, transform data into features capable of being exploited through modeling, model the structural and dynamic characteristics of phenomena, visualize the results, and learn from the complete round-trip process. The end-to-end process can be sectioned into Data, Information, Knowledge, and Intelligence.


Data science is much more than just a singular computational process. Today, it’s a noun that collectively encompasses the ability to derive actionable insights from disparate data through mathematical and statistical processes, scientifically orchestrated by data scientists and functional behavioral analysts, all being supported by technology capable of linearly scaling to meet the exponential growth of data. One such set of technologies can be found in the Enterprise Intelligence Hub (EIH), a composite of disparate information sources, harvesters, Hadoop (HDFS and MapReduce), enterprise R statistical processing, metadata management (business and technical), enterprise integration, and insights visualization, all wrapped in a deep learning framework. However, while this technical stuff is cool, Enterprise Intelligence Capabilities (EIC) are an even more important characteristic that drives the successful realization of the enterprise solutions needed to address the emerging KYC, ML, and TF threats.


Terrorism financing came into the limelight after the terrorist attacks in the United States on 11 September 2001. Global anti-terrorism programs, now manifested through nation state regulations such as the Bank Secrecy Act, can be more effective through the use of deep learning ecosystems that integrate both machine and man. This is one such platform capable of achieving this goal.

Post – The financially related transactions above were those associated with the 9/11 terrorists in 2001.

Art of Resistance – The Social Network Anatomy of a Kinetic Activist Group


As data scientists who work in the intelligence community, we are often asked to help identify where intelligence gathering and analysis resources should be allocated. Governmental and non-governmental intelligence organizations are bounded by both limited operational funds and limited time. As such, resource allocation planning becomes an extremely important operational activity for data science teams. But how does one actually go about this?

There is no perfect way of looking for the proverbial needle in the haystack, the needle being the bad guy and the haystack being the world. While it is sometimes better to be lucky than good, having a systematically organic approach to resource allocation enables teams to manage the process to some level of statistical quality control (see Deming). One such way is through the use of Social Network Analyses (SNA).

Social networks encapsulate the human dynamics that are characteristically important to most intelligence activities. Each node in the network represents an entity (person, place, thing, etc.), which is governed by psychological behavioral characteristics. As these entities interact with each other, the nodes become interconnected, forming networks. In turn, these networks are governed by sociological behavioral principles. Taken together, the social networks enable the intelligence community to understand and exploit behavioral dynamics, both psychological and sociological.

As a sidebar, intelligence analysis is not always about why someone or a group does something. It is often more important to understand why they are not doing things. For example, in intelligence we look extensively at why certain groups associate with each other. But it is equally important to understand why one activist group does not associate with another. From a business perspective there is an equivalent in the sales process. Product managers often strive to understand who is buying their products and services, but lack a material understanding of why people don’t buy these same solutions.

In a recent project, we were tasked by a client to determine if Greenpeace was or could become a significant disruptive geopolitical force against a critical operational initiative. As part of the initial scoping activity, we needed to understand where to allocate our limited resources (global intel experts, business intel experts, subject matter experts, and data scientists) in order to increase the likelihood of addressing the client’s needs. A high-level SNA not only identified where to focus our effort, but also identified a previously unknown activism actor as well.

The six (6) panel layout below shows how we stepped through our discovery. In Facebook, we leveraged Netvizz to make an initial collection of group-oriented relationships for the principal target (Greenpeace). The 585 nodes, interconnected by 1788 edges, were imported into Gephi as shown in panel 1. As we say… somewhere in that spaghetti is a potential bad guy, but where?

Gephi Panel 01


After identifying and importing the data, it is important to generate an initial structural view of the entities. Force Atlas 2 is an effective algorithm, since studies have identified that organizational structure can be inferred from layout structure (panel 2). While this layout provides some transparency into the network, it still lacks any real clarity around behavioral importance.

To better understand which entities are more central than others, we leveraged Betweenness Centrality. This is a measure of a node’s centrality in a network, an underlying psychological characteristic: it counts how often a node lies on the shortest paths between other nodes. Betweenness centrality is a more useful measure (than just connectivity) in that bigger nodes are more central to behavioral dynamics. As seen in panel 3, several nodes become central figures in the overall network.
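
For readers who want to see what the betweenness measure computes, here is a small pure-Python sketch using Brandes' algorithm on an unweighted, undirected graph. The toy graph and its node names are hypothetical stand-ins, not the Greenpeace data, and this is not how Gephi is invoked; it only illustrates the metric.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbors.
    Returns unnormalized scores, counting each unordered pair once.
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was visited from both endpoints, so halve.
    return {v: s / 2.0 for v, s in score.items()}


# Hypothetical toy network: two tight triangles joined through "hub".
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "hub"],
    "hub": ["c", "d"],
    "d": ["hub", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

scores = betweenness_centrality(toy_graph)
most_central = max(scores, key=scores.get)
print(most_central, scores)
```

In this toy network "hub" has only two connections, fewer than "c" or "d", yet it scores highest because every path between the two triangles runs through it. That is the sense in which betweenness says more than connectivity alone.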

Identifying community relationships is an important next step in helping understand sociological characteristics. Using Modularity as a measure to unfold community organizations (panel 4), we now begin to see a clearer picture of who is doing what with whom. What becomes really interesting at this stage is understanding some of the more nuanced relationships.
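
As a rough sketch of what the modularity measure rewards, the function below scores a given partition of an undirected, unweighted graph. The edge list and community labels are hypothetical. Note that this only evaluates a partition that is handed to it; community detection tools search for the partition that maximizes this score, which is a separate and harder step not shown here.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping node -> community label.
    """
    m = len(edges)
    internal = {}    # edges with both endpoints inside the community
    degree_sum = {}  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy network: two triangles joined through "hub".
toy_edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "hub"), ("hub", "d"),
    ("d", "e"), ("d", "f"), ("e", "f"),
]

two_groups = {"a": "left", "b": "left", "c": "left",
              "hub": "right", "d": "right", "e": "right", "f": "right"}
one_group = {node: "all" for node in two_groups}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
print(f"two communities: {q_split:.4f}, one community: {q_single:.4f}")
```

A partition scores well when its communities contain more internal edges than would be expected by chance given the nodes' degrees, which is why splitting the toy network at the bridge scores higher than lumping everything together.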

Take for example the five outlying nodes in the network (blue, maroon, yellow, dark green, and light green). They appear to be central to an equally important red node in the center. Panel 5 clearly shows this central relationship. Upon further examination (filtering out nodes with low Betweenness Centrality metrics), we see the emergence of a previously unrecognized activism player: Art of Resistance.



While Greenpeace was the original target of interest, use of basic social network analysis principles resulted in the discovery of an emergent activism group playing a central role in the coordination and communication of events. Further analysis of this group revealed their propensity to promote kinetic activities (physical violence, bombing, etc.) over the more traditional passive, non-kinetic events found in Greenpeace.

Gephi Panel 02

A resource allocation plan was then developed to monitor and harvest open source information around key players of each community (larger nodes). The plan resulted in a more focused intelligence analysis process where human analysts could explore in depth the behavioral dynamics of critical entities, rather than tangentially digesting summary information from all.

Social network analysis (SNA) is an effective tool for the intelligence team, as well as the data science team. Finding the proverbial needle in the haystack requires a systematically organic process that explains both the why and why not of behavioral dynamics. Use of these kinds of tools enables a broad set of capabilities, ranging from resource allocation to discovery.