Open Source Intelligence (OSINT) can use sophisticated sociological and psychological (socio-psycho) factors to positively identify Key Opinion Leaders (KOLs). This animated presentation demonstrates just how it can be applied to one of the most opaque places on earth: the White House.
I am taking part in a “Terrorism and Terrorist Threat” course being offered by Dr. LaFree through Coursera and the University of Maryland (UMD). If you don’t know, Dr. LaFree heads the Study of Terrorism and Responses to Terrorism (START) center at UMD. It is a great program, through which one can gain access to hundreds of thousands of terrorism-related records covering events all over the world. This is an awesome resource for a data scientist.
Anyway, part of the course requires forum participation, and one of those forums was on “Why Do We Study Terrorism – Surprise Myths.” Dr. LaFree covers nine particularly interesting myths often associated with terrorism, one being that most terrorist attacks use sophisticated weapons, when in fact they do not.
In responding to that forum, I made this observation:
Dr. LaFree stated that most terrorist attacks “rely on non-sophisticated, readily available” weapons. This sounds logical, but is misleading. Sophistication is a characteristic of availability and not a search attribute of the terrorist. The fact that most weapons used in terrorist attacks are not sophisticated does not mean terrorists do not want sophisticated weapons, which is implied in the original statement. In fact, most terrorist attacks use “readily available weapons,” which just so happen to be relatively non-sophisticated by their nature. For example, nuclear weapons are not readily available and thus not used by terrorists. However, if a terrorist group could procure a nuclear weapon, does anybody think they would not use it because it was “sophisticated?”
One of the other class participants then noted that:
Dr. Smith, my reading of Dr. LaFree’s statement did not make the assumption that terrorist groups do not actively try to procure sophisticated weapons. They do all the time. However, partly because they are much more difficult to acquire, it is the case that terrorists are forced to rely on less-sophisticated weapons. His claim solely asserts that terrorist attacks occur predominantly with non-sophisticated, readily accessible weapons like automatic weapons, grenades, dynamite, etc.
My point is more from an application perspective. Yes, terrorists use non-sophisticated weapons; this is the “descriptive analysis” of the information. The problem is that these descriptive statements often end up in “prescriptive policies.” For example, the premise of a newly proposed policy would read something like, “because terrorists mostly use non-sophisticated weapons, we will only do (fill in the blank).”
Descriptive analyses are an important part of any pathway to identifying actionable insights. However, they are often used to support weak prescriptive arguments. We need to move beyond these kinds of initial observations by forcing strong prescriptive arguments, ones supported by causally linked predictive analyses. Let us use the non-sophisticated weapons example and the data science framework to address it in a bit more detail:
Descriptive Analysis (What is happening) – At this level of the analysis taxonomy, and using historical data, we can make the following statements:
- Most terrorists are resource constrained.
- A significant number of states have changed regimes.
- Resource-constrained people use capabilities that are readily available and inexpensive.
- There are large quantities of inexpensive, readily available weapons.
- The more sophisticated a weapon, the more damage and destruction it causes.
- Most inexpensive and readily available weapons are unsophisticated; that is, they have a low Weapon Sophistication/Price ratio.
Diagnostic Analysis (Why is it happening) – Using numerous diagnostic analysis techniques on our descriptive data set, one being differential diagnostics, we could make the following statements:
- Most weapons that were readily available in the past were unsophisticated and low priced, but this has changed significantly over time.
- Because of regime changes at the state level, a significant number of more sophisticated weapons are now available on the open market and some are relatively inexpensive.
- There appears to be a statistically significant larger increase in sophistication versus price; that is, the percentage change in sophistication is greater than the percentage change in price.
- The Weapon Sophistication/Price ratio of readily available weapons appears to be increasing over time.
Predictive Analysis (When will it happen) – Based on the diagnostic analysis and the application of predictive analytical tools, we could speculate the following:
- Using regime change as just one of many independent variables and the Weapon Sophistication/Price ratio as the dependent variable, one could predict a statistically significant increase in the Weapon Sophistication/Price ratio as a function of (future) time.
- This means that for a constant and known amount of terrorism resources there could be an increasing, some could argue exponentially increasing, level of predicted availability of sophisticated weapons. That seems like a bad thing (my opinion).
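As a rough illustration of the predictive step, here is a minimal sketch that fits a straight-line trend to a Weapon Sophistication/Price ratio and extrapolates it. The yearly values are invented purely for illustration (they are not measurements), the function name `fit_line` is my own, and time is used as the only independent variable. A real analysis would add regime change and other drivers and would test for statistical significance.

```python
# Illustrative sketch only: the ratio values below are invented, not real data.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical Weapon Sophistication/Price ratio, one value per year.
years = [2000, 2001, 2002, 2003, 2004]
ratio = [1.0, 1.1, 1.5, 1.6, 2.1]

slope, intercept = fit_line(years, ratio)

# Extrapolate the fitted trend two years past the last observation.
forecast_2006 = intercept + slope * 2006

print(f"ratio grows by about {slope:.2f} per year")
print(f"trend forecast for 2006: {forecast_2006:.2f}")
```

A positive slope is the descriptive "ratio is increasing" statement expressed as a number. Whether the trend is linear or, as some could argue, exponential is exactly what a fuller model would need to test.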
Prescriptive Analysis (How do we change something) – How questions are the most important questions of all because, by their nature, they result in change. In this case, we could address one kind of how question, such as: How do we limit the damage done by terrorism? That seems like an interesting and compelling question that most of us want to answer. So, using the framework, what would we want to do?
While I will leave that up to anyone reading this, this is the kind of actionable insight that results from applying a data-science-driven framework. While this was mostly a made-up example, there is ample research supporting the spirit of the statements, and with a bit of time and energy one could fully qualify this logic path in more significant detail.
I have built a prototype database of terrorists and their known associates, inferred associates, cutouts, and ghosts (AKA Global Terrorism Database). Using the Deep Web Intelligence Platform and many of the critical capabilities found in my enterprise data science framework, I have managed to pull together an initial repository of bad guys and people associated with them that will scale globally. While having a composite database of known bad guys is important, what is really interesting is the list of previously unknown people who are associated with them, some of whom I would never have guessed.
I want to know if this is important and why. If you have any thoughts, please let me know (firstname.lastname@example.org).
Terrorism impacts our lives each and every day, whether directly through acts of violence by terrorists, reduced liberties from new anti-terrorism laws, or increased taxes to support counterterrorism activities. A vital component of terrorism is the means through which these activities are financed, which include both legal and illicit financial activities. Recognizing the necessity to limit these financial activities in order to reduce terrorism, many nation states have agreed to a framework of global regulations, some of which have been realized through regulatory programs such as the Bank Secrecy Act (BSA).
As part of the BSA (and other similar regulations), governed financial services institutions are required to determine if the financial transactions of a person or entity are related to financing terrorism. This is a specific reporting requirement found in Response 30, of Section 2, in the FinCEN Suspicious Activity Report (SAR). For every financial transaction moving through a given banking system, the institution needs to determine if it is suspicious and, if so, whether it is part of a larger terrorist activity. In the event that it is, the financial services institution is required to immediately file a SAR and call FinCEN.
The process of determining if a financial transaction is terrorism related is not merely a compliance issue, but a national security imperative. No solution exists today that adequately addresses this requirement. As such, I was asked to speak on the issue as a data scientist practicing in the private intelligence community. These are some of the relevant points from that discussion.
Determining if a transaction is terrorism related requires more than analyzing the anomalous nature of the activity; it requires correlating seemingly unrelated signals (profiles, transactions, interactions, etc.) through behavioral analyses. Data (enterprise, IT, open source) is the historical debris of human activity. While any single data record is associated with one person, two physically independent events can be connected through the causal behavioral analysis of data chains.
Know Your Customer (KYC) is a common means through which one can learn about the structures and behaviors of each individual in a community (e.g., commercial banking, insurance, etc.). It is the governing program through which customer due diligence is performed as part of the compliance activities associated with onboarding and ongoing monitoring.
Over the years, through ongoing regulatory additions and changes, KYC has grown in complexity and, as a result, has become a significant multifaceted challenge to institutional employees. In addition to knowing the customer, there is now a need to know more about the customer’s customers (KYCC). There are significant deficiencies associated with determining propensity (probability), intelligence, and monitoring activities, even though most organizations are adequately dealing with a few of the ingestion, processing, and reporting activities.
There are six major components to an effective know your customer program. Terrorism Financing Monitoring is one of the least mature and the hardest to solve technically. Traditional approaches encode simple transactional behaviors found through manual investigations into rules engines and event monitoring systems, an approach that does not scale as fast as the terrorism financing activities it is designed to defeat.
Money laundering (ML), as defined by the United Nations, is the process through which the proceeds of criminal activities are disguised to conceal their origins. Fundamentally, money laundering is about financial structure (where) and behavior (how). The Financial Action Task Force (FATF) has established international standards for ML monitoring and reporting.
While the means through which money is laundered are beyond the scope of this presentation, there are several concrete examples that have been discovered as part of an ongoing money laundering ontology. The High Invoicing Scheme is often used to launder illicit funds through commercial business enterprises by exchanging low-value goods for high-value illicit funds.
Terrorist Financing (TF) involves the solicitation, collection or provision of funds with the intention that they may be used to support terrorist acts or organizations. In addition to understanding the structure and behavior of financial sources, understanding their intended use is also necessary. This “intent” is one of the characteristics that makes identifying terrorism financing so difficult.
Terrorism financing and money laundering are interrelated. In money laundering, funds are always illicit in their origin, whereas funds for terrorism financing can come from both legal and illicit sources. Because of the dual funding source and the intended use of the funds, it is extremely difficult to identify whether financial activities are related to terrorism financing.
Below is a set of real account, transactional, and international profiles. Are they normal? Are they an example of money laundering? What about terrorism financing? In addition to answering these questions, would traditional ML and TF monitoring systems identify each activity or tie them together? The answers are at the bottom of this article.
A wide variety of Anti-Money Laundering products are available today. At a baseline level, AML systems automate mandatory legal and regulatory compliance requirements and support the necessary enhanced due diligence and Know Your Customer policies.
Use cases in Risk are centered around connecting all business and financial information systems to meet enterprise regulatory, monitoring, and reporting requirements and to support better risk decision making. The aim is to identify fraudulent behavior before it happens, with proactive intelligence and investigation tools that are all capable of operating across multiple channels and nations.
Data and intelligence analysts, as well as KYC, AML, and TF specialists, face an exponentially increasing challenge to thoroughly identify new customers and monitor all customer behaviors on an ongoing basis.
What is the new TF intelligence paradigm given the global regulatory requirements, the maturation of terrorists, the complexity of financial services information technology systems, and the national security imperative to find, fix, finish (exploit, analyze, and disseminate) terrorism actions pre-boom? It starts with the recognition that traditional enterprise (ERP, CRM, etc.) and IT (transactional logs, click through, etc.) data sources are insufficient. Additional deep web and open source data need to be integrated into the analyses as a means to identify networked behaviors.
In addition to new data sources, man and machine need to be integrated into a deep learning enabled ecosystem. Modeling the behaviors of bad guys is often counterproductive, given their speed of adaptation. A more viable approach leverages modeling good guys and removing them from the target population under investigation. Machines automate this process of removing good behaviors from the system through black list aggregation and human guided machine learning algorithms. Intelligence experts perform enhanced investigations through Human, Physical, and Cyber Intel programs. All of these activities are wrapped in deep learning machines that learn from those highly utilized behaviors, driving the search for new data sources and intelligence procedures.
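The "model the good guys and remove them" idea can be sketched as a simple filter. Everything in this example is hypothetical: the `Account` fields, the 12-month history threshold, and the notion that some upstream model has already flagged accounts matching a known-good profile are assumptions made for illustration, not a description of any production system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    account_id: str
    months_of_history: int
    # Assumed to come from an upstream model trained on vetted customers.
    matches_known_good_profile: bool


def residual_population(accounts, min_history=12):
    """Drop accounts that look like well-understood good customers.

    What remains is the smaller population handed to human analysts.
    """
    return [
        a for a in accounts
        if not (a.matches_known_good_profile
                and a.months_of_history >= min_history)
    ]


# Hypothetical accounts, invented for this sketch.
accounts = [
    Account("A-001", 48, True),   # long history, known-good profile: removed
    Account("A-002", 3, True),    # too new to be trusted yet: kept
    Account("A-003", 60, False),  # does not match a good profile: kept
]

remaining = residual_population(accounts)
remaining_ids = [a.account_id for a in remaining]
print(remaining_ids)
```

The design choice is that the filter never labels anyone as bad. It only shrinks the haystack, which matters when the behaviors being hunted change faster than rules can be rewritten.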
The new enterprise solution delivers (outside the box) the identity of bad people and organizations, behavioral activities, FinCEN SAR filings, and XML integration into the banking enterprise. In order to achieve these outcomes, banking enterprise and IT data, third-party black lists, and deep web and open source data are consumed. Bank AML and TF experts work in conjunction with Data Science, Behavioral, and Intelligence teams. As part of an enterprise learning system, the intelligence results are fed back into the platform as a means through which knowledge is grown.
In enterprise architecture language, capabilities are “the ability to perform or achieve certain actions or outcomes through a set of controllable and measurable faculties, features, functions, processes, or services.”(1) In essence, they describe the what of the activity, but not necessarily the how. For a data science-driven approach to deriving insights, these are the collective sets of abilities that find and manage data, transform data into features capable of being exploited through modeling, model the structural and dynamic characteristics of phenomena, visualize the results, and learn from the complete round trip process. The end-to-end process can be sectioned into Data, Information, Knowledge, and Intelligence.
Data science is much more than just a singular computational process. Today, it’s a noun that collectively encompasses the ability to derive actionable insights from disparate data through mathematical and statistical processes, scientifically orchestrated by data scientists and functional behavioral analysts, all being supported by technology capable of linearly scaling to meet the exponential growth of data. One such set of technologies can be found in the Enterprise Intelligence Hub (EIH), a composite of disparate information sources, harvesters, Hadoop (HDFS and MapReduce), enterprise R statistical processing, metadata management (business and technical), enterprise integration, and insights visualization, all wrapped in a deep learning framework. However, while this technical stuff is cool, Enterprise Intelligence Capabilities (EIC) are an even more important characteristic that drives the successful realization of the enterprise solutions needed to address the emerging KYC ML and TF threats.
Terrorism financing came into the limelight after the terrorist attacks in the United States on 11 September 2001. Global anti-terrorism programs, now manifested through nation state regulations such as the Bank Secrecy Act, can be more effective through the use of deep learning ecosystems that integrate both machine and man. This is one such platform capable of achieving this goal.
Post – The financially related transactions above were those associated with the 9/11 terrorists in 2001.
As data scientists who work in the intelligence community, we are often asked to help identify where intelligence gathering and analysis resources should be allocated. Governmental and non-governmental intelligence organizations are bounded by both limited operational funds and limited time. As such, resource allocation planning becomes an extremely important operational activity for data science teams. But how does one actually go about this?
There is no perfect way of looking for the proverbial needle in the haystack, the needle being the bad guy and the haystack being the world. While it is sometimes better to be lucky than good, having a systematically organic approach to resource allocation enables teams to manage the process to some level of statistical quality control (see Deming). One such way is through the use of Social Network Analysis (SNA).
Social networks encapsulate the human dynamics that are characteristically important to most intelligence activities. Each node in the network represents an entity (person, place, thing, etc.), which is governed by psychological behavioral characteristics. As these entities interact with each other, the nodes become interconnected, forming networks. In turn, these networks are governed by sociological behavioral principles. Taken together, social networks enable the intelligence community to understand and exploit behavioral dynamics, both psychological and sociological.
As a side bar, intelligence analysis is not always about why someone or a group does something. It is often more important to understand why they are not doing things. For example, in intelligence we look extensively at why certain groups associate with each other. But it is equally important to understand why one activist group does not associate with another. From a business perspective there is an equivalent in the sales process. Product managers often strive to understand who is buying their products and services, but lack a material understanding of why people don’t buy these same solutions.
In a recent project, we were tasked by a client to determine if Greenpeace was or could become a significant disruptive geopolitical force against a critical operational initiative. As part of the initial scoping activity, we needed to understand where to allocate our limited resources (global intel experts, business intel experts, subject matter experts, and data scientists) in order to increase the likelihood of addressing the client’s needs. A high-level SNA not only identified where to focus our effort, but also identified a previously unknown activism actor.
The six (6) panel layout below shows how we stepped through our discovery. In Facebook, we leveraged Netvizz to make an initial collection of group-oriented relationships for the principal target (Greenpeace). The 585 nodes, interconnected by 1788 edges, were imported into Gephi as shown in panel 1. As we say… somewhere in that spaghetti is a potential bad guy, but where?
After identifying and importing the data, it is important to generate an initial structural view of the entities. Force Atlas 2 is an effective algorithm, since studies have found that organizational structure can be inferred from layout structure (panel 2). While this layout provides some transparency into the network, it still lacks any real clarity around behavioral importance.
To better understand which entities are more central than others, we leveraged Betweenness Centrality. This is a measure of a node’s centrality in a network, an underlying psychological characteristic; it counts how often a node lies on the shortest paths between other pairs of nodes. Betweenness centrality is a more useful measure (than just connectivity) in that bigger nodes are more central to behavioral dynamics. As seen in panel 3, several nodes become central figures in the overall network.
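For readers who want to see what the betweenness calculation does under the hood, here is a minimal standard-library sketch of Brandes' algorithm for an unweighted, undirected graph. It is not what Gephi runs internally line for line, and the tiny "bow-tie" graph with a node named `hub` is invented for illustration. The function names are my own.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected (u, v) pairs into a neighbour map."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def betweenness_centrality(adjacency):
    """Unnormalised betweenness; each unordered pair is counted once."""
    scores = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths.
        order = []
        predecessors = {v: [] for v in adjacency}
        path_count = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v is on a shortest path to w
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Every undirected pair was visited from both of its endpoints.
    return {v: s / 2 for v, s in scores.items()}


# Two triangles that share a single node: a made-up example graph.
edges = [("a", "b"), ("a", "hub"), ("b", "hub"),
         ("hub", "c"), ("hub", "d"), ("c", "d")]
centrality = betweenness_centrality(build_adjacency(edges))
print(centrality)
```

In this toy graph every path between the two triangles must pass through `hub`, so it receives all of the betweenness while the other nodes receive none, even though `hub` has only twice as many connections as its neighbours. That gap is why betweenness is more informative than connectivity alone.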
Identifying community relationships is an important next step in helping understand sociological characteristics. Using Modularity as a measure to unfold community organizations (panel 4), we now begin to see a clearer picture of who is doing what with whom. What becomes really interesting at this stage is understanding some of the more nuanced relationships.
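To make the modularity measure concrete, the sketch below scores a given assignment of nodes to communities using the standard definition: for each community, the fraction of edges that fall inside it minus the fraction expected from its total degree. This only evaluates a partition; it does not search for the best one, which is the job of the community detection routine in a tool such as Gephi. The graph and community labels are invented for illustration.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal_edges = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Each edge contributes one unit of degree to each endpoint.
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by one bridging edge: a made-up example graph.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": "red", "b": "red", "c": "red",
              "d": "blue", "e": "blue", "f": "blue"}
one_group = {node: "all" for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(f"two communities: {q_two:.3f}, one community: {q_one:.3f}")
```

Splitting the toy graph along the bridge scores higher than lumping everything together, which is the signal a community detection pass exploits when it colors the network.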
Take for example the five outlying nodes in the network (blue, maroon, yellow, dark green, and light green). They appear to be central to an equally important red node in the center. Panel 5 clearly shows this central relationship. Upon further examination (filtering out nodes with low Betweenness Centrality metrics), we see the emergence of a previously unrecognized activism player: Art of Resistance.
While Greenpeace was the original target of interest, use of basic social network analysis principles resulted in the discovery of an emergent activism group playing a central role in the coordination and communication of events. Further analysis of this group revealed their propensity to promote kinetic activities (physical violence, bombing, etc.) over the more traditional passive, non-kinetic events found in Greenpeace.
A resource allocation plan was then developed to monitor and harvest open source information around key players of each community (larger nodes). The plan resulted in a more focused intelligence analysis process where human analysts could explore in depth the behavioral dynamics of critical entities, rather than tangentially digesting summary information from all.
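The allocation step can be sketched as picking the highest-centrality nodes within each community. The scores, community labels, node names, and the cutoff of two nodes per community below are all hypothetical. In practice they would come from the centrality and community results of the network analysis, and the cutoff from the analyst hours available.

```python
def watch_list(centrality, community_of, per_community=2):
    """Return the top nodes by centrality for each community.

    Ties are broken alphabetically so the result is deterministic.
    """
    members = {}
    for node, score in centrality.items():
        members.setdefault(community_of[node], []).append((score, node))
    return {
        community: [
            node
            for _, node in sorted(pairs, key=lambda p: (-p[0], p[1]))
        ][:per_community]
        for community, pairs in members.items()
    }


# Hypothetical inputs, invented for this sketch.
centrality = {"n1": 4.0, "n2": 0.0, "n3": 2.5, "n4": 2.5, "n5": 1.0}
community_of = {"n1": "red", "n2": "red",
                "n3": "blue", "n4": "blue", "n5": "blue"}

plan = watch_list(centrality, community_of)
print(plan)
```

Ranking within communities rather than across the whole network keeps at least one monitored entity in every community, so a small but behaviorally distinct group is not crowded out by a large one.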
Social network analysis (SNA) is an effective tool for the intelligence team, as well as the data science team. Finding the proverbial needle in the haystack requires a systematically organic process that explains both the why and the why not of behavioral dynamics. Use of these kinds of tools enables a broad set of capabilities, ranging from resource allocation to discovery.