2014 02 18 08 51 17

As data scientists working in the intelligence community, we are often asked to help identify where intelligence gathering and analysis resources should be allocated. Governmental and non-governmental intelligence organizations are constrained by both limited operational funds and limited time. As such, resource allocation planning becomes an extremely important operational activity for data science teams. But how does one actually go about this?

There is no perfect way of looking for the proverbial needle in the haystack, the needle being the bad guy and the haystack being the world. While it is sometimes better to be lucky than good, having a systematically organic approach to resource allocation enables teams to manage the process to some level of statistical quality control (see Deming). One such approach is Social Network Analysis (SNA).

Social networks encapsulate the human dynamics that are characteristically important to most intelligence activities. Each node in the network represents an entity (person, place, thing, etc.), which is governed by psychological behavioral characteristics. As these entities interact with each other, the nodes become interconnected, forming networks. In turn, these networks are governed by sociological behavioral principles. Taken together, social networks enable the intelligence community to understand and exploit behavioral dynamics, both psychological and sociological.

As a sidebar, intelligence analysis is not always about why a person or a group does something. It is often more important to understand why they are not doing things. For example, in intelligence we look extensively at why certain groups associate with each other. But it is equally important to understand why one activist group does not associate with another. From a business perspective there is an equivalent in the sales process. Product managers often strive to understand who is buying their products and services, but lack a material understanding of why people don’t buy these same solutions.

In a recent project, we were tasked by a client to determine if Greenpeace was or could become a significant disruptive geopolitical force to a critical operational initiative. As part of the initial scoping activity, we needed to understand where to allocate our limited resources (global intel experts, business intel experts, subject matter experts, and data scientists) in order to increase the likelihood of addressing the client’s needs. A high-level SNA not only identified where to focus our effort, but also identified a previously unknown activism actor.

The six (6) panel layout below shows how we stepped through our discovery. In Facebook, we leveraged Netvizz to make an initial collection of group-oriented relationships for the principal target (Greenpeace). The 585 nodes, interconnected by 1788 edges, were imported into Gephi as shown in panel 1. As we say… somewhere in that spaghetti is a potential bad guy, but where?

Gephi Panel 01


After identifying and importing the data, it is important to generate an initial structural view of the entities. Force Atlas 2 is an effective algorithm, since studies have found that organizational structure can be inferred from layout structure (panel 2). While this layout provides some transparency into the network, it still lacks any real clarity around behavioral importance.

To better understand which entities are more central than others, we leveraged betweenness centrality. This measure counts how often a node lies on the shortest paths between other nodes, and we treat it as an indicator of an underlying psychological characteristic. Betweenness centrality is more useful than connectivity alone: with nodes sized by this score, the bigger nodes are the ones more central to behavioral dynamics. As seen in panel 3, several nodes become central figures in the overall network.
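To make the measure concrete, here is a minimal sketch of betweenness centrality in plain Python, using Brandes' algorithm for an unweighted, undirected graph. This is not how Gephi computes it internally and it is not our project code; the toy graph and all names (`build_adjacency`, `betweenness_centrality`, nodes `a` through `f`) are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected (u, v) edges into a node -> neighbour-set map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness_centrality(adj):
    """Unnormalised betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search from source, counting shortest paths.
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        paths = {v: 0 for v in adj}     # number of shortest paths from source
        dist = {v: -1 for v in adj}     # -1 means "not reached yet"
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}


# Hypothetical toy network: two triangles joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
scores = betweenness_centrality(build_adjacency(edges))
for node in sorted(scores):
    print(node, scores[node])
```

In this toy network, `c` and `d` have the same number of connections as each other and only one more than their neighbours, yet every path between the two triangles runs through them. That is the kind of brokerage role that a simple connection count tends to hide.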

Identifying community relationships is an important next step in helping understand sociological characteristics. Using Modularity as a measure to unfold community organizations (panel 4), we now begin to see a clearer picture of who is doing what with whom. What becomes really interesting at this stage is understanding some of the more nuanced relationships.
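For readers who want to see what the modularity score actually rewards, the sketch below computes it for a given assignment of nodes to communities. It only scores a partition; it does not search for the best one, which is what a community detection routine such as the one in Gephi does. The function name `modularity`, the toy edge list, and the two example partitions are all hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph.

    For each community: (share of edges that fall inside it) minus
    (the share expected if edges were placed at random, given node degrees).
    """
    m = len(edges)
    internal = {}     # community -> number of edges with both ends inside it
    degree_sum = {}   # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical toy network: two triangles joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]

# Partition 1: each triangle is its own community.
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
# Partition 2: everything lumped into a single community.
lumped = {node: 0 for node in split}

q_split = modularity(edges, split)
q_lumped = modularity(edges, lumped)
print(q_split, q_lumped)
```

The split partition scores higher than the lumped one because most edges stay inside the two triangles. Community detection looks for the assignment that pushes this score as high as it can, which is what produces the colored groups in panel 4.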

Take for example the five outlying nodes in the network (blue, maroon, yellow, dark green, and light green). They appear to be connected to an equally important red node in the center. Panel 5 clearly shows this central relationship. Upon further examination (filtering out nodes with low betweenness centrality values), we see the emergence of a previously unrecognized activism player: Art of Resistance.
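The filtering step can be sketched in a few lines of Python. This is an illustration only: the adjacency map, the scores, the threshold, and the name `filter_by_score` are hypothetical, and in practice we applied the filter inside Gephi rather than in code.

```python
def filter_by_score(adj, score, threshold):
    """Keep nodes scoring at or above threshold, plus the edges among them."""
    keep = {v for v in adj if score.get(v, 0.0) >= threshold}
    return {v: adj[v] & keep for v in keep}


# Hypothetical toy network (node -> set of neighbours).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
# Hypothetical betweenness scores for the same nodes.
score = {"a": 0.0, "b": 0.0, "c": 6.0, "d": 6.0, "e": 0.0, "f": 0.0}

filtered = filter_by_score(adj, score, threshold=1.0)
print(sorted(filtered))
```

Only the brokers survive the filter, along with the edge between them. On a real network the threshold is a judgment call, so it is worth trying a few values and watching which nodes drop out.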



While Greenpeace was the original target of interest, use of basic social network analysis principles resulted in the discovery of an emergent activism group playing a central role in the coordination and communication of events. Further analysis of this group revealed their propensity to promote kinetic activities (physical violence, bombing, etc.) over the more traditional passive, non-kinetic events associated with Greenpeace.

Gephi Panel 02

A resource allocation plan was then developed to monitor and harvest open source information around the key players of each community (the larger nodes). The plan resulted in a more focused intelligence analysis process where human analysts could explore in depth the behavioral dynamics of critical entities, rather than tangentially digesting summary information from all of them.

Social network analysis (SNA) is an effective tool for the intelligence team, as well as the data science team. Finding the proverbial needle in the haystack requires a systematically organic process that explains both the why and the why not of behavioral dynamics. Use of these kinds of tools enables a broad set of capabilities, ranging from resource allocation to discovery.
