Using NLP and Bayesian Logic to Automatically Classify Terrorism Events

Another question in the “Terrorism and Terrorist Threat” course, offered by Dr. LaFree through Coursera and the University of Maryland (UMD), asked whether the following scenario was terrorism:

Gunmen attacked a military convoy in Bazai town, Mohmand agency, Federally Administered Tribal Areas, Pakistan. Three soldiers were killed and two others were wounded in the attack. Tehrik-i-Taliban Pakistan (TTP) claimed responsibility for the incident.

Is this an incident of terrorism regardless of the fact that TTP targeted active combatants in an active military situation? Or should this be classified under insurgency? And with that said, is there a fundamental difference between terrorism and insurgency that requires a set of definitions to disentangle the two?

This appears to be terrorism. Starting with the definition of terrorism, “acts by non-state actors involving the threatened or actual use of illegal force or violence to obtain a political (other) goal through intimidation,” we can observe:

  • Non-state actor: Yes
  • Actual use of force: Yes
  • Was the force illegal: Yes, or at least assume so (I have not read the relevant laws)
  • Designed to obtain political change: Appears so
  • Was the force intimidating: Unknown

I hate to do this last part, but a data scientist is, well, a data scientist. Using inferential analysis (specifically, Bayesian inference), one can ask the question,

P(terrorism | non-state actor, use of force, legality of force, nature of political change, degree of intimidation) = P(t | nsa, uof, lof, nopc, doi)

That is, the probability of terrorism, P(terrorism), given ( | ) all of the observations that follow it (non-state actor, etc.). This, in turn, leads to a bit of Bayesian math (Bayes’ theorem), which shows that P(t | nsa, uof, lof, nopc, doi) is equal to:

P(t | nsa, uof, lof, nopc, doi) = P(nsa, uof, lof, nopc, doi | t) x P(t) / P(nsa, uof, lof, nopc, doi)
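To make the formula concrete, here is a minimal Python sketch with a single observation x standing in for the whole set (nsa, uof, lof, nopc, doi). It assumes the denominator is expanded with the law of total probability, P(x) = P(x | t) P(t) + P(x | not t) P(not t); the function name and all numbers are hypothetical, not GTD estimates.

```python
def bayes_posterior(p_x_given_t: float, p_x_given_not_t: float, p_t: float) -> float:
    """Return P(t | x) using Bayes' theorem.

    The denominator P(x) is expanded with the law of total probability:
    P(x) = P(x | t) P(t) + P(x | not t) P(not t).
    """
    p_x = p_x_given_t * p_t + p_x_given_not_t * (1.0 - p_t)
    return p_x_given_t * p_t / p_x


# Hypothetical inputs, for illustration only (not GTD estimates).
print(round(bayes_posterior(p_x_given_t=0.9, p_x_given_not_t=0.2, p_t=0.5), 3))  # 0.818
```

With a 50/50 prior, an observation that is 4.5 times more likely under terrorism than under non-terrorism pushes the posterior to roughly 0.82.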

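Let’s ignore the denominator for a moment, since it does not depend on whether an event is terrorism or not. So now we need to focus on calculating P(nsa, uof, lof, nopc, doi | t) x P(t), which is the joint probability P(nsa, uof, lof, nopc, doi, t). This gets really complex very quickly, because the chain rule of probability expands it into a product of ever more heavily conditioned terms: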

P(nsa, uof, lof, nopc, doi, t) = P(nsa | uof, lof, nopc, doi, t) x P(uof | lof, nopc, doi, t) x P(lof | nopc, doi, t) x P(nopc | doi, t) x P(doi | t) x P(t)
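One way to see why this gets complex is to count how many probabilities each factor needs. The sketch below assumes every variable, including t, is a simple present/absent (binary) value, which the definitions above do not require; under that assumption each factor's table doubles in size as it conditions on one more variable. The function name is hypothetical.

```python
from typing import List


def chain_rule_table_sizes(n_features: int) -> List[int]:
    """Free parameters needed by each factor of the chain-rule expansion.

    Assumes t and every feature are binary. P(t) needs 1 number; each later
    factor conditions on one more binary variable than the previous one, so
    it needs twice as many numbers (one per combination of conditioning values).
    """
    return [2 ** i for i in range(n_features + 1)]


sizes = chain_rule_table_sizes(5)  # P(t), P(doi | t), ..., P(nsa | uof, lof, nopc, doi, t)
print(sizes)       # [1, 2, 4, 8, 16, 32]
print(sum(sizes))  # 63, i.e. 2 ** 6 - 1
```

Even with only five yes/no observations, 63 numbers must be estimated from data, and the count roughly doubles with every observation added.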

So I am going to assume we do not care about the interactions between nsa, uof, lof, nopc, and doi; formally, that they are conditionally independent of one another given t. This is called a naive Bayes calculation. This assumption, that once we know whether an event is terrorism, information about one factor tells us nothing about another, is a big one and has yet to be tested (which we could do by looking at data in the Global Terrorism Database, GTD). However, it is a good place to start, in that it gives insight without overtaxing the math. Applying naive Bayes, we have a much simpler calculation:

P(nsa, uof, lof, nopc, doi, t) = P(nsa | t) x P(uof | t) x P(lof | t) x P(nopc | t) x P(doi | t) x P(t)
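Under the naive assumption, the numerator is just the prior times one likelihood per observation. A minimal sketch, assuming binary features and made-up likelihood tables (LIKELIHOODS, PRIORS, and naive_bayes_numerator are hypothetical names, and the numbers are not GTD estimates). It also shows a convenient property: an observation we could not determine, like intimidation in the Bazai event, can simply be left out, because P(x | t) summed over both values of x is 1.

```python
from math import prod
from typing import Dict, Optional

# Hypothetical likelihoods P(feature present | class); illustration only, not GTD data.
LIKELIHOODS = {
    "terrorism":     {"nsa": 0.95, "uof": 0.90, "lof": 0.85, "nopc": 0.80, "doi": 0.70},
    "not terrorism": {"nsa": 0.40, "uof": 0.60, "lof": 0.30, "nopc": 0.35, "doi": 0.30},
}
PRIORS = {"terrorism": 0.5, "not terrorism": 0.5}  # hypothetical P(t), P(not t)


def naive_bayes_numerator(observed: Dict[str, Optional[bool]], label: str) -> float:
    """Return P(label) x product of P(x_i | label), assuming conditional independence.

    Features observed as None (unknown) are skipped: marginalizing an
    unobserved binary feature multiplies by P(x=1 | t) + P(x=0 | t) = 1.
    """
    table = LIKELIHOODS[label]
    factors = [table[name] if present else 1.0 - table[name]
               for name, present in observed.items() if present is not None]
    return PRIORS[label] * prod(factors)


# The Bazai event: intimidation is unknown, so it is left as None.
bazai = {"nsa": True, "uof": True, "lof": True, "nopc": True, "doi": None}
for label in LIKELIHOODS:
    print(label, round(naive_bayes_numerator(bazai, label), 4))
# terrorism 0.2907
# not terrorism 0.0126
```

With five binary features this needs only 1 + 2 x 5 = 11 numbers, rather than the 2^6 - 1 = 63 the full chain-rule expansion needs under the same binary assumption.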

A quick aside: P(nsa | t), P(uof | t), P(lof | t), P(nopc | t), and P(doi | t) are the “likelihoods” of each observation being seen given t, and P(t) is the “prior” probability of t. The cool part is that all of them can be estimated from data in GTD.

Substituting back into Bayes’ theorem gives:

P(t | nsa, uof, lof, nopc, doi) = P(nsa | t) x P(uof | t) x P(lof | t) x P(nopc | t) x P(doi | t) x P(t) / P(nsa, uof, lof, nopc, doi)

Since the denominator does not depend on whether the event is terrorism, let’s treat it as a constant, k. Thus, and this is the final thus,

P(t | nsa, uof, lof, nopc, doi) = P(nsa | t) x P(uof | t) x P(lof | t) x P(nopc | t) x P(doi | t) x P(t) / k
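In practice k does not have to be estimated separately: by the law of total probability it is the sum of the numerator over both classes, so dividing each class's numerator by that sum yields posteriors that add to 1. A minimal sketch; the function name is hypothetical and the two input scores are made-up numbers, not GTD results.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn unnormalized scores P(x | class) x P(class) into posteriors.

    k is the sum of the scores over every class, which equals P(x)
    by the law of total probability.
    """
    k = sum(scores.values())
    return {label: s / k for label, s in scores.items()}


# Hypothetical numerators for the two classes.
posterior = normalize({"terrorism": 0.2907, "not terrorism": 0.0126})
print(round(posterior["terrorism"], 3))  # 0.958
```

Because k is shared by both classes, it never changes which class scores higher; it only matters when you want an actual probability rather than a ranking.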

Normally, this is where we can create interesting count tables that tally the number of events with a non-state actor that were terrorism and that were not, the number with each legality-of-force value that were terrorism and that were not, and so on. Something like this, where a-j are replaced with actual numbers:

Observation                                           | Terrorism | Not Terrorism
non-state actor present                               | a         | b
use of force present                                  | c         | d
legality of force                                     | e         | f
political change desired (nature of political change) | g         | h
degree of intimidation                                | i         | j

Given that these likelihoods are high, at least in most observed GTD cases, and that these factors are largely present in this scenario, one could conclude that this event should be classified as terrorism.
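Filling in the table and turning each column into likelihoods is straightforward counting. A minimal sketch, assuming binary features and using made-up counts in place of a through j (the function names are hypothetical). The add-one (Laplace) smoothing is my addition, a common choice that keeps a single zero count from forcing the whole naive Bayes product to zero.

```python
from typing import Dict


def likelihoods_from_counts(present: Dict[str, int], class_total: int,
                            alpha: float = 1.0) -> Dict[str, float]:
    """Estimate P(feature present | class) from one column of the count table.

    present:     number of events in this class with each feature present
    class_total: total number of events in this class
    alpha:       Laplace smoothing; adds alpha to both the present and absent counts
    """
    return {name: (count + alpha) / (class_total + 2 * alpha)
            for name, count in present.items()}


def prior_from_counts(n_terrorism: int, n_not: int) -> float:
    """Estimate the prior P(t) as the share of events labeled terrorism."""
    return n_terrorism / (n_terrorism + n_not)


# Made-up counts standing in for cells a, c, e, g, i of the Terrorism column.
terrorism_column = {"nsa": 980, "uof": 950, "lof": 900, "nopc": 870, "doi": 700}
print(likelihoods_from_counts(terrorism_column, class_total=1000)["nsa"])  # 981 / 1002
print(prior_from_counts(1000, 250))  # 0.8
```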

One last point, probably the most interesting one (hence the title), is that we can automate this process using natural language processing (NLP), semantic modeling, and a more elaborate version of the naive Bayes math above.

Event Records -> [NLP] -> Semantic Model -> [Bayesian Calculator] ===> Terrorism or Not!
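The pipeline can be sketched end to end as a toy. This assumes simple keyword cues in place of real NLP and semantic modeling (CUES, extract_features, and classify are hypothetical names, and the likelihoods are made up). Only nsa and uof are extracted, since legality, political aim, and intimidation are hard to read off a single record.

```python
import re
from math import prod
from typing import Dict, Optional

# Hypothetical keyword cues standing in for a real NLP / semantic-model stage.
# A production system would use entity and event extraction, not regexes.
CUES = {
    "nsa": r"\b(gunmen|militants?|claimed responsibility)\b",
    "uof": r"\b(attack(ed)?|killed|wounded|bomb(ing)?)\b",
}

# Made-up likelihoods P(feature present | class) and priors, for illustration only.
LIKELIHOODS = {
    "terrorism":     {"nsa": 0.95, "uof": 0.90},
    "not terrorism": {"nsa": 0.40, "uof": 0.60},
}
PRIORS = {"terrorism": 0.5, "not terrorism": 0.5}


def extract_features(record: str) -> Dict[str, Optional[bool]]:
    """[NLP] -> Semantic Model: map free text to present/absent features."""
    return {name: bool(re.search(pattern, record, re.IGNORECASE))
            for name, pattern in CUES.items()}


def classify(features: Dict[str, Optional[bool]]) -> Dict[str, float]:
    """[Bayesian Calculator]: naive Bayes posterior for each class."""
    scores = {}
    for label, table in LIKELIHOODS.items():
        factors = [table[f] if present else 1.0 - table[f]
                   for f, present in features.items() if present is not None]
        scores[label] = PRIORS[label] * prod(factors)
    k = sum(scores.values())  # normalizing constant
    return {label: s / k for label, s in scores.items()}


record = ("Gunmen attacked a military convoy in Bazai town, Mohmand agency, "
          "Federally Administered Tribal Areas, Pakistan. Three soldiers were killed "
          "and two others were wounded in the attack. Tehrik-i-Taliban Pakistan (TTP) "
          "claimed responsibility for the incident.")
features = extract_features(record)
posterior = classify(features)
print(features)                          # {'nsa': True, 'uof': True}
print(max(posterior, key=posterior.get))  # terrorism
```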

Fun stuff.

The Most Important Question for a Data Scientist starts with “HOW”

I am taking part in a “Terrorism and Terrorist Threat” course being offered by Dr. LaFree through Coursera and the University of Maryland (UMD). If you don’t know, Dr. LaFree heads the National Consortium for the Study of Terrorism and Responses to Terrorism (START) at UMD. It is a great program, through which one can gain access to hundreds of thousands of terrorism-related records covering events all over the world. This is an awesome resource for a data scientist.

Anyway, part of the course requires forum participation, and one of those forums was on “Why Do We Study Terrorism – Surprise Myths.” Dr. LaFree covers nine particularly interesting myths often associated with terrorism, one being that most terrorist attacks use sophisticated weapons, when in fact they do not.

In that forum discussion, I made this observation:

Dr. LaFree stated that most terrorist attacks “rely on non-sophisticated, readily available” weapons. This sounds logical, but is misleading. Sophistication is a characteristic of availability and not a search attribute by the terrorist. The fact that most weapons used in terrorist attacks are not sophisticated does not mean terrorists do not want sophisticated technology, which is implied in the original statement. In fact, most terrorist attacks use “readily available weapons,” which just so happen to be relatively non-sophisticated by their nature. For example, nuclear weapons are not readily available and thus not used by terrorists. However, if a terrorist group could procure a nuclear weapon, does anybody think they would not use it because it was “sophisticated”?

One of the other class participants then noted that:

Dr. Smith, my reading of Dr. LaFree’s statement did not make the assumption that terrorist groups do not actively try to procure sophisticated weapons. They do all the time. However, partly because they are much more difficult to acquire, it is the case that terrorists are forced to rely on less-sophisticated weapons. His claim solely asserts that terrorist attacks occur predominantly with non-sophisticated, readily accessible weapons like automatic weapons, grenades, dynamite, etc.

My point is more from an application perspective. Yes, terrorists use non-sophisticated weapons; this is the “descriptive analysis” of the information. The problem is that these descriptive statements often end up in “prescriptive policies.” For example, the premise of a newly proposed policy might read something like, “because terrorists mostly use non-sophisticated weapons, we will only do (fill in the blank).”

Descriptive analyses are an important part of any pathway to identifying actionable insights. However, they are often used to support weak prescriptive arguments. We need to move beyond these kinds of initial observations by insisting on strong prescriptive arguments, ones supported by causally linked predictive analyses. Let’s use the non-sophisticated weapons example and the data science framework to address this in a bit more detail:

Descriptive Analysis (What is happening) – At this level of the analysis taxonomy, using historical data, we can make the following statements:

  • Most terrorists are resource-constrained.
  • A significant number of states have changed regimes.
  • Resource-constrained people use capabilities that are readily available and inexpensive.
  • There are large numbers of inexpensive, readily available weapons.
  • The more sophisticated a weapon, the more damage and destruction it causes.
  • Most inexpensive, readily available weapons are unsophisticated; that is, they have a low Weapon Sophistication/Price ratio.

Diagnostic Analysis (Why is it happening) – Using numerous diagnostic analysis techniques on our descriptive data set, one being differential diagnostics, we could make the following statements:

  • In the past, most readily available weapons were unsophisticated and low priced, but this has changed significantly over time.
  • Because of regime changes at the state level, a significant number of more sophisticated weapons are now available on the open market, and some are relatively inexpensive.
  • There appears to be a statistically significant, larger increase in sophistication than in price; that is, the percentage change in sophistication is greater than the percentage change in price.
  • The Weapon Sophistication/Price ratio of readily available weapons appears to be increasing over time.
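The last two diagnostic statements are linked by simple arithmetic: the new Sophistication/Price ratio equals the old one times (1 + sophistication change) / (1 + price change), so if sophistication rises by a larger percentage than price, the ratio must rise. A minimal sketch with made-up before/after values in arbitrary units; it does not test statistical significance, which would need real data and a proper test.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Hypothetical average sophistication score and price of readily available
# weapons at two points in time (made-up numbers, arbitrary units).
before = {"sophistication": 2.0, "price": 500.0}
after = {"sophistication": 3.0, "price": 550.0}

soph_change = pct_change(before["sophistication"], after["sophistication"])  # 50.0
price_change = pct_change(before["price"], after["price"])                    # 10.0
ratio_before = before["sophistication"] / before["price"]
ratio_after = after["sophistication"] / after["price"]

# Sophistication grew faster (in percent) than price, so the ratio rose.
print(soph_change > price_change, ratio_after > ratio_before)  # True True
```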

Predictive Analysis (When will it happen) – Based on diagnostic analysis and the application of predictive analytical tools, we could speculate the following:

  • Using regime change as just one of many independent variables and the Weapon Sophistication/Price ratio as the dependent variable, one could predict a statistically significant increase in the Weapon Sophistication/Price ratio as a function of (future) time.
  • This means that, for a constant and known amount of terrorist resources, there could be an increasing (some could argue exponentially increasing) predicted availability of sophisticated weapons, which seems like a bad thing (my opinion).
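As a much-simplified stand-in for that predictive model, one could fit a straight-line trend of the ratio against year and extrapolate. A minimal sketch, assuming hypothetical yearly ratios and using year as the only independent variable (regime change and the other variables are left out); fit_line is a hypothetical helper, and a real analysis would need a multivariable model and a significance test.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical Weapon Sophistication/Price ratios (x 1e-3), one per year; made-up data.
years = [2010, 2011, 2012, 2013, 2014, 2015]
ratios = [4.0, 4.3, 4.7, 5.0, 5.4, 5.8]

slope, intercept = fit_line(years, ratios)
print(round(slope, 2))                     # 0.36 per year
print(round(slope * 2016 + intercept, 2))  # 6.13, the extrapolated 2016 ratio
```

A positive slope is what the first predictive statement anticipates; whether it is statistically significant is a separate question this sketch does not answer.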

Prescriptive Analysis (How do we change something) – How questions are the most important questions of all because, by their nature, they result in change. In this case, we could address one kind of how question, such as: How do we limit the damage done by terrorism? It seems like an interesting and compelling question that most of us want to answer. So, using the framework, what would we want to do?

(Image: “Syria terrorists receive 2nd batch of US anti-tank missiles.” A militant operates a TOW anti-tank rocket launcher in Syria. File photo.)

I will leave that up to anyone reading this, but this is the kind of actionable insight that results from applying a data-science-driven framework. While this was mostly a made-up example, there is ample research supporting the spirit of these statements, and with a bit of time and energy, one could qualify this logic path in much more detail.