Using NLP and Bayesian Logic to Automatically Classify Terrorism Events

Another question in the “Terrorism and Terrorist Threat” course, offered by Dr. LaFree through Coursera and the University of Maryland (UMD), asked whether the following scenario was terrorism or not:

Gunmen attacked a military convoy in Bazai town, Mohmand agency, Federally Administered Tribal Areas, Pakistan. Three soldiers were killed and two others were wounded in the attack. Tehrik-i-Taliban Pakistan (TTP) claimed responsibility for the incident.

Is this an incident of terrorism regardless of the fact that TTP targeted active combatants in an active military situation? Or should this be classified under insurgency? And with that said, is there a fundamental difference between terrorism and insurgency that requires a set of definitions to disentangle the two?

This appears to be terrorism. Starting with the definition of terrorism, “acts by non-state actors involving the threatened or actual use of illegal force or violence to obtain a political (other) goal through intimidation,” we can observe:

  • Non-state actor: Yes
  • Actual use of force: Yes
  • Was the force illegal: Yes, or at least I assume so (I have not read their laws)
  • Designed to obtain political change: Appears so
  • Was the force intimidating: Unknown

I hate to do this last part, but a data scientist is, well, a data scientist. Using inferential analysis (AKA Bayesian), one can ask the question,

P(terrorism | non-state actor, use of force, legality of force, nature of political change, degree of intimidation) = P(t | nsa, uof, lof, nopc, doi)

That is, the probability of terrorism, P(terrorism), given ( | ) all of the following observations (non-state actor, etc.). This, in turn, leads to a bunch of Bayesian math, which results in P(t | nsa, uof, lof, nopc, doi) being equal to:

P(t | nsa, uof, lof, nopc, doi) = P(nsa, uof, lof, nopc, doi | t) x P(t) / P(nsa, uof, lof, nopc, doi)

Let’s ignore the denominator for a moment, since it does not depend on whether an event is terrorism or not. So now we need to focus on calculating P(nsa, uof, lof, nopc, doi | t) x P(t), which is the joint probability P(nsa, uof, lof, nopc, doi, t). This gets really complex very quickly,

P(nsa, uof, lof, nopc, doi, t) = P(nsa | uof, lof, nopc, doi, t) x P(uof | lof, nopc, doi, t) x P(lof | nopc, doi, t) x P(nopc | doi, t) x P(doi | t) x P(t)

So I am going to assume we do not care about the interactions between nsa, uof, lof, nopc, and doi. This is called a Naive Bayes calculation. This assumption, that once we know whether an event is terrorism, one factor tells us nothing about another, is a big one and has yet to be verified (which we could do by looking at data in the Global Terrorism Database, GTD). However, it is a good place to start, in that it gives insight without overstressing the calculus. Applying Naive Bayes, we have a much simpler calculation:

P(nsa, uof, lof, nopc, doi, t) = P(nsa | t) x P(uof | t) x P(lof | t) x P(nopc | t) x P(doi | t) x P(t)

A quick aside: P(nsa | t), P(uof | t), P(lof | t), P(nopc | t), and P(doi | t) are the “likelihoods” of each factor being seen given t, while P(t) is the “prior” probability of t. The cool part is that they are all computable from data in GTD.
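As a minimal sketch of that product of likelihoods and prior, here is the calculation in Python. The numbers are placeholders I made up for illustration, not estimates from GTD, and the name `naive_joint` is hypothetical.

```python
from math import prod

# Placeholder values for P(factor | t); NOT derived from GTD data.
likelihoods_given_t = {
    "nsa": 0.9,   # P(nsa | t)
    "uof": 0.8,   # P(uof | t)
    "lof": 0.7,   # P(lof | t)
    "nopc": 0.6,  # P(nopc | t)
    "doi": 0.5,   # P(doi | t)
}
prior_t = 0.5     # P(t), also a placeholder


def naive_joint(likelihoods, prior):
    """Multiply the per-factor likelihoods by the prior (Naive Bayes numerator)."""
    return prod(likelihoods.values()) * prior


score_t = naive_joint(likelihoods_given_t, prior_t)
print(score_t)
```

Note that this score is only the numerator of Bayes' rule, so it is not yet a probability.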

Therefore,

P(t | nsa, uof, lof, nopc, doi) = P(nsa | t) x P(uof | t) x P(lof | t) x P(nopc | t) x P(doi | t) x P(t) / P(nsa, uof, lof, nopc, doi)

Since the denominator does not depend on whether the event is terrorism, let’s treat it as a constant, k. Thus, and the final thus,

P(t | nsa, uof, lof, nopc, doi) = P(nsa | t) x P(uof | t) x P(lof | t) x P(nopc | t) x P(doi | t) x P(t) / k
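One way to pin down k, assuming there are only two outcomes (terrorism and not terrorism): compute the same numerator for both and let k be their sum, so the two posteriors add up to 1. A small sketch, with made-up scores and a hypothetical `normalize` helper:

```python
def normalize(score_t, score_not_t):
    """Turn two unnormalized Naive Bayes scores into posteriors.

    k is the sum of the numerators over both outcomes, which equals
    P(nsa, uof, lof, nopc, doi) when the outcomes are exhaustive.
    """
    k = score_t + score_not_t
    return score_t / k, score_not_t / k


# Placeholder numerators, NOT computed from GTD.
score_t = 0.0756      # likelihoods x prior for "terrorism"
score_not_t = 0.0084  # likelihoods x prior for "not terrorism"

p_t, p_not_t = normalize(score_t, score_not_t)
print(round(p_t, 3), round(p_not_t, 3))
```

If all we want is the more probable label, k can be skipped entirely, since it scales both scores equally.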

Normally, this is where we can create interesting marginal tables that count the number of nsa that were terrorism and not, the number of lof associated with terrorism and not, etc. Something like this, where a-j are replaced with actual numbers:

Factor                                                  Terrorism   Not Terrorism
non-state actor present                                 a           b
use of force present                                    c           d
legality of force                                       e           f
political change desired (nature of political change)   g           h
degree of intimidation                                  i           j

Given that these marginal probabilities are high in most observed GTD cases, and that the factors are largely present in this scenario, one could conclude that this event should be classified as terrorism.
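To make the table concrete, here is a rough sketch of turning counts like a-j into likelihoods. The counts and class totals are invented for illustration (they are not GTD figures), and the add-one (Laplace) smoothing is my own addition so that a zero count does not wipe out the whole product.

```python
def estimate_likelihood(count_present, class_total, alpha=1.0):
    """Estimate P(factor present | class) from counts, with add-alpha smoothing."""
    return (count_present + alpha) / (class_total + 2.0 * alpha)


# Hypothetical totals: number of events labeled terrorism / not terrorism.
N_T, N_NOT_T = 100, 100

# Hypothetical counts in the shape of the table: (terrorism, not terrorism).
counts = {
    "nsa": (90, 20),   # a, b
    "uof": (95, 60),   # c, d
    "lof": (85, 30),   # e, f
    "nopc": (80, 10),  # g, h
    "doi": (70, 15),   # i, j
}

likelihoods = {
    factor: (estimate_likelihood(in_t, N_T), estimate_likelihood(in_not_t, N_NOT_T))
    for factor, (in_t, in_not_t) in counts.items()
}
prior_t = N_T / (N_T + N_NOT_T)  # P(t) estimated from the class totals

for factor, (p_given_t, p_given_not_t) in likelihoods.items():
    print(f"{factor}: P(.|t)={p_given_t:.3f}  P(.|not t)={p_given_not_t:.3f}")
```

The useful comparison is between the two columns: a factor that is common in both helps less than one that is common only under terrorism.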

One last point, probably the more interesting one (hence the title), is that we can automate this process using natural language processing (NLP), semantic modeling, and a higher-level version of the naive Bayesian math above.

Event Records -> [NLP] -> Semantic Model -> [Bayesian Calculator] ===> Terrorism or Not!
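A toy version of that pipeline might look like the sketch below. Everything in it is an assumption for illustration: the keyword patterns are a crude stand-in for real NLP and semantic modeling, the probabilities are made up rather than taken from GTD, and only three of the five factors are covered because legality and intimidation are hard to spot with keywords.

```python
import re
from math import prod

# Hypothetical keyword patterns standing in for the [NLP] -> Semantic Model step.
FACTOR_PATTERNS = {
    "nsa": r"\b(gunmen|militants|insurgents)\b",
    "uof": r"\b(attacked|killed|wounded|bombed)\b",
    "nopc": r"\bclaimed responsibility\b",
}

# Made-up P(factor present | class); not derived from GTD.
LIKELIHOODS = {
    "terrorism": {"nsa": 0.9, "uof": 0.9, "nopc": 0.7},
    "not_terrorism": {"nsa": 0.3, "uof": 0.6, "nopc": 0.1},
}
PRIORS = {"terrorism": 0.5, "not_terrorism": 0.5}


def extract_factors(text):
    """Map an event record to present/absent flags for each factor."""
    lowered = text.lower()
    return {name: bool(re.search(pattern, lowered))
            for name, pattern in FACTOR_PATTERNS.items()}


def classify(text):
    """Naive Bayes over the extracted flags; returns a posterior per label."""
    factors = extract_factors(text)
    scores = {}
    for label, probs in LIKELIHOODS.items():
        # Use P(factor | class) if present, 1 - P(factor | class) if absent.
        terms = [probs[f] if present else 1.0 - probs[f]
                 for f, present in factors.items()]
        scores[label] = PRIORS[label] * prod(terms)
    k = sum(scores.values())  # the normalizing constant
    return {label: score / k for label, score in scores.items()}


record = ("Gunmen attacked a military convoy in Bazai town. "
          "Three soldiers were killed and two others were wounded. "
          "Tehrik-i-Taliban Pakistan (TTP) claimed responsibility for the incident.")
posteriors = classify(record)
print(max(posteriors, key=posteriors.get), posteriors)
```

A real system would swap the regular expressions for a proper extraction model and learn the likelihoods from labeled event records.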

Fun stuff.


