However, today artificial intelligence is often overly complicated by characterizing it in terms of its underlying capabilities and technologies. Capabilities like machine learning, natural language processing, and robotic process automation are frequent points of discussion with consumers. When talking about AI, practitioners often describe it in terms of genetic algorithms, neural networks, and evolutionary programming. While these capabilities and technologies accurately reflect the inner complexity of what makes artificial intelligence naturally hard, one still needs to bring AI to life in a way that simplifies our daily lives.
We are in the midst of an intelligence revolution that, by its definition, is destined to change our lives. Like the farmer replaced by the factory worker, who was in turn replaced by the service worker, our lives will become more meaningful only when AI is as prolific as air. So, we need to bring AI to life by hiding the complexity that makes it hard, while transparently illuminating all the ways our lives become simpler because of it. Only then will we evolve to our next logical level of enlightenment.
Gunmen attacked a military convoy in Bazai town, Mohmand agency, Federally Administered Tribal Areas, Pakistan. Three soldiers were killed and two others were wounded in the attack. Tehrik-i-Taliban Pakistan (TTP) claimed responsibility for the incident.
Is this an incident of terrorism regardless of the fact that TTP targeted active combatants in an active military situation? Or should this be classified under insurgency? And with that said, is there a fundamental difference between terrorism and insurgency that requires a set of definitions to disentangle the two?
This appears to be terrorism. Starting with the definition of terrorism, “acts by nonstate actors involving the threatened or actual use of illegal force or violence to obtain a political (other) goal through intimidation,” we can observe:
I hate to do this last part, but a data scientist is, well, a data scientist. Using inferential (Bayesian) analysis, one can ask the question,
P(terrorism | nonstate actor, use of force, legality of force, nature of political change, degree of intimidation) = P(t | nsa, uof, lof, nopc, doi)
That is, the probability of terrorism (P(terrorism)) given ( | ) all the following observations (nonstate actor, etc.). This, in turn, leads to a bunch of Bayesian math, which results in P(t | nsa, uof, lof, nopc, doi) being equal to:
P(t | nsa, uof, lof, nopc, doi) = P(nsa, uof, lof, nopc, doi | t) x P(t) / P(nsa, uof, lof, nopc, doi)
Let’s ignore the denominator for a moment, since it does not depend on whether an event is terrorism or not. So now we need to focus on calculating P(nsa, uof, lof, nopc, doi | t) x P(t). This gets really complex very quickly,
P(nsa, uof, lof, nopc, doi, t) = P(nsa | uof, lof, nopc, doi, t) x P(uof | lof, nopc, doi, t) x P(lof | nopc, doi, t) x P(nopc | doi, t) x P(doi | t) x P(t)
So I am going to assume we do not care about the interactions between nsa, uof, lof, nopc, and doi. This is called a Naive Bayesian calculation. This assumption, that one cannot tell something about one factor given information about another, is a big assumption, yet to be proved (which we could do by looking at data in the GTD). However, it is a good place to start, in that it gives insights without overstressing the calculus. Applying Naive Bayes, we have a much simpler calculation:
P(nsa, uof, lof, nopc, doi, t) = P(nsa | t) x P(uof | t) x P(lof | t) x P(nopc | t) x P(doi | t) x P(t)
A quick aside – P(nsa | t), P(uof | t), P(lof | t), P(nopc | t), and P(doi | t) are the “likelihoods” of each factor being seen given t, while P(t) is the “prior” probability of t. The cool part of this is that they are all computable from data in the GTD.
Therefore,
P(t | nsa, uof, lof, nopc, doi) = P(nsa | t) x P(uof | t) x P(lof | t) x P(nopc | t) x P(doi | t) x P(t) / P(nsa, uof, lof, nopc, doi)
Since the denominator is independent of terrorism, let’s treat it as a constant (1/k). Thus, the final result:
P(t | nsa, uof, lof, nopc, doi) = P(nsa | t) x P(uof | t) x P(lof | t) x P(nopc | t) x P(doi | t) x P(t) / k
Normally, this is where we can create interesting marginal tables that count the number of nsa events that were terrorism and not, the number of lof events associated with terrorism and not, etc. Something like this, where a through j are replaced with actual numbers:
| Factor | Terrorism | Not Terrorism |
| --- | --- | --- |
| nonstate actor present | a | b |
| use of force present | c | d |
| legality of force | e | f |
| political change desired (nature of political change) | g | h |
| degree of intimidation | i | j |
One last point, probably the most interesting one (hence the title), is that we can automate this process using natural language processing (NLP), semantic modeling, and a higher-level version of the Naive Bayesian math above.
Event Records > [NLP] > Semantic Model > [Bayesian Calculator] ===> Terrorism or Not!
Fun stuff.
Anyway, part of the course requires forum participation, and one of those forums was on “Why Do We Study Terrorism – Surprise Myths.” Dr. LaFree covers nine particularly interesting myths often associated with terrorism, one being that most terrorist attacks use sophisticated weapons, when in fact they do not.
However, in doing so I made this observation:
Dr. LaFree stated that most terrorist attacks “rely on nonsophisticated, readily available” weapons. This sounds logical, but is misleading. Sophistication is a characteristic of availability, not a search attribute of the terrorist. The fact that most weapons used in terrorist attacks are not sophisticated does not mean terrorists do not want sophisticated weapons, which is implied in the original statement. In fact, most terrorist attacks use “readily available weapons,” which just so happen to be relatively nonsophisticated by their nature. For example, nuclear weapons are not readily available and thus not used by terrorists. However, if a terrorist group could procure a nuclear weapon, does anybody think they would not use it because it was “sophisticated?”
One of the other class participants then noted that:
Dr. Smith, my reading of Dr. LaFree’s statement did not make the assumption that terrorist groups do not actively try to procure sophisticated weapons. They do all the time. However, partly because those are much more difficult to acquire, terrorists are forced to rely on less-sophisticated weapons. His claim solely asserts that terrorist attacks occur predominantly with nonsophisticated, readily accessible weapons like automatic weapons, grenades, dynamite, etc.
My point is more from an application perspective. Yes, terrorists use nonsophisticated weapons; this is the “descriptive analysis” of the information. The problem is that these descriptive statements often end up in “prescriptive policies.” For example, the premise of a newly proposed policy would read something like, “because terrorists mostly use nonsophisticated weapons, we will only do (fill in the blank).”
Descriptive analyses are an important part of any pathway to identifying actionable insights. However, they are often used to support weak prescriptive arguments. We need to move beyond these kinds of initial observations by forcing strong prescriptive arguments, ones supported by causally-linked predictive analyses. Let’s use the nonsophisticated weapons example and the data science framework to address it in a bit more detail:
Descriptive Analysis (What is happening) – At this level of the analysis taxonomy, using historical data, we can make the statements:
Diagnostic Analysis (Why is it happening) – Using numerous diagnostic analysis techniques on our descriptive data set, one being differential diagnostics, we could make the statement:
Predictive Analysis (When will it happen) – Based on the diagnostic analysis and the application of predictive analytical tools, we could speculate the following:
Prescriptive Analysis (How do we change something) – How questions are the most important questions of all because, by their nature, they result in change. In this case, we could address one kind of how question: How do we limit the damage done by terrorism? That seems like an interesting and compelling question that most of us want to answer. So, using the framework, we would want to do what?
While I will leave that up to anyone reading this, this is the kind of actionable insight that results from applying a data-science-driven framework. While this was mostly a made-up example, there is ample research supporting the spirit of the statements, and with a bit of time and energy, one could fully qualify this logic path in more significant detail.
While many have tried, it is impractical to define the definitive list of R resources, given all the great blogs, texts, and videos available. Most attempts to create such a list are failures from the start. So, in many cases, one needs just to Google the phrase “R Resources” in order to find 80% of the good ones, while exerting less than 20% of your overall research effort.
For my list, here are the texts and PDFs that I keep near or with me most of the time:
General introductions to R
1. An introduction to R. Venables and Smith (2009) – PDF
2. A beginner’s guide to R (Use R!). Zuur et al. (2009) – Text
3. R for Dummies. Meys and de Vries (2012) – Text
4. The R book. Crawley (2012)
5. R in a nutshell: A desktop quick reference. Adler (2012)
Statistics books
1. Statistics for Dummies. Gotelli and Ellison (2012) – Text
2. Statistical methods. Snedecor and Cochran (2014) – Text
3. Introduction to Statistics: Fundamental Concepts and Procedures of Data Analysis. Reid (2013) – Text
Statistics books specifically using R
1. Introductory statistics: a conceptual approach using R. Ware et al. (2012) – Text
2. Foundations and applications of statistics: an introduction using R. Pruim (2011) – Text
3. Probability and statistics with R, 2nd Edition. Ugarte et al. (2008) – Text
Visualization using R
1. ggplot2: elegant graphics for data analysis. Wickham (2009)
2. R graphics cookbook. Chang (2013)
Programming using R
1. The art of R programming. Matloff (2011)
2. Mastering Data Analysis with R. Daroczi (2015)
Interesting predictive analytics books
1. The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t. Silver (2012)
2. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Siegel (2013)
How many new tires can be sold in the Philadelphia area just prior to its first snowstorm? How many people will die from the next pandemic that infects North America? What is the global revenue potential for a new medical app on the iPad Pro that helps first-time parents with their newborn child? These are relatively simple questions that data scientists are often asked to address.
As simple as they might seem, the real world is fraught with networks of complexity, and at the same time, data scientists are often accused of overthinking solutions as they try to make sense of it. Even the simplest of explorations, like determining the number of tires sold, can take on unbounded fidelity without proper problem scoping. In turn, this can result in both the exponential growth of data and uncertainty in our confidence in observing that data.
It is important for the analyst to grossly understand, to estimate, the solution without spending time and money on detailed analyses supported by countless models. One such type of estimation is called a Fermi problem, a framework designed to teach dimensional analysis that can be thought of as “back-of-the-envelope calculation.” Fermi problems are often used in engineering and the sciences to scope the larger problem before attempting to build complex models that address more precise answers.
Michael Mitchell does an excellent job at TED Ed talking about Fermi approaches when dealing with complex problems:
Interesting. Yes?
Moving on…while Fermi estimation has no formal calculus, with the help of Sherman Kent’s (CIA analyst) perspective on information, one can break the approach down into the following equation:
Fermi Estimation = things we know for certain (facts) + things we should know, but don’t (assumptions, which have ranges) + things we don’t know we don’t know (error term)
The first term is as close as one can come to a statement of indisputable fact. It describes something knowable and known with a high degree of certainty.
The second term is a judgment or estimate. It describes something which is knowable in terms of the human understanding but not precisely known by the man who is talking about it.
The third term is another judgment or estimate, this one made almost without any evidence direct or indirect. It may be an estimate of something that no man alive can know or will ever know. As such, it truly represents that ultimate error in our knowledge.
The Fermi estimation approach, as you can see, provides an answer before turning to more sophisticated modeling methods, as well as a useful check on their results. As long as the assumptions in the estimate are reasonable, Fermi estimation gives a quick and simple way to obtain a “frame of reference” for what might be a reasonable expectation of the final answer.
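The three-term equation above can be sketched in code. Below is a minimal Fermi-estimation example for the tire question; every number (metro population, cars per person, buying fraction, tires per purchase) is an illustrative assumption, not a researched fact, and the function name is my own.

```python
# Minimal Fermi-estimation sketch: multiply facts by assumption ranges
# to get an order-of-magnitude interval. All inputs are illustrative.

def fermi_estimate(facts, assumptions):
    """facts maps name -> known value; assumptions maps name -> (low, high).
    Returns a (low, high) interval. The third term of the equation
    (unknown unknowns) is acknowledged by the width of the interval,
    not by a point value."""
    low = high = 1.0
    for value in facts.values():          # things we know for certain
        low *= value
        high *= value
    for lo, hi in assumptions.values():   # things we should know, with ranges
        low *= lo
        high *= hi
    return low, high

# Tires sold in Philadelphia before the first snowstorm (all assumed):
facts = {"metro_population": 6_000_000}
assumptions = {
    "cars_per_person":              (0.4, 0.6),
    "fraction_buying_before_snow":  (0.01, 0.03),
    "tires_per_purchase":           (2, 4),
}
low, high = fermi_estimate(facts, assumptions)
# Yields roughly 48,000 to 432,000 tires: a frame of reference,
# not a forecast.
```

The value of the exercise is the interval itself: if a detailed model later predicts something far outside it, either the model or the assumptions deserve a second look.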
1. The system must know itself in terms of what resources it has access to, what its capabilities and limitations are and how and why it is connected to other systems.
2. The system must be able to automatically configure and reconfigure itself depending on the changing computing environment.
3. The system must be able to optimize its performance to ensure the most efficient computing process.
4. The system must be able to work around encountered problems by either repairing itself or routing functions away from the trouble.
5. The system must detect, identify and protect itself against various types of attacks to maintain overall system security and integrity.
6. The system must be able to adapt to its environment as it changes, interacting with neighboring systems and establishing communication protocols.
7. The system must rely on open standards and cannot exist in a proprietary environment.
8. The system must anticipate the demand on its resources while keeping transparent to users.
I agree with the premise; that is, a disproportionate amount of time is spent in data prepping (80%), but the author does not provide any insights into how to reduce it (the flip from 80% to 20%). Study after study has shown this to be the case, so it is worthless to argue the statistical point. But towards the end of the article, he states that, “Flipping the rule will mean more data-driven decisions.” Ok, I get it. But please explain how?
Well, the cheap “naive” way would be to just start spending more time with the analytics process itself. That is, once the prep process is complete, just spend 16x more effort on analytics (do the math: if prep takes 80 hours and analytics 20, analytics must grow to 320 hours, 16 times its current level, for prep to shrink to 20% of the total). This would give you the 20% prep and 80% analytics the author wants to achieve. Cheap trick, but that is statistics. But even that is not the issue. The real issue isn’t moving from 80% to 20%.
The real challenge is understanding exactly what “value” means in the data science process and finding a systematic way to achieve it. In the end, if I have to spend 80% of my time preparing and 20% analyzing in order to discover “how” to grow a business in a profitable way, who cares what the ratio is? Real value comes from focusing on the questions; from what (descriptive), to why (diagnostic), to when (predictive), and finally how (prescriptive). In doing so, a chain is created with each stage linking value (AKA a value chain). Ok, but how do you do this?
Addressing that question (my reveal) is beyond the scope of this article. I would suggest one start by looking at a few articles in the Data Scientist Insights blog. There are several articles that deal exactly with this point. After that, write me (@InsightDataSci) and we can talk.
Take, for example, the use of emergent phenomena, which are used to achieve highly complex, often cognitive, behavior. Emergence is a process through which large, complex patterns of behavior (cognitive by nature) can be achieved through the interaction of simpler processes (activities), none of which exhibits complex behavior on its own. Think of 10,000 ants swarming to regulate a hive to within 1 degree Celsius. Ants have no formal communication, no sight, no central command and control, just pheromones as a means of marking their trail. Their ability to regulate birthing hives is a type of emergent phenomenon that was not programmed; it just emerges.
By the way, think about how one would design, implement, and test an emergent system. These systems “behave” a lot like offspring in the developmental stages of life (infants, children, teenagers, young adults, etc.). We do not “design” them, and yet through repeated complex interactions with their environment (people, places, and things), they grow to achieve amazing capabilities.
As such, true cognitive computing is more emergent than designed. However, today’s solutions tend to simulate cognitive behavior (fake the behavior without understanding its structure) rather than emulate its capabilities (take on structural similarities that result in comparable characteristics). A very simplistic example is traditional genetic algorithms, originally developed by Holland. His approach uses a series of zeros and ones to encode information, to which evolutionary principles (selection, crossover, mutation, etc.) are applied. Over a series of evolutions, populations of these strings can exhibit complex behaviors.
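To make the Holland-style binary-string idea concrete, here is a toy sketch of selection, crossover, and mutation on 0/1 strings. The fitness function (count of ones, the classic “OneMax” toy problem) and all parameters are my own illustrative choices, not Holland’s original formulation.

```python
# Toy Holland-style genetic algorithm on binary strings:
# tournament selection, single-point crossover, bit-flip mutation.
# Fitness is simply the number of ones ("OneMax"); parameters illustrative.
import random

random.seed(42)
LENGTH, POP, GENERATIONS = 20, 30, 60

def fitness(bits):
    return sum(bits)  # count of ones

def select(pop):
    # Tournament selection: the fitter of two random individuals wins.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    point = random.randrange(1, LENGTH)   # single-point crossover
    return p1[:point] + p2[point:]

def mutate(bits, rate=0.01):
    return [1 - b if random.random() < rate else b for b in bits]

population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP)]
for _ in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]

best = max(population, key=fitness)
```

Even this tiny loop illustrates the point in the text: the behavior that appears (strings converging toward all ones) is nowhere written in the code; it emerges from repeated selection pressure on simple operations.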
This, however, is not the way nature works. Instead, genetic evolution is encoded in nucleotides (C, T, A, G), through which more complex expressions of value can be made. While a subtle difference, it is that difference between simulation and emulation which holds back true evolutionary-based cognitive progress.
Until such time as our computational approach changes, becoming more emulation (emergent) than simulation (designed), cognitive computing will mostly be the thing through which marketing manipulates the buying masses, and not the means through which our silicon-based helper nurtures its human masters.
Another lens through which we can look at the differences question is that of the Knowledge Model (used above). This model divides our understanding (or not) of the world around us into four groups: you know what you know, you know what you don’t know, you don’t know what you know, and you don’t know what you don’t know. Simple examples of the first two are: you know your age, but you don’t know mine. The third is a bit trickier in that it is about recall and recognition. A possible example is recalling an event from earlier in your life when you smell a particular scent or hear a specific song (ah, those were the days). There are an infinite number of examples in the last category, but one I use a lot is that you probably don’t know much about pebble-bed nuclear reactors, and you did not know you didn’t know it until you read those words. On with data sciences. By the way, we hardly ever spend time looking into things we don’t know we know (recall), since a lot of assessment-oriented events highlight them during discovery, which results in more knowing what you know.
The Knowledge Model is very useful when thinking through data analytics and data sciences. Data analytics is fundamentally about providing clarity around those things we know we know. For example: what is my product inventory throughout my global supply chain? Data sciences, on the other hand, explore those things we don’t know we don’t know, with the goal of producing actionable insights. An example is finding undiscovered ways of limiting product leakage throughout a global supply chain. In the middle, the connective layer, is where data analytics and data science often come together; for example, trying to better understand why there are different levels of inventory throughout our supply chain, or discovering events that will impact them.
While there are differences and commonalities between data analytics and data science, they are both equally important. Without analytics, we would not be able to operate our factories or even pay our employees. Data analytics powers the economic engine of society. On the other hand, without data science we would be stuck doing the same thing over and over, and our businesses would be incapable of real strategic growth. Data science is a catalyst that moves our society past stagnation. Both are very different, but both are interconnected. A perfect example of coopetition.