Six Types Of Analyses Every Data Scientist Should Know

Jeffrey Leek, Assistant Professor of Biostatistics at Johns Hopkins Bloomberg School of Public Health, has identified six (6) archetypical analyses. As presented, they range from least to most complex in terms of knowledge, cost, and time. In summary:

  • Descriptive
  • Exploratory
  • Inferential
  • Predictive
  • Causal
  • Mechanistic

1. Descriptive (least amount of effort):  The discipline of quantitatively describing the main features of a collection of data. In essence, it describes a set of data.

– Typically the first kind of data analysis performed on a data set

– Commonly applied to large volumes of data, such as census data

– Description and interpretation are separate steps

– Univariate and bivariate analyses are two types of descriptive statistical analysis.

Type of data set applied to: Census Data Set – a whole population

Example: Census Data
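
As a rough illustration of the univariate and bivariate descriptions mentioned above, here is a minimal Python sketch. The records are made-up values standing in for a tiny census, and the helper names (univariate_summary, pearson_r) are hypothetical, not taken from Leek's material.

```python
import statistics

# Hypothetical records: (age in years, household size) for a tiny "population".
records = [(34, 2), (51, 4), (29, 1), (42, 3), (67, 2), (38, 5)]
ages = [age for age, _ in records]
sizes = [size for _, size in records]


def univariate_summary(values):
    """Describe a single variable: count, center, spread, and range."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Population SD, because a census covers the whole population.
        "stdev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson_r(xs, ys):
    """Describe how two variables move together (a bivariate summary)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


age_summary = univariate_summary(ages)
age_size_r = pearson_r(ages, sizes)
print(age_summary)
print(f"age vs household size: r = {age_size_r:.2f}")
```

The code only describes the data. Deciding what a given mean or correlation means is the separate interpretation step.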

2. Exploratory: An approach to analyzing data sets to find previously unknown relationships.

– Exploratory models are good for discovering new connections

– They are also useful for defining future studies/questions

– Exploratory analyses are usually not the definitive answer to the question at hand, but only the start

– Exploratory analyses alone should not be used for generalizing and/or predicting

– Remember: correlation does not imply causation

Type of data set applied to: Census and Convenience Sample Data Set (typically non-uniform) – a random sample with many variables measured

Example: Microarray Data Analysis
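
One common exploratory move is to scan every pair of measured variables for strong relationships and treat the top hits as leads for a later study. The sketch below does that on synthetic data. The variable names, sample size, and the built-in link between v0 and v1 are all assumptions made for illustration.

```python
import random
from itertools import combinations

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 50 samples of 5 measured variables.
n_samples = 50
data = {f"v{i}": [random.gauss(0, 1) for _ in range(n_samples)] for i in range(5)}
# By construction, v1 tracks v0; the other variables are independent noise.
data["v1"] = [x + random.gauss(0, 0.3) for x in data["v0"]]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Rank every pair of variables by the strength of its correlation.
pairs = sorted(
    ((abs(pearson_r(data[a], data[b])), a, b) for a, b in combinations(data, 2)),
    reverse=True,
)
for strength, a, b in pairs[:3]:
    print(f"{a} vs {b}: |r| = {strength:.2f}")
```

With many variables, some pairs will look related purely by chance, which is one reason a ranking like this is a starting point and not a conclusion.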

3. Inferential: Aims to test theories about the nature of the world in general (or some part of it) based on samples of “subjects” taken from the world (or some part of it). That is, use a relatively small sample of data to say something about a bigger population.

– Inference is commonly the goal of statistical models

– Inference involves estimating both the quantity you care about and your uncertainty about your estimate

– Inference depends heavily on both the population and the sampling scheme

Type of data set applied to: Observational, Cross-Sectional Time Study, and Retrospective Data Set – the right, randomly sampled population

Example: Inferential Analysis
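
To make "estimating both the quantity you care about and your uncertainty" concrete, here is a minimal sketch that estimates a population mean from a random sample and attaches an approximate 95% confidence interval. The population is simulated (an assumption, so that the sketch is self-contained), and the interval uses a normal approximation, not any particular method from Leek's course.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population; in a real study it is never fully observed.
population = [random.gauss(170, 10) for _ in range(100_000)]

# A simple random sample: the sampling scheme that justifies the formula below.
sample = random.sample(population, 200)

estimate = statistics.mean(sample)                       # the quantity we care about
std_error = statistics.stdev(sample) / len(sample) ** 0.5  # uncertainty of the estimate
z = statistics.NormalDist().inv_cdf(0.975)               # about 1.96 for a 95% interval

ci_low = estimate - z * std_error
ci_high = estimate + z * std_error
print(f"estimated mean: {estimate:.1f}, 95% CI: ({ci_low:.1f}, {ci_high:.1f})")
```

If the sample were drawn by convenience instead of at random, the same arithmetic would still run, but the interval would no longer say anything reliable about the population.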

4. Predictive: The various types of methods that analyze current and historical facts to make predictions about future events. In essence, to use the data on some objects to predict values for another object.

– A model can predict an outcome without its independent variables causing that outcome

– Accurate prediction depends heavily on measuring the right variables

– Although there are better and worse prediction models, more data and a simple model often work really well

– Prediction is very hard, especially about the future

Type of data set applied to: Prediction Study Data Set – a training and test data set from the same population

Example: Predictive Analysis



Another Example of Predictive Analysis
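
The training and test split described above can be sketched in a few lines. This example fits a straight line by ordinary least squares on the training set and checks its error on held-out points. The data are simulated from an assumed relationship (y = 3x + 5 plus noise), and fit_line is a hypothetical helper written for this sketch.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical data drawn from one population: y = 3x + 5 plus noise.
xs = [random.uniform(0, 10) for _ in range(200)]
points = [(x, 3.0 * x + 5.0 + random.gauss(0, 1.0)) for x in xs]

# Training and test sets come from the same shuffled population.
random.shuffle(points)
train, test = points[:150], points[150:]


def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(train)

# Judge the model only on data it did not see during fitting.
squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in test]
test_rmse = (sum(squared_errors) / len(test)) ** 0.5
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test RMSE={test_rmse:.2f}")
```

A low test error says the model predicts well on this population. It says nothing about whether changing x would change y.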


5. Causal: To find out what happens to one variable when you change another.

– Implementation usually requires randomized studies

– There are approaches to inferring causation in non-randomized studies

– Causal models are said to be the “gold standard” for data analysis

Type of data set applied to: Randomized Trial Data Set – data from a randomized study

Example: Causal Analysis
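
Here is a minimal simulated sketch of why randomization helps. Units are assigned to treatment at random, so the treated and control groups have similar baselines on average, and the difference in group means estimates the effect of the treatment. The effect size, group sizes, and baseline distribution are all assumptions built into the simulation.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 2.0  # assumed effect, known only because this is a simulation
N_UNITS = 1000

# Each unit has its own baseline outcome before any treatment.
baselines = [random.gauss(50, 5) for _ in range(N_UNITS)]

# Random assignment: shuffle the unit ids and treat the first half.
unit_ids = list(range(N_UNITS))
random.shuffle(unit_ids)
treated_ids = set(unit_ids[: N_UNITS // 2])

treated_outcomes = [baselines[i] + TRUE_EFFECT for i in range(N_UNITS) if i in treated_ids]
control_outcomes = [baselines[i] for i in range(N_UNITS) if i not in treated_ids]

# Difference in means between the randomized groups.
estimated_effect = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(f"estimated effect: {estimated_effect:.2f} (simulated truth: {TRUE_EFFECT})")
```

If units chose their own treatment, baseline differences between the groups could be mistaken for the effect, which is why non-randomized studies need additional assumptions.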


6. Mechanistic (most amount of effort): Understand the exact changes in variables that lead to changes in other variables for individual objects.

– Incredibly hard to infer, except in simple situations

– Usually modeled by a deterministic set of equations (physical/engineering science)

– Generally the random component of the data is measurement error

– If the equations are known but the parameters are not, they may be inferred with data analysis

Type of data set applied to: Randomized Trial Data Set – data about all components of the system

Example: Mechanistic Analysis
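
As a sketch of the last two points, take a case where the equation is known but a parameter is not: an object in free fall covers a distance d = 0.5 * g * t^2. The simulated measurements below follow that equation exactly, apart from measurement error, and g is recovered by least squares. The choice of example, the noise level, and the time grid are assumptions made for illustration.

```python
import random

random.seed(4)  # fixed seed so the sketch is repeatable

G_TRUE = 9.81  # m/s^2, used only to simulate the measurements

# Drop times from 0.1 s to 2.0 s, with distances measured imperfectly.
times = [0.1 * k for k in range(1, 21)]
distances = [0.5 * G_TRUE * t ** 2 + random.gauss(0, 0.05) for t in times]

# With u = t^2 the model is d = (g / 2) * u, a straight line through the origin.
# The least-squares slope of such a line is sum(u * d) / sum(u * u).
u = [t ** 2 for t in times]
half_g = sum(ui * d for ui, d in zip(u, distances)) / sum(ui ** 2 for ui in u)
g_estimate = 2 * half_g
print(f"estimated g: {g_estimate:.2f} m/s^2")
```

The only randomness here is the measurement noise. The relationship between time and distance is fixed by the equation, which is what separates a mechanistic model from a purely statistical one.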



8 replies

  1. This is an excellent summary.

  2. Very resourceful! Thanks

  3. [citation needed]. Where does Jeffrey talk about this? Can you link to original? Thanks

  4. @Carl: he mentions it in his courses at Johns Hopkins.
    I first found it while taking JHU’s Data Science specialization on Coursera. The classification was mentioned in the first course (but I don’t remember the lecture’s exact name)

  5. I’m seeking to model analysis for educational purposes and I find your model too complicated. That said, I would like to talk with you about some simpler categories and verbiage, if you are open to that.

  6. Reblogged this on lifecrazyscience and commented:
    One needs to know this to deal with numbers and people when aiming to grow a business
