UPDATE: U.S. On The Brink: Near-Depression Levels Losses In Wealth Expected


U.S. employers’ labor costs sustained their five-year high into the third quarter of 2014. Economists believe this is being driven by a tightening labor market, which often pressures companies to raise wages and salaries. According to the Bureau of Labor Statistics, wages and salaries, which make up about 70% of compensation costs, rose 0.7% over the last two quarters.


In the original “U.S. On The Brink: Near-Depression Levels Losses In Wealth Expected” article, the expected median wealth loss was projected to be 18% to 27% over the next 2 to 5 years, respectively. This was driven by a decline in the Wealth to Income index and a lower-than-expected rise in median income. Given this sustained change in wages and salaries, the following revised losses in wealth are based on upwardly revised projections of median US income:

[Table: revised wealth loss projections]

The revised analysis now shows a median wealth loss of 15% to 23% over the next 2 to 5 years, respectively. This means that a family with the median net wealth of $182K (Federal Reserve, 2013) is likely to see it fall to $154K by 2016 and $140K by 2019.



U.S. On The Brink: Near-Depression Levels Losses In Wealth Expected

The U.S. is on the brink of witnessing some of the largest economic losses in net wealth since the Great Depression. The US Wealth to Income index (reported in the Credit Suisse Global Wealth Report 2014) has exceeded its mean 3rd quartile for only the fourth time in history (see below). While the significance of this most recent event cannot be overstated, the actual economic impact can be estimated with a bit of time series and probabilistic modeling.

[Figure: US Wealth to Income index history and mean 3rd quartile threshold]

In order to quantify the impact on US wealth, we need to forecast the future US Wealth to Income index, along with the expected median income for the same period of time. Let’s start by looking at a few of the more interesting characteristics of the Wealth to Income index. A stationarity analysis (Augmented Dickey-Fuller test) of the index data indicates that we cannot reject the null hypothesis that the series is non-stationary (Dickey-Fuller = -2.3486, lag order = 0, p-value = 0.4319), which means we can use Autoregressive Integrated Moving Average (ARIMA) time series modeling, with differencing, to forecast future values.

ARIMA models are the most general class of models for forecasting a time series that can be made “stationary” by differencing (if necessary), perhaps in conjunction with nonlinear transformations such as logging or deflating (if necessary). An ARIMA model is classified as an “ARIMA(p,d,q)” model, where:

  • p is the number of autoregressive terms, 
  • d is the number of nonseasonal differences, and 
  • q is the number of lagged forecast errors in the prediction equation.

Through experimental evaluation, the most appropriate model is ARIMA(1,1,2), which is forecasted 10 years out and added to the original data series to produce the graph below. Here we see the fitted mean, the forecasted mean, the upper and lower 95% confidence intervals, as well as the historical Wealth to Income data.

[Figure: ARIMA(1,1,2) forecast of the Wealth to Income index with 95% confidence bounds]

At first glance, one expects an equal likelihood of realizing either the forecasted upper or lower values. However, history can provide event-oriented insights that allow a more probabilistic approach to determining the most likely forecast. Given a threshold value of the Wealth to Income index, we can count the number of years it takes for the index to return to the pre-threshold level once it is exceeded. For example, if we set a Wealth to Income index threshold of 5.5, the mean number of years spent above this threshold is 4.6, with a standard deviation (sd) of 2.198 and standard error (se) of 0.98. In addition, the upper and lower 95% confidence levels are 6.52 and 2.68 years, respectively. Here is a complete table of years spent above a Wealth to Income threshold value:

[Table: years spent above Wealth to Income index threshold values]
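The threshold-duration counts can be computed with a simple run-length routine; the series below is illustrative, not the actual index data, and the summary statistics mirror the mean/sd/se columns of the table:

```python
# For a given threshold, measure the length of each run of consecutive years
# the series spends above it, then summarize with mean, sd, se, and a normal
# 95% interval (mean +/- 1.96 * se).
import math

def runs_above(series, threshold):
    """Lengths of consecutive runs where the series exceeds the threshold."""
    runs, current = [], 0
    for value in series:
        if value > threshold:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def summarize(runs):
    n = len(runs)
    mean = sum(runs) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in runs) / (n - 1))
    se = sd / math.sqrt(n)
    return mean, sd, se, (mean - 1.96 * se, mean + 1.96 * se)

index = [5.2, 5.6, 5.8, 5.4, 5.9, 6.1, 5.7, 5.3, 5.6, 5.8, 5.9, 5.2]
print(summarize(runs_above(index, 5.5)))
```

Running this over a grid of thresholds produces a table of the same shape as the one above.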

With this new threshold data, one can see that the Wealth to Income index stays above the 6.0 level for only 1.08 to 4.42 years. Given that this phase is 2 years into the cycle, it is more likely that the Wealth to Income index will decline in the next 2 years. Thus, we can reject the upper bound of the forecast model and accept the lower bound (forecasted lower 95%) for modeling purposes.

A similar analysis was used to forecast the median US income (see below). In this case, an ARIMA(2,1,0) model was experimentally found to best represent this time series. The median US income is projected to have low to moderate growth over the next ten years and does not show the significant volatility seen in the Wealth to Income index. Given some of the downward economic and regulatory pressures, the lower bound of the forecast (forecasted lower 95%) will be used in the analysis.

[Figure: ARIMA(2,1,0) forecast of median US income]

The last step in the analysis is to compute the cumulative percentage change (cumPercentWealthDiff) in wealth as a function of the forecasted Wealth to Income index and US median income. The table below shows the results of multiplying the respective values and differencing them over the periods in question.

[Table: cumulative percentage change in wealth (cumPercentWealthDiff)]

The analysis shows a median wealth loss of 18% to 27% over the next 2 to 5 years, respectively. This means that a family with the median net wealth of $182K (Federal Reserve, 2013) is likely to see it fall to $150K by 2016 and $133K by 2019. By comparison, during the 2007-2010 recession, the Federal Reserve reported that the median net worth of families plunged by 39 percent in just three years, from $126,400 in 2007 to $77,300 in 2010. This analysis appears to be consistent with the reality seen over the last few years.
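The dollar figures follow directly from applying the cumulative percentage losses to the 2013 median net wealth; a quick check of the arithmetic (the stated figures are rounded):

```python
# Apply the cumulative wealth losses to the 2013 median family net wealth of
# $182K (Federal Reserve) to recover the projected dollar figures.
median_wealth_2013 = 182_000

loss_2016 = 0.18   # 18% cumulative loss over 2 years
loss_2019 = 0.27   # 27% cumulative loss over 5 years

wealth_2016 = median_wealth_2013 * (1 - loss_2016)
wealth_2019 = median_wealth_2013 * (1 - loss_2019)
print(round(wealth_2016 / 1000), round(wealth_2019 / 1000))  # ~149K and ~133K
```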

The cause-and-effect relationship of this correlative model remains unclear. So, while some can probably find faults with this analysis (e.g., assuming the Wealth to Income index continues to increase, as during the Depression), the final story seems likely to remain the same: a dramatic loss in wealth for the United States over the next few years. The only real question that now remains is identifying and implementing the best investment strategy to undertake given that we are on this brink. I hear there are great specials going on at MattressesAreUs.com.



The Film Industry’s Golden Rule – Part 2


This is very early, but nevertheless interesting, and is based on the initial insights from the “Film Industry Executives Golden Rule – Total Gross is 3x Opening Box Office Receipts” post. As discussed, identifying outliers could be an important part of finding characteristics of those exceptional films in the industry. The plot below shows the number of outlying (exceptional) films whose opening revenue was more than 2.68 standard deviations above the mean (line with circles). In addition, the plot shows (line with triangles) the number of those outliers that also exceeded a 4x Total Gross/Opening Gross ratio (the industry average being 3.1).
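As a sketch of the two filters described above, using made-up film records rather than the actual Box Office Mojo data, first apply the 2.68-sd cut on opening gross and then the 4x ratio cut:

```python
# Two-stage outlier selection: films whose opening gross sits more than 2.68
# standard deviations above the mean, then the subset of those that also beat
# a 4x Total/Opening ratio. All records here are invented for illustration.
from statistics import mean, stdev

films = [{"title": f"film{i}", "opening": 5.0 + (i % 5) * 0.5, "total": 18.0}
         for i in range(20)]                                 # ordinary films
films += [
    {"title": "blockbusterA", "opening": 90.0, "total": 250.0},  # ~2.8x ratio
    {"title": "blockbusterB", "opening": 95.0, "total": 450.0},  # ~4.7x ratio
]

openings = [f["opening"] for f in films]
cutoff = mean(openings) + 2.68 * stdev(openings)

opening_outliers = [f for f in films if f["opening"] > cutoff]
exceptional = [f for f in opening_outliers if f["total"] / f["opening"] > 4]
```

The second list corresponds to the triangle line: the candidate study group of exceptional films.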



The second group (triangles) is the candidate study group for any future project – e.g., a good place to look for characteristic differences between exceptional and average films. There appear to be thirty years of data to explore here, helpful for creating, testing, and scoring regression and logistic regression models.

However, the more interesting trends are the exponential increase in outlier opening gross revenue films (line with circles) and the divergence between the two. While I don’t know what to make of it yet, there appears to be something going on.

In order to systematically address these data science questions, any future engagement lifecycle needs to run through an organic process in order to maximize the likelihood of success (coming up with actionable insights on budget and on time). The key will most likely be access to film industry data sets, specifically those used to build web sites like Box Office Mojo. It would be useful to get detailed accounting for each film, inclusive of budgetary items (e.g., marketing spend). In addition, the project needs to pull in other third-party data like regional/national economics (Bureau of Economic Analysis), weather (Weather Underground), social (Facebook, Twitter), demographic/psychographic models, etc. Here is the macro model for deriving insights from ones and zeros:




The analysis process itself is driven by data aggregation, preparation, and design of experiments (DOE). Having access to a few big data toolsmiths (data scientists who are Cloudera hackers) pays off at this phase. The data science team should set up a multi-node Hadoop environment at the start for all the data that will be pulled in over time (potentially terabytes within a year). They should also not waste effort trying to force-fit all the disparate data sources into some home-grown relational data schema. Accept the fact that uncertainty exists and build a scalable storage model accessible by R/SPSS/etc. from the start.

Once the data is in hand, the fun begins. While modeling is both a visual and a design process, it is all driven through an effective design of experiments. Knowing how to separate data into modeling, test, and scoring sets is a science, so there is no real need to second-guess what to do. Here is one such systematic and teachable process:
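As an illustration of that separation, here is a minimal sketch of a deterministic modeling/test/scoring split; the 60/20/20 fractions are arbitrary, not a prescription from the process diagram:

```python
# Shuffle once with a fixed seed, then partition so that no record lands in
# more than one set: modeling (fit), test (compare models), scoring (holdout).
import random

def split_data(records, train_frac=0.6, test_frac=0.2, seed=7):
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)      # deterministic shuffle
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train = shuffled[:n_train]                 # fit models here
    test = shuffled[n_train:n_train + n_test]  # compare candidate models
    score = shuffled[n_train + n_test:]        # final, untouched holdout
    return train, test, score

train, test, score = split_data(list(range(100)))
print(len(train), len(test), len(score))  # 60 20 20
```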




At the micro level (day to day), the team needs to build out an ecosystem to support data analytics and science. This includes tools (R, SPSS, Gephi, Mathematica, Matlab, SAS, HANA, etc.), big data (Cloudera – Hadoop, Flume, Hive, Mahout (important), HBase, etc.), visualization (Raphaël, D3, Polymaps, OpenLayers, Tableau, etc.), computing (local desktops/servers, AWS, etc.), and potentially third-party composite processing (Pneuron). Last, but not least, is an Insights Management Framework, a dashboard-driven application to manage an agile, client-centric workflow. This will manage the resolution process around all questions developed with the client (buy or build this application).

While the entertainment industry is a really exciting opportunity, this enterprise-level data science (EDS) framework generalizes to insights analyses across all industries. By investing in the methodology (macro/micro) and infrastructure up front (Hadoop, etc.), the valuation of data science teams will be driven through a more systematic monetization strategy built on insights analysis and reuse.

Film Industry Executives Golden Rule – Total Gross is 3x Opening Box Office Receipts


The film entertainment industry believes that the total gross theater earnings from a film can be determined by looking at the opening gross box office receipts. Industry executives use the rule of thumb that for every dollar earned on opening day, three dollars will be earned in total from box office receipts (i.e., Total Gross = 3 x Open Gross). This is why they invest in all that marketing prior to opening day.

I decided to take a look at this rule of thumb, so I created an R script that pulled the required data from Box Office Mojo (see below). I grabbed all 14K+ films from BOM, did a bit of data cleaning and formatting, then plotted the relationship between Opening Box Office Receipts and Total Gross Theater Earnings. As it turns out, the executives are right: the 2.5% to 97.5% confidence range for the golden ratio is 3.13 to 3.19, respectively. As a correlative predictive model, it is significant (R² = 0.8034).

[Figure: Opening Gross vs Total Gross regression]
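The slope can be reproduced with a simple no-intercept least-squares fit. Since the scraped Box Office Mojo data and the original R script are not reproduced here, this Python sketch uses synthetic data with a ratio built in near 3.16:

```python
# Estimate the "golden ratio" as the least-squares slope of Total Gross on
# Opening Gross (no intercept). The data below is synthetic, standing in for
# the 14K+ scraped Box Office Mojo records.
import numpy as np

rng = np.random.default_rng(1)
opening = rng.uniform(1, 100, size=2000)             # opening gross ($M)
total = 3.16 * opening + rng.normal(0, 10, 2000)     # total gross ($M), noisy 3.16x

# No-intercept least squares: slope = sum(x*y) / sum(x*x)
slope = np.sum(opening * total) / np.sum(opening * opening)
residuals = total - slope * opening
r_squared = 1 - residuals.var() / total.var()
print(f"golden ratio ~= {slope:.2f}, R^2 = {r_squared:.3f}")
```

On the real data, bootstrapping or the standard error of this slope yields the 3.13-3.19 confidence range quoted above.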

R-SCRIPT (based on Tony Breyal Quick Scrape script)

[Figure: R script]

Objective-Based Data Monetization: An Enterprise Approach to Data Science (EDS)


Across all industries, companies are looking to Data Science for ways to grow revenue, improve margins, and increase market share. In doing so, many are at a tipping point for where and how to realize these value improvement objectives.

Those that see limited opportunities to grow through their traditional application and services portfolios may already be well underway in this data science transformation phase. For those that don’t see the need to find real value in their data and information assets (Data Monetization), it may be a competitively unavoidable risk that jeopardizes the business’s viability and solvency.

Either way, increasing the valuation of a company or business line through the conversion of its data and information assets into actionable outcome-oriented business insights is the single most important capability that will drive business transformation over the next decade.


Data and information have become the single most important assets needed to fuel today’s transformational growth. Most organizations have seen the growth in revenue and margin plateau for organic products and services (those based on people, process, and technologies). The next generation of corporate value will come through the spelunking (exploration, evaluation, and visualization) of enterprise, information technology, and social data sources.

“Data is the energy source of business transformation and Data Science is the engine for its delivery.”

This valuation process, however, is not without its challenges. While all data is important, not all data is of value. Data science provides a systematic process to identify and test critical hypotheses associated with increased valuation through data.


Once validated, these hypotheses must be shown to actually create or foster value (Proof of Value – POV). These POVs extract optimal models from sampled data sets. Only those proven objective-oriented models that have supported growth hypotheses are extended into the enterprise (e.g., big data, data warehousing, business intelligence, etc.).


The POV phase of value generation translates growth objective-based goals into model systems, from which value can be optimally obtained.


This objective-based approach to data science differs from, but complements, traditional business intelligence programs. Data-science-driven activities are crucial for strategic transformations where one does not know what one doesn’t know. In essence, data science provides the revelations needed to identify the value venues necessary for true business transformations.


For those solutions that have clearly demonstrable value, the system models are scaled into the enterprise. Unfortunately, this is where most IT-driven processes start and often unsuccessfully finish. Enterprise data warehouses are created and big data farms are implemented, all before any sense of data value is identified and extracted (blue). Through these implementations, traditional descriptive statistics and BI reports are generated that mostly confirm things we already know, an expensive investment in knowledge confirmation. The objective-based data monetization approach, however, incorporates into the enterprise only those information technology capabilities needed to support the scalability of the optimized solutions.


While there are many Objective-Based Data Monetization case studies, a common use case can be found in the insurance and reinsurance field. In this case, a leading global insurance and reinsurance company is facing significant competitive pricing and margin (combined ratio) pressure. While it has extensive applications covering numerous markets, its business-line data was not being effectively used to identify optimal price points across its portfolio of products.

Using Objective-Based Data Monetization, key pricing objectives are identified, along with the critical causal levers that impact the pricing value chain. Portfolio data and information assets are inventoried and assessed for their causal and correlative characteristics. Exploratory visualization maps are created that lead to the design and development of predictive models. These models are aggregated into complex solution spaces that together represent a comprehensive, cohesive pricing ecosystem. Using simulated annealing, optimal pricing structures are identified and then implemented across the enterprise applications.
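As a sketch of that optimization step, here is simulated annealing applied to a toy pricing objective; the demand curve, cost, and bounds are invented for illustration, whereas a real engagement would plug in the aggregated predictive models described above:

```python
# Simulated annealing over a one-dimensional price: always accept improvements,
# and accept worse moves with a probability that shrinks as the temperature
# cools. The objective is a toy margin = (price - cost) * demand(price).
import math
import random

def profit(price, cost=40.0):
    demand = 1000.0 * math.exp(-price / 80.0)   # toy exponential demand curve
    return (price - cost) * demand

def anneal(lo=40.0, hi=200.0, steps=5000, seed=3):
    rng = random.Random(seed)
    price = rng.uniform(lo, hi)
    best = price
    temp = 100.0
    for _ in range(steps):
        candidate = min(hi, max(lo, price + rng.gauss(0, 5)))
        delta = profit(candidate) - profit(price)
        if delta > 0 or rng.random() < math.exp(delta / temp):
            price = candidate
        if profit(price) > profit(best):
            best = price
        temp *= 0.999                            # geometric cooling schedule
    return best

best_price = anneal()
```

For this toy objective the analytic optimum is $120 (where marginal revenue balances the demand decay), which the search converges toward.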

Data science is a proven means through which value can be created from the existing assets of today’s organizations. By focusing on a hypothesis-driven methodology grounded in business-objective outcomes, value identification and extraction can be maximized in order to prioritize the investments needed to realize them in the enterprise.

Refactoring Insurance/Reinsurance Catastrophe Modeling using Big Data

The Catastrophe Modeling ecosystem, used in insurance and reinsurance, is a good example of the type of traditional computational platform that is under assault from the exponential changes seen in data. Not only are commercially available simulation and modeling tools incapable of closing the forecasting capability gaps in the near future, but most organizations are not addressing the needed changes in the human factor (data scientists and functional behavioral analysts). The net result for those insurance/reinsurance companies that rely on these old-school techniques is 1) reduced accuracy in understanding the physical effects of catastrophic events, 2) reduced precision in quantifying the direct and indirect costs of a catastrophe, and 3) increased blind spots for new and emergent catastrophic events, coming from combinations and permutations of existing events, as well as the creation of new ones.


The quadrafication of big data (infrastructure, tools, exploratory methods, and people) is having a positive impact on these kinds of ecosystems. I believe we can use the big data reference architecture as the basis for refactoring the traditional catastrophic simulation, modeling, and financial analysis activities. Using platforms like Pneuron, we can help these companies more effectively map computationally complex MDMI (multi-data, multi-instruction) workstreams into disaggregated process maps functioning in a MapReduce format, potentially reusing some of the existing simulation models. They could get the benefit of their a priori knowledge (models, tools) while dealing with the growth in data sets. Just a few thoughts.

One last note: this is an exercise in science, not engineering, or even systems integration. The practices that make for excellent enterprise architectures, requirements development, or even software engineering are of very little use here (beyond critical thinking). To solve this problem, one must be willing to fail, fail early, and fail often. It is only through these failures that the true realization of Big Data Cat Modeling capabilities will be found.

A Few Interesting Ways Big Data Is Being Used

Here are five interesting uses of big data that are happening every day:

1. Google now studies the timing and location of search-engine queries to predict flu outbreaks and unemployment trends before official government statistics come out. Interesting.

2. Credit card companies routinely pore over vast quantities of census, financial and personal information to try to detect fraud and identify consumer purchasing trends. While not new, the big data approach is improving accuracy and precision, as well as speeding up prediction times.

3. Medical researchers sift through the health records of thousands of people to try to identify useful correlations between medical treatments and health outcomes. I wonder if the healthcare insurance industry is taking advantage of this?

4. Companies running social-networking websites conduct “data mining” studies on huge stores of personal information in attempts to identify subtle consumer preferences and craft better marketing strategies. This is a subset of the Target case study.

5. A new class of “geo-location” data is emerging that lets companies analyze mobile device data to make intriguing inferences about people’s lives and the economy. It turns out, for example, that the length of time that consumers are willing to travel to shopping malls—data gathered from tracking the location of people’s cell phones—is an excellent proxy for measuring consumer demand in the economy.

These applications do beg a question about privacy: “When does now-casting – searching through massive amounts of data to predict individual behavior – violate personal privacy?”