Seven Laws of Data Science 01

I was re-reading the paper “Measuring The Value Of Information: An Asset Valuation Approach,” by Moody and Walsh (European Conference on Information Systems, 1999) when I realized just how powerful their approach to data valuation is. This is seminal research in the field of information theory and should be required reading for data scientists. Moody and Walsh recognized early on that information is the most valuable asset an organization has and that it is important to quantify this value through a formal methodology. While the paper stops short of defining a practical approach, the overall framework can be used as a basis for implementing a repeatable enterprise data valuation methodology.

The reason for this blog post, however, is my desire to recast Moody and Walsh’s Seven Laws of Information. While they do not explicitly define information or how it differs from data, we can use the DIKW Pyramid to recast a few of the laws toward the field of data science. That is, the world is full of data, information is the relevant data, studying information gives knowledge, and reflecting on knowledge leads to wisdom. So, if we deconstruct the information laws and rethink their data equivalents, we might arrive at these Seven Laws of Data Science:

Law One: Data has value only if it is studied. Intrinsically, data does not generate residual value through its mere presence. Revelations can only be found in the exploration and study of data.

Law Two: The value of data increases with its use. As data is explored, combined with other data, and explored again, additional value is generated.

Law Three: Data cannot be depleted through its use. Data is not a physical commodity that is subject to the physical laws of entropy and to degradation. As such, data is infinitely reusable, and through the exploratory process it will produce more data than was originally evaluated.

Law Four: Causal data is more valuable than correlative data. While correlative principles are very useful in some operational circumstances, to forecast the future one needs to truly understand causality within the system. Or, as someone more important than me has stated, “Felix, qui potuit rerum cognoscere causas.” Translated, “Fortunate is he who was able to know the causes of things.”
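To make the distinction concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration, not something from Moody and Walsh: a hidden driver z influences both x and y, so the two are correlated in observed data even though x has no effect on y. When x is set independently of z (an intervention), the correlation largely disappears, which is why a forecast built on the correlation alone would fail once someone starts changing x.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 10_000

# Hypothetical system: z is a hidden common cause of both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x_observed = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]  # y depends on z only, never on x

# Intervention: x is now set independently of z, and y is unaffected.
x_intervened = [rng.gauss(0, 1) for _ in range(n)]

corr_observed = pearson(x_observed, y)      # clearly positive
corr_intervened = pearson(x_intervened, y)  # close to zero

print(f"observed correlation:   {corr_observed:.3f}")
print(f"after intervening on x: {corr_intervened:.3f}")
```

In this toy setup the observed correlation would be useful for describing the past, but only knowledge of z tells you what happens when x is changed.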

Law Five: The value of combined independent data is greater than the combined value of each data set alone. This is equivalent to saying that the whole is usually greater than the sum of its parts. That is, one plus one is greater than two.
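One way to see this law at work is a small sketch that uses mutual information as a stand-in for "value". That proxy is my assumption here, and the data is a deliberately extreme, hypothetical case: two independent fair bits and a target equal to their XOR. Each bit alone tells you nothing about the target, yet the two together determine it completely.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """Mutual information I(A;B) in bits, from a list of equally likely (a, b) outcomes."""
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in joint.items()
    )

# Hypothetical toy data: every combination of two independent bits,
# with a target y that is the XOR of the two.
rows = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

mi_x1 = mutual_information([(x1, y) for x1, _, y in rows])
mi_x2 = mutual_information([(x2, y) for _, x2, y in rows])
mi_both = mutual_information([((x1, x2), y) for x1, x2, y in rows])

print(f"I(x1; y)     = {mi_x1:.1f} bits")
print(f"I(x2; y)     = {mi_x2:.1f} bits")
print(f"I(x1, x2; y) = {mi_both:.1f} bits")
```

Here the combined value (one bit) exceeds the sum of the individual values (zero plus zero). Real data is rarely this clean, so treat the example as the limiting case of the law rather than a typical one.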

Law Six: The value of data is perishable, while the data itself is not. The insights derived from the study of data have a limited time horizon over which they remain valuable.

Law Seven: More data does not necessarily lead to more value. Studies have shown that more data does not necessarily increase the accuracy of our predictions, just our confidence in those predictions.

So this is the first cut of the Laws of Data Science. What is missing, needs to be rethought, or should even be deleted? Let me know.



    • Yes, I have thought about a way to check data quality using data science. I call it Data IQ… it uses mutual information theory to create a single measure of information value. More later…
