Predicting the Unobservable: Thoughts on Predicted vs. Observed Data

Dave Kelly Blog

For centuries, philosophers have debated the nature of reality. Empiricists like David Hume argued that all we know must be rooted in observable facts. Rationalists like René Descartes, meanwhile, thought that an inkling of innate knowledge fueled our ability to understand the world around us.

I land somewhere in the middle of this debate. My firm specializes in using analytics to predict outcomes such as ‘household spending on discretionary items’. We often enter the fray when customers ask what role predicted data can play in the online world, given how much observed data is already available there.

Few would doubt that observed (aka ‘known’) data is often preferable to predicted data, but predicted data still has a powerful role to play. In other words, the rationalist philosophers were onto something.

Predicting the Unobservable

The role of predicted data becomes especially evident when we try to understand the ‘unobservable’. For example, no one can observe each US household’s budget, or capacity, to pay for cruises over the next 12 months. You can observe a propensity to go on certain cruises, but that is not the same thing as understanding total capacity to spend. In short, the outcome we care about cannot be observed directly in the data at hand. We need to extrapolate from other criteria to accurately predict which couples are ready to make a big-ticket purchase like this.


One key challenge with observed data is scale and reach. There is little doubt that known subscribers to a Porsche enthusiast website are great prospects for a Porsche dealer, but in reality they form an extremely small group. In the predicted world, we would clone this small group across a larger universe, resulting in perhaps millions of ‘clones’ of Porsche buyers.

Network influencers are another example. Data publishers frequently want to sell audiences of known, active users of Facebook, Twitter and other social sites, but these individuals represent only a subset of the total universe. A larger market exists, yet marketers have no observed data they can use to find it unless a similar cloning exercise is used.
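The ‘cloning’ in the Porsche example is what marketers usually call lookalike modeling. Below is a minimal sketch of the idea under loudly stated assumptions, not our production method: each household is described by a small, hypothetical numeric feature vector, the seed group (say, the enthusiast site’s subscribers) is averaged into one profile, and the wider universe is ranked by cosine similarity to that profile. Production lookalike models often use trained classifiers instead of a single centroid; the function names and data here are illustrative only.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical feature vector per household, e.g. (income index, auto-interest score, age band).
Features = Sequence[float]


def centroid(vectors: List[Features]) -> List[float]:
    """Average the seed group's feature vectors into a single profile."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Features, b: Features) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_lookalikes(seed: List[Features], universe: Dict[str, Features],
                    top_n: int) -> List[str]:
    """Rank every household in the universe by similarity to the seed profile; keep top_n IDs."""
    profile = centroid(seed)
    ranked = sorted(universe, key=lambda hid: cosine(universe[hid], profile), reverse=True)
    return ranked[:top_n]


# Hypothetical data: two known enthusiasts and three unlabeled households.
seed = [[0.9, 0.8, 0.5], [0.8, 0.9, 0.6]]
universe = {"hh1": [0.85, 0.85, 0.55], "hh2": [0.1, 0.2, 0.9], "hh3": [0.7, 0.6, 0.5]}
clones = find_lookalikes(seed, universe, top_n=2)  # ["hh1", "hh3"]
```

The seed here has two members and the result two clones, but the same ranking runs unchanged over millions of households, which is exactly where the scale advantage of predicted data comes from.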


Observed data also often comes with a hefty price tag. Niche observed data is often 2 to 10 times as expensive as comparable predicted data, which means it must perform roughly 2 to 10 times better just to match predicted data on cost per response. Predicted data is simply less expensive. Sorry Hume, but Descartes is the better bargain shopper.
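The break-even arithmetic behind that price gap can be sketched as follows. The sketch assumes, hypothetically, that both kinds of data are priced per thousand records (CPM) and that a response is worth the same no matter how the household was found; the prices and response rates are illustrative, not market figures.

```python
def cost_per_response(cpm: float, response_rate: float) -> float:
    """Cost of one response, given a price per 1,000 records (CPM) and a response rate."""
    return cpm / (1000 * response_rate)


def break_even_rate(cpm_observed: float, cpm_predicted: float,
                    rate_predicted: float) -> float:
    """Response rate observed data must reach to match predicted data's cost per response."""
    # Equal cost per response means cpm_observed / rate_observed == cpm_predicted / rate_predicted,
    # so the required lift in response rate equals the price ratio.
    return rate_predicted * cpm_observed / cpm_predicted


# Hypothetical prices: predicted data at $10 CPM, niche observed data at $50 CPM (5x).
needed = break_even_rate(cpm_observed=50.0, cpm_predicted=10.0, rate_predicted=0.005)
print(needed)                            # 0.025, i.e. 2.5% vs. 0.5%
print(cost_per_response(10.0, 0.005))    # 2.0
print(cost_per_response(50.0, needed))   # 2.0
```

At a 5x price premium, the observed segment needs a 2.5% response rate just to tie a predicted segment responding at 0.5%; both then cost $2.00 per response, and anything less leaves the cheaper predicted data ahead.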

Clearly, there is a role for observed data. The first step to understanding our world is to observe it. However, when the light dims and we need to figure out what lies in the shadows, we need different strategies. We recommend using observed data where it applies and supplementing it with predicted data. Because many digital campaigns depend on millions of impressions to drive the expected results, a hybrid approach provides the best mix of power and scalability.