If you’ve been reading this blog for any amount of time or hanging around me, Tom Webster, Jay Baer, and the many other numbers-focused folks in marketing, you’ve undoubtedly heard the expression, “correlation does not equal causation”. This is an axiom of basic statistical analysis, and if for some reason this is the first time you’re hearing it, please go read this.
One of the assumptions many folks (myself included) make at some point in learning statistics is that while correlation does not equal causation, causation cannot exist without correlation. It turns out that this isn’t true. Causation can exist without correlation!
How is this possible, when a correlation would seem to be mandatory for a causal relationship to be present? It’s deceptively simple, and it boils down to how you select your data. Let’s take a fictitious example: say I worked for an alcohol company, and I wanted to prove that alcohol does not cause motor vehicle fatalities. For clarity’s sake, neither is true – I don’t work for an alcohol company, and driving while intoxicated is blatantly unsafe. Don’t do it.
If I were to run a correlation on a random, representative sample of people, some of whom drank alcohol and operated a vehicle unsafely and some of whom did not, I would indeed find a strong positive relationship between alcohol consumption and vehicular fatalities. That would seem to indicate that correlation is mandatory in order for there to be causation.
But suppose I restricted my “study” to people who were, in my inexpert opinion, most likely to drive drunk. Suppose I focused only on people who had 10 or more drinks per day. What I might find is a negative correlation: the more you drink, the less likely you are to die from drunk driving, and therefore driving while drunk must be safe. What’s really happening among that population of super-heavy drinkers? They’re likely dying of causes other than drunk driving. At 10+ drinks a day, that’s not too hard to imagine.
The reality is that by selecting a population with severely restricted variation – no one in the study did NOT drink, and everyone drank heavily – you can create distortions in your data that can “prove” your point, even though they’re statistically invalid. We know, beyond a shadow of a doubt, that drinking alcohol does cause an increase in vehicular deaths, but the data can be manipulated to “prove” otherwise.
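You can see this selection effect in a quick simulation. To be clear, everything below is invented for illustration: the risk numbers, the 12-drinks-a-day threshold where other causes of death take over, and the population shape are all assumptions, not real epidemiology. The point is only that when a causal link is built into the model, restricting the sample to heavy drinkers can still flip the sign of the correlation:

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def dd_death_risk(drinks):
    """Hypothetical risk model (made-up numbers): the chance of a
    drunk-driving death rises with daily drinks, but past ~12 drinks a day,
    competing causes of death increasingly claim people first."""
    crash_risk = 0.02 * drinks  # rises steadily with drinking
    survives_other_causes = math.exp(-0.5 * max(0.0, drinks - 12.0))
    return crash_risk * survives_other_causes

# Simulate a skewed population: most people drink little, a few drink heavily.
random.seed(0)
drinks = [min(random.expovariate(1 / 3.0), 20.0) for _ in range(5000)]
risk = [dd_death_risk(d) for d in drinks]

# Full, representative sample: the correlation is strongly positive.
full = pearson(drinks, risk)

# Restrict the "study" to 10+ drinks per day: the sign flips negative,
# even though the causal link is baked into the risk model.
heavy = [(d, r) for d, r in zip(drinks, risk) if d >= 10]
restricted = pearson([d for d, _ in heavy], [r for _, r in heavy])

print(f"full sample:       r = {full:+.2f}")
print(f"10+ drinks only:   r = {restricted:+.2f}")
```

In the full sample the correlation comes out positive; among the heavy-drinking subgroup it comes out negative, because in this toy model the competing causes of death dominate at the top end. That is exactly the distortion described above, created purely by how the sample was selected.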
While the above is an extreme example, there are plenty of times marketers make this mistake. Any time you do a survey or study of your customers, you are automatically reducing variation. You’re not surveying people who are NOT your customers. While surveying only your customers makes a great deal of sense if you want to understand how customers feel about your products or services, surveying only your customers to get a sense of the industry can create the same distortions as the alcohol and drunk driving example above. You’re only “proving” that your data has insufficient variation, and that there may be a very obvious causal relationship that you’re missing entirely.
Keep this example in mind as you read through surveys, infographics, etc. in the coming months. There will be a great deal of “marketers believe in 2015” or “marketers found in 2014” headlines – but check to see how the survey was taken. If it’s a survey of customers or someone’s email list, question the daylights out of it before you go believing it and making any changes to your business.
Want to read more like this from Christopher Penn? Get daily updates now: