Ever heard the Latin expression Post Hoc, Ergo Propter Hoc, meaning ‘After this, therefore because of this’? It lies behind the saying ‘Correlation Is Not Causation’, and in statistics it is known as the Post Hoc Fallacy, a very familiar trap that we all fall into from time to time. The trap is this: when things are observed to happen in sequence, we infer that the thing that happened first must have caused the thing that happened next.
It’s not our fault – as human beings we are hard-wired from birth to look for patterns and explain why they happen. This problem doesn’t go away when we grow up, though; it becomes worse the more intelligent we think we are. We convince ourselves that now we are older, wiser and smarter, our conclusions are closer to the mark than when we were younger (the faster the wind blows the faster the windmill blades turn, not the other way around).
Even really smart people see a pattern and insist on putting an explanation to it, even when they don’t have enough information to reach such a conclusion. They can’t help it.
This is the thing about being human. We seek explanation for the events that happen around us. If something defies logic, we try to find a reason why it might make sense. If something doesn’t add up, we make it up.
Correlation Is Not Causation - aka The Post Hoc Fallacy
The Post Hoc Fallacy is what causes a football manager to only wear purple socks on match days. He once wore them at a match and his team won. Obviously, it was the socks that did it. Now he fears that if he doesn’t wear them to a match the team might lose. Damn those stinky purple socks (he also daren’t wash them for fear of the magic pixie dust washing out).
Post Hoc is also what made rain men indispensable to the tribe – the tribe believed that their rain man could make it rain. Spotting the clouds brewing in the distance, the rain man dances until it pours down. It doesn’t usually take more than three or four days of dancing until the inevitable happens. “Rain man dance, water fall from sky”. It’s just a good job for the rain man that the Indians couldn’t speak Latin, otherwise he’d have been in real trouble…
Correlation Is Not Causation and the Post Hoc Pirates!
For a humorous view of the Post Hoc Fallacy, let’s take a look at Pastafarianism. It’s all the rage these days. Not heard of it? It’s one of the newest and fastest growing religions on the block. Pastafarian Sparrowism, to give it its full title, is a ‘vibrant religion that seeks to bring the Flying Spaghetti Monster’s fleeting affection to all of us, through the life of His Prophet, Captain Jack Sparrow’. Seriously, they’re not joking. Well, actually, they are. They promote a light-hearted view of religion and oppose the teaching of intelligent design and creationism in public schools. They also maintain that pirates are the original Pastafarians.
In an effort to illustrate that correlation is not causation, the founder, Bobby Henderson, presented the argument that global warming is a direct effect of the shrinking number of pirates since the 1800s, and accompanied it with this graph:
Wow, look at that straight line, I hear you all say – there’s clearly a correlation between the decline in the numbers of pirates and the rise in global temperatures, so there just must be a causal connection here, mustn’t there? Yup, you’ve all fallen for the Post Hoc Fallacy (I just knew you would).
Just because there is a straight line on the graph it doesn’t necessarily follow that one thing caused the other, particularly when you’ve grabbed two seemingly unconnected variables at random and stuck them together to see whether there might be some sort of tenuous correlation between them.
In the case of pirates and global warming, take a closer look at the labels on the x-axis. Notice something strange? Apart from the fact that the proportions of neighbouring data points are all out of whack, there is also the issue that a couple of them have been humorously disordered to deliberately deceive.
I don’t know about you, but I’m a believer! As soon as I’ve finished writing this book I’m giving up stats for a life as a pirate on the open seas. I’ll stop global warming if it’s the last thing I do.
It probably will be…
This blog post is an accompaniment to the FREE eBook Truth, Lies & Statistics (and there are more resources below).
The Organic Autism Correlation Conundrum
If you look online there are all sorts of humorous graphs that illustrate the Post Hoc Fallacy. Over the past 20 years or so, there’s been a huge increase in the anti-vaccine movement, particularly in the US, and all sorts of spurious correlations have been ‘discovered’ that ‘prove’ there is a causal link between vaccination programmes and autism. At the same time, to debunk the most crackpot of the theories, other – equally ridiculous – correlations have popped up too.
One published example showed the correlation between sales of organic food in the US and diagnoses of autism:
There is a very close correlation between the pair of plot lines, accompanied by a very large r-value (close to 1) and a very small p-value (close to 0). The suggestion is that – if we trust that correlation does imply causation – organic food correlates more closely with autism than any other proposed explanation, so it must be the cause. Except that correlation does not necessarily imply causation, and organic food does not cause autism. That would be ridiculous. And that is the whole point of these graphs.
All you need to do is find any pair of variables that increase over the same time period, plot them on a graph with the same x-axis and different y-axes, adjust the y-axis scales until the plot lines coalesce, and – BOOM – correlation! If, by some magic of coincidence and fate, there is a statistical correlation, then publish the p-value that goes along with it as additional proof. What this does is show that the correlation exists, but it does not prove that one thing causes the other. It might, but then again it might not…
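To see how easily this happens, here is a minimal Python sketch, under stated assumptions. The two series are simulated, not real data: `series_a` and `series_b` are made-up stand-ins for any two quantities that both happen to rise over the same 20 years, and the helpers `pearson_r`, `make_trending_series` and `yearly_changes` are written just for this illustration. Neither series has any influence on the other, yet the r-value should still come out close to 1, simply because both share an upward trend.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_trending_series(rng, start, step, noise_sd, n_points):
    """Simulated series: a straight upward trend plus random noise."""
    return [start + step * t + rng.gauss(0, noise_sd) for t in range(n_points)]


def yearly_changes(series):
    """Differences between consecutive points, which removes a linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Assumption: 20 yearly points, with arbitrary starting values, slopes and noise.
rng = random.Random(42)
series_a = make_trending_series(rng, start=100.0, step=10.0, noise_sd=5.0, n_points=20)
series_b = make_trending_series(rng, start=2.0, step=0.5, noise_sd=0.3, n_points=20)

# Correlation of the raw levels: dominated by the shared upward trend.
r_levels = pearson_r(series_a, series_b)

# Correlation of the year-to-year changes: the shared trend is gone.
r_changes = pearson_r(yearly_changes(series_a), yearly_changes(series_b))

print(f"r for raw levels:     {r_levels:.3f}")
print(f"r for yearly changes: {r_changes:.3f}")
```

As a rough sanity check, the sketch also correlates the year-to-year changes rather than the raw levels. With the shared trend taken out, the correlation between two unrelated series usually drops away, although with only 20 points it will not be exactly zero.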
The Lemon Fatality Correlation Convergence
I also quite enjoyed the correlation that proved that Mexican lemons are a major cause of deaths on US roads. Wait, what? I must have missed the news that day – Mexican lemons are killing Americans? You bet!
Take a look at a plot of the number of fresh lemons imported into the USA from Mexico versus the total fatality rate on US highways between 1996 and 2000:
My, my, just look at the R² value – it really must be true. Although the graph seems to be telling us that the more Mexican lemons there are in the US the fewer road deaths there are, the inescapable conclusion is that MEXICAN LEMONS KILL AMERICANS! What should we do about it? Should we import more Mexican lemons (the correlation tells us that this is what we should do)? Or should we ban Mexican lemons altogether? After all, if there are no Mexican lemons on the streets then they can’t kill any more Americans.
What utter tosh! I don’t care if there is a correlation, there is nothing to suggest that lemons cause accidents. If there were, don’t you think that lemons would be causing accidents on Mexican roads before the trucks crossed into the US? What about Sicilian lemons? Do they cause road deaths in Italy and across Europe?
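One detail is worth spelling out: for a straight-line fit with a single predictor, R² is just the square of the correlation coefficient r, so an impressive R² tells you nothing about the direction of the relationship, let alone its cause. The sketch below uses invented numbers (they are not the figures from the lemon graph), and the names `lemon_imports`, `fatality_rate` and `pearson_r` are hypothetical, chosen only for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for five consecutive years, NOT real import or road-safety data.
lemon_imports = [100, 200, 300, 400, 500]      # rises every year
fatality_rate = [15.9, 15.7, 15.5, 15.2, 15.0]  # falls every year

r = pearson_r(lemon_imports, fatality_rate)

# For simple linear regression with one predictor, R squared equals r squared.
r_squared = r ** 2

print(f"r   = {r:.3f}")          # negative: one series rises while the other falls
print(f"R^2 = {r_squared:.3f}")  # close to 1, and the sign has disappeared
```

Any two series that move steadily in opposite directions over the same few years would give much the same picture, which is exactly why the R² on its own proves so little.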
Oh, the power of correlations. As long as your audience doesn’t understand that correlation is not causation you can make them believe pretty much anything.
Let's get this sorted now. Repeat after me:
- Correlation is not causation
- Correlation is not causation
- Correlation Is Not Causation !!!
Resources on Lying With Data
Below you'll find the best resources on learning about lying with data that we've found on the web, and we update it frequently with new books, video courses, software and whatever else we can find (and create ourselves), so feel free to bookmark us, share us on the web and call in regularly to top up your data lying ninja skills.
Video Courses in The Hive
The Hive is our online learning portal where you can find courses on data analysis, statistics and machine learning.
Unlike other online course platforms, we actively encourage networking and collaboration, and The Hive is a place where you can chat and make friends in our exclusive members-only built-in social media community. You can also contact me and tell me what courses you need - I am in the habit of taking requests and creating personalised video courses!
In The Hive there are plenty of FREE courses (get a free plan to access these), and you can access the most in-depth courses for a small monthly or annual subscription (or purchase courses individually). Better still, each of the premium courses has a shortened free version so you can try-before-you-buy.
If you want access to exclusive, personalised content, then The Hive is the place to be!
Udemy Video Courses
Udemy is a great place to learn new stuff, not just about data, stats and AI, but about making model trains, how to apply make-up and, oh, just about anything else you can think of.
Courses (when they're on sale, which is very often) are typically priced at about 10-15 £/$/Euro. The upside is that the courses are very cheap, and usually very good. The downside is that courses aren't part of any formal programme, so you won't get any kind of certification.
If you want to fill gaps in your education or even learn whole topics, then Udemy is a great place to go.
*NOTE - the prices listed below are the full price, and are not automatically updated when a sale is on. If you want to find out the sale price, just click through!