How many times have you heard that ‘correlation is not causation’? Many times, I’m sure. So you know not to say things like ‘wow, the correlation between A and B has a p-value of 0.000001 so A must be causing B…’. I wish I had a pound for every time a very experienced, intelligent scientist with PhDs and titles and stuff has said this.
“Yeah, but just because you’ve got strong evidence of a correlation…” I say, “…it doesn’t mean that A causes B”.
“But look at that small p-value”, they say, “surely when the p-value is so small, then A must be causing B…”.
Well, no. Not at all.
You see, when it comes to storytelling, we have a problem.
It’s not our fault though – as human beings we are hard-wired from birth to look for patterns and explain why they happen. This problem doesn’t go away when we grow up, though; it becomes worse the more intelligent we think we are. We convince ourselves that now we are older, wiser and smarter, our conclusions are closer to the mark than when we were younger.
The smarter we think we are the more likely we are to try putting an explanation to a pattern that we see, even when we don’t have enough information to reach such a conclusion. We can’t help it.
This is the thing about being human. We seek explanation for the events that happen around us. If something defies logic, we try to find a reason why it might make sense. If something doesn’t add up, we make it up.
My Correlation - Causation Mistake
This reminds me of a rookie error I made a few years ago with the results of some analyses I'd done.
I was doing a survival audit of breast cancer patients and trying to figure out which variables were correlated. The details aren't important.
After a few weeks of analysis I was digging deeper and deeper into the dataset, getting results that I expected to see, but one in particular leapt out at me. I discovered that patients that were receiving chemotherapy for their breast cancer had a much worse survival rate than patients that were not receiving chemotherapy.
This result slapped me in the face like a wet kipper and I exclaimed out loud 'Oh my God, the chemo is killing the patients!'.
I felt like running out into the clinic and screaming at the doctors to stop chemo treatments immediately. Fortunately, I took a deep breath, thought about it for a moment, then slapped my forehead and exclaimed 'Doh!'.
I had made 2 rookie mistakes at the same time.
The first mistake is what this blog post is all about - that correlation does not necessarily imply causation.
Just because there is a correlation between chemotherapy and poor survival, it does not necessarily follow that chemotherapy is the cause of the poor survival.
The second mistake was the direction. There actually was a causal link between chemotherapy and poor prognosis, but I'd got it the wrong way round: it was the poor prognosis that was the cause of the chemotherapy. The patients with the more aggressive breast cancers were given more aggressive treatment, hence the chemotherapy, whereas patients with less aggressive cancers didn't need chemo and received alternative treatments.
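To see how this can happen, here is a minimal Python sketch of a toy model. Every number in it is invented for illustration and none of it comes from the audit: I simply assume that aggressive cancers are far more likely to be treated with chemo, that they have worse survival to begin with, and that chemo actually improves survival a little. The names simulate and survival_rate are hypothetical helpers, not part of any library.

```python
import random

# All probabilities below are made up for illustration; none come from the audit.
P_AGGRESSIVE = 0.4                          # share of patients with an aggressive cancer
P_CHEMO = {True: 0.9, False: 0.1}           # chance of chemo, keyed by "aggressive?"
BASE_SURVIVAL = {True: 0.50, False: 0.85}   # chance of survival without chemo
CHEMO_BENEFIT = 0.10                        # in this toy model chemo *helps* survival


def simulate(n=50_000, seed=0):
    """Return a list of (aggressive, chemo, survived) tuples for n toy patients."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        aggressive = rng.random() < P_AGGRESSIVE
        # Treatment choice depends on how aggressive the cancer is.
        chemo = rng.random() < P_CHEMO[aggressive]
        p_survive = BASE_SURVIVAL[aggressive] + (CHEMO_BENEFIT if chemo else 0.0)
        survived = rng.random() < p_survive
        rows.append((aggressive, chemo, survived))
    return rows


def survival_rate(rows, chemo, aggressive=None):
    """Survival rate for one treatment group, optionally within one severity group."""
    outcomes = [s for a, c, s in rows
                if c == chemo and (aggressive is None or a == aggressive)]
    return sum(outcomes) / len(outcomes)


rows = simulate()
print("All patients    | chemo:", round(survival_rate(rows, True), 2),
      "| no chemo:", round(survival_rate(rows, False), 2))
for aggressive, label in ((True, "Aggressive      "), (False, "Less aggressive ")):
    print(label, "| chemo:", round(survival_rate(rows, True, aggressive), 2),
          "| no chemo:", round(survival_rate(rows, False, aggressive), 2))
```

In this toy model the raw comparison shows chemo patients surviving less often than non-chemo patients, yet within each severity group the chemo patients do better. The chemo isn't killing anyone here; the sicker patients are simply the ones who get it.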
5 Correlation - Causation Alternatives
So you see, even experienced analysts make mistakes when it comes to correlation and causation. I fell into the trap of Wrong Direction Causation.
Actually, if we uncover a correlation between A and B, there are five alternatives to A being the direct cause of B.
I explain all about these alternatives in a FREE book Correlation Is Not Causation.
Correlation - Causation Resources
This blog post is an accompaniment to the FREE eBook Correlation is Not Causation, and is here to help you take the next steps.
Below you'll find the best resources on learning about correlations that we've found on the web. We update the list frequently with new books, video courses, software and whatever else we can find (and create ourselves), so feel free to bookmark us, share us on the web and call in regularly to top up your correlation ninja skills.
Video Courses in The Hive
The Hive is our online learning portal where you can find courses on data analysis, statistics and machine learning.
Unlike other online course platforms, we actively encourage networking and collaboration, and The Hive is a place where you can chat and make friends in our exclusive members-only built-in social media community. You can also contact me and tell me what courses you need - I am in the habit of taking requests and creating personalised video courses!
In The Hive there are plenty of FREE courses (get a free plan to access these), and you can access the most in-depth courses for a small monthly or annual subscription (or purchase courses individually). Better still, each of the premium courses has a shortened free version so you can try-before-you-buy.
If you want access to exclusive, personalised content, then The Hive is the place to be!
Udemy Video Courses
Udemy is a great place to learn new stuff, not just about data, stats and AI, but about making model trains, how to apply make-up and, oh, just about anything else you can think of.
Courses (when they're on sale, which is very often) are typically priced at about 10-15 £/$/Euro. The upside is that the courses are very cheap, and usually very good. The downside is that courses aren't part of any formal programme, so you won't get any kind of certification.
If you want to fill gaps in your education or even learn whole topics, then Udemy is a great place to go.
Coursera Video Courses
Coursera offers a more considered approach to learning, with both individual courses and full degree programmes, and you get certificates that you can display on your LinkedIn profile to impress your future boss.
Prices are usually around the 50 £/$/Euro mark per course.
The great thing about Coursera is that all courses are taught by university professionals, so you know that these guys are the best and brightest in their field. On the downside, these courses are quite intensive and you need to be able to set aside a fair chunk of your time over a few weeks to complete the course. Unless you enjoy pulling all-nighters...