How many times have you heard that ‘correlation is not causation’? Many times, I’m sure. So you know not to say things like ‘wow, the correlation between A and B has a p-value of 0.000001 so A must be causing B…’. I wish I had a pound for every time a very experienced, intelligent scientist with PhDs and titles and stuff has said this.
“Yeah, but just because you’ve got strong evidence of a correlation…” I say, “…it doesn’t mean that A causes B”.
“But look at that small p-value”, they say, “surely when the p-value is so small, then A must be causing B…”.
Well, no. Not at all.
You see, when it comes to storytelling, we have a problem.
It’s not our fault though – as human beings we are hard-wired from birth to look for patterns and to explain why they happen. This problem doesn’t go away when we grow up; it gets worse the more intelligent we think we are. We convince ourselves that now we are older, wiser and smarter, our conclusions are closer to the mark than when we were younger.
The smarter we think we are, the more likely we are to put an explanation to a pattern we see, even when we don’t have enough information to reach such a conclusion. We can’t help it.
This is the thing about being human. We seek explanation for the events that happen around us. If something defies logic, we try to find a reason why it might make sense. If something doesn’t add up, we make it up.
This reminds me of a rookie error I made a few years ago with the results of some analyses I'd done.
I was doing a survival audit of breast cancer patients and trying to figure out which variables were correlated. The details aren't important.
After a few weeks of analysis I was digging deeper and deeper into the dataset, getting results that I expected to see, but one in particular leapt out at me. I discovered that patients who were receiving chemotherapy for their breast cancer had a much worse survival rate than patients who were not receiving chemotherapy.
This result slapped me in the face like a wet kipper and I exclaimed out loud 'Oh my God, the chemo is killing the patients!'.
I felt like running out into the clinic and screaming at the doctors to stop chemo treatments immediately. Fortunately, I took a deep breath, thought about it for a moment, then slapped my forehead and exclaimed 'Doh!'.
I had made two rookie mistakes at the same time.
The first mistake is what this blog post is all about - that correlation does not necessarily imply causation.
Just because there is a correlation between chemotherapy and poor survival, it does not necessarily follow that chemotherapy is the cause of the prognosis.
As it turned out, there was indeed a causal link between chemotherapy and poor prognosis, but I'd got it the wrong way round - it was the poor prognosis that was the cause of the chemotherapy. Patients with the more aggressive breast cancers were given the more aggressive treatment, hence the chemotherapy, whereas patients with less aggressive cancers didn't need chemo - they received alternative treatments.
So you see, even experienced analysts make mistakes when it comes to correlation and causation. I fell into the trap of Wrong Direction Causation.
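You can see how easily this trap springs shut with a quick simulation. This is a minimal sketch in Python (the numbers are entirely made up for illustration - they are not from the actual audit): survival depends only on how aggressive the tumour is, and chemo is simply given to the aggressive cases, yet chemo still ends up strongly 'correlated' with worse survival.

```python
import random

random.seed(42)

# Hypothetical numbers, purely for illustration:
# tumour aggressiveness drives BOTH the treatment choice and survival.
n = 10_000
received_chemo = []
survived_5y = []
for _ in range(n):
    aggressive = random.random() < 0.4          # 40% have aggressive tumours
    chemo = aggressive                          # aggressive cases get chemo
    # Survival depends ONLY on aggressiveness - never on chemo itself
    survival_prob = 0.5 if aggressive else 0.9
    received_chemo.append(chemo)
    survived_5y.append(random.random() < survival_prob)

n_chemo = sum(received_chemo)
chemo_surv = sum(s for c, s in zip(received_chemo, survived_5y) if c) / n_chemo
no_chemo_surv = sum(s for c, s in zip(received_chemo, survived_5y) if not c) / (n - n_chemo)

print(f"5-year survival with chemo:    {chemo_surv:.2f}")
print(f"5-year survival without chemo: {no_chemo_surv:.2f}")
```

The chemo group shows markedly worse survival even though, in this toy world, chemo has no effect on survival at all - the association runs entirely through the aggressiveness of the cancer.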
Actually, if we uncover a correlation between A and B, there are five alternatives to A being the direct cause of B:
- Wrong Direction Causation
- The Third Cause Fallacy
- Indirect Causation
- Cyclic Causation
- Coincidental Causation
I explain all about these alternatives in my FREE book, Correlation Is Not Causation.
In this book, you'll learn:
- 5 reasons you should be sceptical about your correlation
- The alternative explanations for why a correlation is not necessarily causation
- How to avoid falling into these traps
You can get your copy right here: