Data Integrity – Don’t Let Chinese Whispers Kill Your Data

I’m sure you’ve all heard of the party game Chinese Whispers, where a message is whispered to the person at one end of a line, who whispers it to the next, then the next, and so on until it reaches the other end. Small changes typically creep in at each step, until the final message bears little or no resemblance to what it started life as.

World War One is said to have given us a staggering example of what can happen when a message suffers from Chinese Whispers. According to the often-told story, a message sent from the trenches to British headquarters started as:

Send reinforcements, we’re going to advance

By the time the message had reached HQ it had become:

Send three and fourpence, we're going to a dance

We can laugh about it now, but I can’t help but wonder just how many lives could have been saved if the message had reached HQ unmolested.




So what has this got to do with data integrity, I hear you ask? Well, if you work with shared datasets, your data can suffer the same fate as that message from the trenches. Over time, as a dataset is passed around, small changes and errors – introduced accidentally or otherwise – can destroy its accuracy, until what started out as a perfectly reasonable dataset is no longer fit for purpose.

In this blog post, we’re going to look at data integrity in the context of shared data, to see whether we can introduce procedures that guard against Chinese Whispers. I’ve also interviewed a few experts in the field to get their take on things.


What is Data Integrity?

The principle of data integrity is that data should be recorded exactly as intended and, when later retrieved, should be the same as when it was recorded.


To do this, any data handling procedures we put in place must ensure the accuracy, reliability and consistency of data over its entire life cycle.
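One common way to check the second half of that principle, that retrieved data is the same as what was recorded, is to store a cryptographic hash alongside a shared dataset and recompute it whenever the data changes hands. The minimal sketch below uses SHA-256 from Python's standard `hashlib` module; the function names are hypothetical and it is an illustration of the idea, not a procedure taken from the full article.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def file_fingerprint(path: str, chunk_size: int = 65536) -> str:
    """SHA-256 of a file, read in chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(data: bytes, expected_digest: str) -> bool:
    """True only if the data is byte-for-byte what was originally fingerprinted."""
    return fingerprint(data) == expected_digest


# The sender records the digest when the data is first written...
original = b"Send reinforcements, we're going to advance"
recorded_digest = fingerprint(original)

# ...and each recipient re-checks it on arrival.
received = b"Send three and fourpence, we're going to a dance"
print(verify(original, recorded_digest))  # True
print(verify(received, recorded_digest))  # False
```

A matching digest only tells you the bytes are unchanged. It won't tell you what changed, and it can't catch deliberate tampering if whoever edits the data can also replace the stored digest.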

To ensure the integrity of the data it regulates, the US Food and Drug Administration (FDA) uses the acronym ALCOA to define its data integrity standards, under which data is expected to be:

  • Attributable – Data should clearly demonstrate who observed and recorded it, when it was observed and recorded, and who or what it is about
  • Legible – Data should be readable and easy to understand, recorded permanently, and its original entries preserved
  • Contemporaneous – Data should be recorded at the same time as it was observed
  • Original – Source data should be accessible and preserved in its original form
  • Accurate – Data should be free from errors
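To make a few of these principles concrete, here is a minimal Python sketch of a record that stores who made each entry, when it was recorded and what it is about, and that appends corrections instead of overwriting, so the original entry is always preserved. The class and field names (`Entry`, `Record`, `recorded_by` and so on) and the sample data are hypothetical; this is one way such a record could look, not an FDA-prescribed format. No data structure can guarantee the last principle, Accurate, on its own.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an entry cannot be edited once created
class Entry:
    value: str
    recorded_by: str       # Attributable: who recorded it
    recorded_at: datetime  # Contemporaneous: timestamped at the moment of recording
    reason: str = "original entry"


@dataclass
class Record:
    subject: str  # Attributable: who or what the data is about
    history: list[Entry] = field(default_factory=list)

    def record(self, value: str, user: str, reason: str = "original entry") -> None:
        # Append a new entry stamped with the current UTC time.
        self.history.append(Entry(value, user, datetime.now(timezone.utc), reason))

    def correct(self, value: str, user: str, reason: str) -> None:
        # Corrections are appended, never overwritten, and must give a reason.
        if not self.history:
            raise ValueError("nothing to correct")
        if not reason:
            raise ValueError("a correction must state a reason")
        self.record(value, user, reason)

    @property
    def original(self) -> str:
        # Original / Legible: the first entry is always preserved.
        return self.history[0].value

    @property
    def current(self) -> str:
        return self.history[-1].value


rec = Record("batch-042 pH")
rec.record("7.2", "alice")
rec.correct("7.4", "bob", "transcription error")
print(rec.original)  # 7.2
print(rec.current)   # 7.4
```

Because `correct` appends rather than overwrites, anyone downstream can still see what was first recorded, who changed it and why.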

This blog post is a summary. If you're enjoying it (and I hope you are), then hop on over to the full article, where you can view it in all its glory:

Data Integrity - Don't Let Chinese Whispers Kill Your Data
