I’m sure you’ve all heard about the party game Chinese Whispers, where a message is given to a person at one end of a line and is then whispered to the next, then the next, and so on until it reaches its eventual destination at the other end. Typically, small changes in the message occur at each stopping point until the end message is something that bears little or no resemblance to what it started life as.
World War One is said to have given us a staggering example of what can happen when a message suffers from Chinese Whispers. According to the story, a message sent from the trenches to British headquarters started as:
Send reinforcements, we’re going to advance
By the time the message had reached HQ it had become:
Send three and fourpence, we're going to a dance
We can laugh about it now, but I can’t help but wonder just how many lives could have been saved if the message had reached HQ unmolested.
So what has this got to do with data integrity, I hear you ask. Well, if you work with shared datasets, your data can suffer the same fate as the British message in the trenches. As your dataset is passed around, small changes and errors introduced along the way, accidentally or otherwise, can gradually destroy its accuracy, until what started out as a perfectly reasonable dataset is no longer fit for purpose.
In this blog post, we’re going to take a look at data integrity in the context of shared data to see whether we can introduce procedures that guard against Chinese Whispers. I’ve also interviewed a few experts in the field to get their take on things.
What is Data Integrity?
The principle of data integrity is that data should be recorded exactly as intended and that, when it is later retrieved, it is the same as when it was recorded.
To do this, any data handling procedures we put in place must ensure the accuracy, reliability and consistency of data over its entire life cycle.
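One common way to check the second half of that principle (that data retrieved later is the same as when it was recorded) is to store a cryptographic checksum alongside a shared dataset and compare against it whenever the dataset is picked up again. The sketch below illustrates that idea; it is not a procedure from this article, and the sidecar `.sha256` file convention and the function names are hypothetical choices made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a b"" sentinel keeps reading until end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksum(path: Path) -> Path:
    """Hypothetical helper: write the digest to '<name>.sha256' next to the file."""
    sidecar = path.with_name(path.name + ".sha256")
    sidecar.write_text(sha256_of_file(path) + "\n", encoding="utf-8")
    return sidecar


def verify_checksum(path: Path) -> bool:
    """Hypothetical helper: True if the file still matches its recorded digest."""
    sidecar = path.with_name(path.name + ".sha256")
    expected = sidecar.read_text(encoding="utf-8").strip()
    return sha256_of_file(path) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "shared_dataset.csv"
        data.write_text("id,message\n1,advance\n", encoding="utf-8")
        record_checksum(data)
        print(verify_checksum(data))  # True: unchanged since it was recorded

        # Simulate a "Chinese Whispers" edit made somewhere down the line
        data.write_text("id,message\n1,a dance\n", encoding="utf-8")
        print(verify_checksum(data))  # False: the contents have drifted
```

Note that a mismatch only tells you the file has changed since the checksum was recorded. It can't tell you whether the change was a legitimate update or an error, so the checksum would need to be re-recorded every time an intentional change is made.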
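In an effort to ensure integrity in its data, the US Food and Drug Administration (FDA) uses the acronym ALCOA to define its data integrity expectations. Data is expected to be:
- Attributable: it is clear who recorded or changed the data
- Legible: it can be read and understood throughout its life
- Contemporaneous: it is recorded at the time the activity takes place
- Original: it is the first recording, or a certified true copy of it
- Accurate: it is free from errors, and any corrections are documented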
This blog post is a summary. If you're enjoying it (and I hope you are), hop on over to the full article to read it in all its glory.