These days, data is everywhere, and we all have to deal with it from time to time. Whether you’re a scientist or an entrepreneur, in academia or in business, if you’re collecting data to try to answer some questions then you need to understand the fundamentals.
Sooner or later, the issue of data collection will come up and you’re left wondering what data to collect, how to collect it, how to store it and how to prepare it for analysis.
You might have heard this already, but 80% of a data analyst’s time is spent cleaning and pre-processing data.
But you might not know that 80% of data cleaning problems are caused by poor data collection techniques.
Here’s where good data collection techniques come to your rescue.
Understanding why you’re collecting data and the impact that will have on the world will help you build a data collection plan. In turn that will help you to understand which data to collect and how to collect it.
This blog post deals with the best data collection techniques and good data management practices you'll find anywhere, and will give you some great tips on how to get organised and avoid making the kind of mistakes that will make you prematurely grey!
The textbooks tend not to dwell on the practical issues of data collection because, well, to be honest, it can get quite messy. Nonetheless, these are vitally important steps to get the most out of your data, and if you want to collect data properly you really do need a good set of data collection methods in your armoury.
This blog post has two halves.
In the first half we’ll deal with the ‘what’ and the ‘why’ of data collection and answer some of the most frequently asked questions about data collection.
In the second half we’ll move on to the ‘how’ of data collection and give you some of the best and most important data collection techniques that you’ll need.
In other posts I deal with other important issues such as data cleaning, data types and data coding and classification, but for the moment let’s focus on the most important – your data collection techniques. Mess this one up and there’s no point moving on to the others, so focus guys!
So let’s rewind to the beginning and see what we can do to get you off to a good start...
Why Is Data So Important?
If you’re in business, advertising or marketing and your answer to the question of why data is important is ‘to make more money’, then you’re on the wrong track.
It doesn’t matter which sector you’re in or what type of research you’re conducting, the biggest reason why data is so important is to improve people’s lives.
Data is knowledge, and knowledge is power. By measuring and collecting data – and analysing them correctly – you can make informed decisions about what you want your future to look like, and then build strategies on how to get there.
When you’re collecting data and analysing it, the end goal is to make a real, measurable impact on the world, and to do that you go through an expanded DIKW hierarchy (Data, Information, Knowledge, Wisdom), a process of:
1. Data: Collecting data
2. Information: Analysing the data to convert it into information
3. Knowledge: Interpreting the information to gain knowledge
4. Insight: Make connections in the information to gain insights
5. Wisdom: Attain the wisdom to appreciate why the insights are important
6. Impact: Use the insights to make a bigger impact in the world
Data tells you what you’re doing well, so you can replicate your successes, and tells you what you’re not doing well so you can find solutions to your problems and make adjustments.
This is as true in business as it is in healthcare, engineering, social science or marketing.
If you’re not collecting data, you’re standing still. And since your competitors are collecting data, standing still means going backwards.
What Is The Difference Between Data And Information?
Data is a collection of measurements or observations that represent the real world. On its own, data is useless. It just sits there and stares back at you, daring you to blink first. What you do with the data is important.
By interrogating the data and trying to understand it, you can transform the data into information that you can use to understand the world around you – and use it to predict the future.
Information is your interpretation of the underlying story of the data. If you have collected data carefully with a view to accurately representing the real world, analysed it appropriately using the right tools and interpreted the results correctly, then the story you tell will likely have a strong basis in truth.
The world is yours and there is no stopping you!
Why Do We Collect Data?
If you’re a nose-to-the-grindstone type of person, you probably think that you collect data to:
These are all correct answers, but if you’re a bigger picture type, then you probably consider that you collect data to:
Whichever type of person you are, and whatever route you take, the data collection process starts with a question and ends with the actual data collection part itself.
What is the outcome you seek, and what does this ‘better world’ look like? What data do you need to test to reach that outcome? Are you currently collecting data – the correct data? If not, how do you plan to collect that data?
It is only after you have answered these questions that you can begin the data collection process, and even then you will still need to build a data collection policy and address issues such as data collection procedures, protocol standardisation, data quality assurance, data integrity, data storage and security.
And then comes data cleaning. Oh my, that’s a whole other ball game!
What Are The Benefits Of Data Collection?
If you want to improve the lives of your [customers/patients/users], *spoiler alert – you do!*, then you need to know what they want and need. If you haven’t asked them yet, now would be a really good time to start!
Collecting data based on what your end users need allows you to get a deeper understanding of them and of your market, and when you know what they want you can figure out how to give it to them.
Better still, the more you understand about your end users, the more you can personalise your offerings to them. In the marketing world this is called customer segmentation, whilst in healthcare it’s called personalised medicine.
1. Ask how you can help (collect data)
2. Personalise the solution (create the perfect product for each user)
3. Deliver what they need (delight your users)
Ultimately, it’s about meeting the end user’s expectations, and if you can do that you’re much more likely to close the deal. And get a repeat deal. And another, and another…
It’s a win for everybody!
What Is Data Collection In Quantitative Research?
Data collection is the process by which you gather and measure information in a systematic fashion about specific variables that allow you to answer the stated research questions, test hypotheses and evaluate outcomes to get a complete and accurate picture of an area of interest.
For example, if you have a hypothesis that Weight is affected by Household Income, then you will need to collect data on Weight and Household Income from a small sample of the population.
Of course, that leads to many, many more questions, such as:
In other words, before you start collecting data you need a data collection plan!
What Is The Purpose Of A Data Collection Plan?
A data collection plan helps to ensure that the data you collect are useful, sufficiently accurate and fit-for-purpose.
OK, what do we mean by useful?
The data you collect must be capable of testing your hypothesis. If you’re doing a study on Weight and Household Income, then all the data you collect should be in service of that hypothesis. Collecting data on whether your study participants swim regularly might be useful, but collecting data on their shoe size probably isn’t. Data collection best practices dictate that you should only collect the data you need to test your hypothesis.
What about sufficiently accurate?
How should you collect data on your Weight variable? If you collect it in kilograms, to how many decimal places should you measure? If you decide to collect Weight data in categories, then how many categories should you choose? And where should the boundary be between, for example, Overweight and Obese? Are there standards for such measurements? How many different standards are there? Do you know which ones are most appropriate for your study?
And are your data fit-for-purpose?
Just because you’ve collected data, it doesn’t mean they are immediately usable. Have your data been collected according to data collection best practices and consistent standards? Are your data sufficiently complete? Are your data biased? Do your data contain outliers?
All these things can get complicated really quickly, which is why you need to follow data collection best practices and build a data collection plan that is thought through deeply, well in advance of collecting data.
What Are The 5 Methods Of Data Collection?
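There are five data collection methods in quantitative research:
- Surveys and questionnaires
- Interviews
- Direct observation
- Focus groups
- Existing documents and records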
In surveys and questionnaires you ask carefully planned questions so that the responses are capable of answering your research hypothesis. Typically, respondents fill in a data collection questionnaire form, and their responses may be closed-ended (rating scales, check-boxes and multiple choice) or open-ended (free text).
With interviews, the researcher will typically have a list of pre-prepared questions to ask, but may also deviate and ask follow-up questions based on responses in real time. Interviews are more customisable than surveys and questionnaires, but can be much more expensive to conduct. Also, the interviewers need to have a lot of data collection experience, otherwise the data collected may end up being useless, with a lot of wasted time and money. Data collection skills are critical to success here!
Direct observation involves collecting information without asking questions. One of the most obvious examples of an observational study is a time-and-motion study, which typically evaluates the efficiency of operations. In this data collection methodology, an observer will take a passive role and take notes on their observations.
The focus group data collection method is a combination of interviewing, surveying and observing, with the difference that it is not one-to-one, but is instead a group discussion. Focus groups often use open-ended questions, such as “How did you feel about…” or “What did you like best about…”, and the responses are more about the shared experience than those of any individual.
Existing documents and records are a treasure-trove of information, and you can collect a considerable amount of data without interviewing participants or building focus groups. This can be a very efficient and inexpensive data collection method because you’re using research that has already been completed.
What Is The Difference Between Primary Data And Secondary Data?
Primary data is real-time data that is collected by researchers directly from main sources in prospective studies, while secondary data relates to the past and is data that was previously collected and made available for researchers to use for their own retrospective research studies.
Primary data is often collected to address a particular research problem or hypothesis. An example of primary data is the national census collected by the government to give a head count of everyone in the country on a given day.
Primary data is specific to the needs of the researcher and is usually more accurate and/or up-to-date than secondary data, but it can be very time-consuming and expensive to collect.
Secondary data is data that is easily accessible to researchers and means that they can check hypotheses quickly and cheaply. An example of secondary data is investigating census data from previous years.
Secondary data is more accessible than primary data and is quicker and cheaper to access, but it can be less accurate and outdated.
Data Collection Techniques
So now we’re getting to the business end of the post. Here you’ll find our top 11 data collection techniques in quantitative research: a tips list with our pick of the best do's and don'ts of data collection you’re likely to find anywhere.
I've put together an infographic on the most important 11 data collection techniques that might help you a little:
And now you have a choice.
You can either go through the slides or follow the text below - or both! Either way will give you the info you need on the best 11 data collection techniques you can find...
Tip 1: Record Data on Paper First…
So you’ve got your hypothesis (theory, idea or hunch). Once you’ve decided what data you need to collect, the first thing you should do is design a paper-based data collection form to store all your data (assuming that at least some of your data is going to be recorded by hand).
Keep it simple, print it out, then manually record your data with pen and paper. Use one form per case, patient, customer, test-tube, etc.
Tip 2: Then Transfer it to an Electronic Medium
We may be living in an electronic world, but ultimately you need a system where you (or anyone else) can follow the data trail from beginning to end and – more crucially – from end to beginning.
From time to time you WILL make a mistake with the data, so it is vitally important that you have a data collection method that will let you spot and rectify the mistake by going back through all the steps until you find the error.
So now you have your data recorded on paper you need to transfer it into an electronic system. More than likely this will be either Microsoft Excel or Access.
In general, Excel is more common and easier to use, and has the added advantage that you can manipulate the data and do some simple analyses right there without having to export your data.
Most data is stored in Excel (in 7 years as a medical statistician I was only once given data in Access – all the other times it was in Excel), so we’ll go with that from here on in…
Tip 3: Enter Your Data on a Single Worksheet Whenever Possible
Trying to sort your data when it is spread across multiple worksheets can lead to all sorts of problems, so try to avoid it whenever you can – keep your data on a single worksheet.
Excel 2003 has limits of 65,536 rows by 256 columns. That’s large enough for most datasets, but if you need higher limits you can use Excel 2007 or later (1,048,576 rows by 16,384 columns).
Tip 4: Use a Unique ID column
You’ll likely have to sort your data many times and by different columns, so you’re going to need a way of restoring the original order.
Use column A as a unique identifier to insert consecutive numbers starting from 1. It may be simple, but it’s very effective.
When you’ve put your Unique IDs into column A, go back to your original paper sheets and write the Unique ID there as well. Trust me – you’ll thank me for this tip later…
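If your data ever leaves Excel, the same trick carries over. Here is a minimal Python sketch of the idea; the records and the column names (`name`, `age`, `unique_id`) are made up purely for illustration.

```python
# Hypothetical records, in the order they were typed in from the paper forms.
records = [
    {"name": "Case C", "age": 52},
    {"name": "Case A", "age": 34},
    {"name": "Case B", "age": 41},
]

# Assign consecutive unique IDs starting from 1, in the original entry order.
for unique_id, record in enumerate(records, start=1):
    record["unique_id"] = unique_id

# Sort by another column for some piece of analysis...
by_age = sorted(records, key=lambda r: r["age"])

# ...then restore the original order by sorting on the unique ID.
restored = sorted(by_age, key=lambda r: r["unique_id"])
```

However many times the rows get shuffled, sorting on `unique_id` brings back the order in which they were entered.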
Tip 5: One Column per Variable
Each variable should have… oh, hold on a minute, what’s a variable?
Well, simply put, these are the things that can change or can be changed as part of your study. In short, these are all the pieces of information that you are observing, measuring, counting and collecting, like age, gender, distance, temperature, etc.
You can find more information in my blog on different data types.
Where were we? Ah yes…
Each variable should have its own column, and each variable should correspond to just one piece of information.
If you’re entering the age of a patient, then just enter their age, don’t enter their date of birth in the same column or cell. If you want to record their age and DOB, then use 2 separate columns.
If you’re recording a composite variable made up of 2 or more constituent parts, like Body Mass Index – made up of Height and Weight – then record them in separate columns. You can always combine them into a single variable later.
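To show what 'combine them later' might look like, here is a small Python sketch that derives Body Mass Index (weight in kilograms divided by height in metres squared) from separately recorded columns. The column names and values are assumptions for illustration, not part of any real dataset.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Height and Weight are recorded in their own columns (hypothetical values).
rows = [
    {"unique_id": 1, "height_m": 1.75, "weight_kg": 70.0},
    {"unique_id": 2, "height_m": 1.60, "weight_kg": 82.0},
]

# The composite variable is calculated afterwards and stored in a new column,
# leaving the original measurements untouched.
for row in rows:
    row["bmi"] = body_mass_index(row["weight_kg"], row["height_m"])
```

Because the raw Height and Weight are still there, you can always recalculate the composite if you spot an error in either of them.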
Tip 6: Row 1 is the Variable Name
Eventually you’ll need to analyse your data and you may need to export it to a statistical program.
The standard for pretty much all commercial stats programs is for the first row to be reserved for the name of the variable and all other rows for the data. So don’t be tempted to use rows 2, 3 and 4 as well as row 1 for the variable name. It might keep everything looking nice and tidy in Excel, but it will only create more work for you later.
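As a rough illustration of why the single header row matters, here is how Python's built-in `csv` module reads an exported sheet. The data below is an in-memory stand-in with made-up variable names, not a real export.

```python
import csv
import io

# A small in-memory stand-in for an exported worksheet (hypothetical data).
exported = io.StringIO(
    "unique_id,age,weight_kg\n"
    "1,34,70.5\n"
    "2,41,82.0\n"
)

# DictReader takes the first row as the variable names
# and treats every following row as data.
reader = csv.DictReader(exported)
variable_names = reader.fieldnames
data_rows = list(reader)
```

If the variable names were spread over rows 1 to 4, rows 2 to 4 would be read as data here and you would have to strip them out by hand.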
Tip 7: Every Cell Should Have Something In It
What do empty cells tell you?
An empty cell is just a great big question mark and tells you nothing.
Worse still, incomplete datasets give reviewers a reason to whack you about the head with a metaphorical stick (and believe me they will – I’ve been there many times…).
So make sure that something is entered in every cell.
It is quite common to use ‘illegal’ numbers as codes to give you information, so where the entries for a variable can only be positive values (like age or height), you can use negative codes such as -1, -2, -3, etc.
If negative numbers aren’t useful, then use letters a, b, c, etc.
If you’re not comfortable entering something in cells that strictly shouldn’t be there (after all, you are going to have to clean them up later before you can analyse your data), then use Excel’s Comment feature. I tend to use this sparingly, but that’s just me.
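Here is a minimal Python sketch of how such codes might be handled once the data is out of Excel. It assumes negative numbers are used as the 'illegal' values for an age variable; the specific codes and their meanings are invented for illustration.

```python
# Hypothetical codebook: negative values can never be real ages,
# so each one can flag a reason why the value is missing.
MISSING_CODES = {
    -1: "not recorded",
    -2: "participant declined to answer",
    -3: "not applicable",
}

ages = [34, -1, 41, -2, 29]


def describe(value):
    """Return the reason text for a missing-data code, else the value itself."""
    return MISSING_CODES.get(value, value)


described = [describe(a) for a in ages]

# Before analysis, the codes have to be cleaned out again.
valid_ages = [a for a in ages if a not in MISSING_CODES]
```

The codes tell you why a value is missing while you collect, and the final filter makes sure they never sneak into a calculation.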
Tip 8: Keep Great Notes
When using codes you’ll need to keep notes to tell you what the codes mean. Keep the codes and notes in a different spreadsheet.
While we’re on the subject, it’s really important to
KEEP GREAT NOTES !!!
You’re likely not the only person that will ever work with this dataset, so get used to writing stuff down.
Explain what the project is all about, the question you’re trying to answer, why you’re collecting this data and how you’re going to get the answers you’re looking for. Explain how you measured things and under what conditions. If more than one person is collecting data, then explain who, what, where, when, why and how.
This will be the document that explains all the important stuff about your dataset, so write it down.
If there’s too much information to comfortably put into an Excel spreadsheet, then a Microsoft Word document will be just fine – and keep it in the same folder as the dataset.
Tip 9: Be Consistent With Data Entry
There’s nothing worse than getting a dataset that takes a fortnight to clean because data entry has not been consistent.
By that I mean make sure that if the entry for a variable should be ‘Positive’, then make it ‘Positive’ and not some other variation (pos, Pos, pos+, etc.).
It’s hard enough correcting speeling missteakes and typos without also having to correct things that were deliberately entered differently.
One of the biggest data collection challenges you will face is inconsistent entry, so restrict the number of people that can enter data to cut down on this kind of issue, and make it clear what your data collection procedures and data entry standards are.
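If you do inherit inconsistent entries, one way to tidy them is a lookup from known variants to the agreed standard. The sketch below is only an illustration: the allowed values and the list of variants are assumptions, and anything it doesn't recognise is rejected rather than guessed at.

```python
# Hypothetical data entry standard and the variants that map onto it.
ALLOWED = {"Positive", "Negative"}
VARIANTS = {
    "pos": "Positive",
    "pos+": "Positive",
    "positive": "Positive",
    "neg": "Negative",
    "negative": "Negative",
}


def standardise(entry: str) -> str:
    """Map a known variant to its standard form; reject anything unknown."""
    cleaned = entry.strip()
    if cleaned in ALLOWED:
        return cleaned
    key = cleaned.lower()
    if key in VARIANTS:
        return VARIANTS[key]
    raise ValueError(f"unrecognised entry: {entry!r}")


entries = ["Positive", "pos", "Pos", " pos+ ", "NEG"]
standardised = [standardise(e) for e in entries]
```

Raising an error on unknown entries is a deliberate choice here: it forces someone to look at the oddity instead of silently filing it under the wrong label.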
Tip 10: Don’t Guess
Data should be entered as accurately as possible.
Don’t guess, approximate, round up or down.
Enter the value exactly as registered on paper.
If you need the data to be rounded up or down you can use Excel’s functions to achieve this, but if you’re doing calculations in your head, on paper or in a calculator you’ll make mistakes which can be difficult – if not impossible – to spot later.
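The same principle applies outside Excel: keep the value exactly as recorded and let the software derive the rounded version. Below is a small Python sketch using the standard `decimal` module with half-up rounding; the weights and the one-decimal-place precision are assumptions for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Values exactly as written on the paper form (hypothetical), kept as text
# so that nothing is lost before rounding.
raw_weights = ["70.25", "82.349", "65.5"]


def round_half_up(value: str, places: str = "0.1") -> Decimal:
    """Round a recorded value to the precision given by `places`."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)


# The rounded values live in a separate list; the raw entries stay untouched.
rounded_weights = [round_half_up(w) for w in raw_weights]
```

Since the raw entries are never overwritten, you can change your mind about the precision later without going back to the paper forms.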
Tip 11: Zero is a Real Number
Don’t enter the number Zero into a cell unless what has been measured, counted or calculated results in the answer Zero.
I’ve often received datasets with lots of zeros and when I asked, the zeros meant ‘I don’t have data for this’.
The problem is that if you want to calculate something, like the mean, then all the zeros will be used in the calculation and you will get an inaccurate answer – or one that is just plain wrong!
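A quick Python sketch makes the damage easy to see. The measurements are made up, and two of the five participants are assumed to have no data.

```python
from statistics import mean

# Hypothetical measurements: two participants have no data.
with_zeros = [72.0, 0, 68.0, 0, 70.0]          # zeros wrongly used for 'no data'
with_missing = [72.0, None, 68.0, None, 70.0]  # missing values marked explicitly

# The zeros are treated as real measurements and drag the mean down.
wrong_mean = mean(with_zeros)

# Explicit missing values can be left out of the calculation.
right_mean = mean(v for v in with_missing if v is not None)
```

With the zeros included the mean is 210 divided by 5, which is 42; with the missing values excluded it is 210 divided by 3, which is 70.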
Data Collection – Summary
Data collection is pretty simple, but it’s also easy to make mistakes that can cost you huge amounts of time and money when it comes to the data cleaning and pre-processing stages.
Having a data collection plan, a data collection process and a few simple data collection techniques keeps problems to a minimum. It really can make the difference between a smooth two-hour data cleaning stage and one that takes you weeks and leaves you feeling like you want to pull your eyes out.
When you know the reason why you’re collecting data and the impact that you envisage making on the world, that helps you to really focus on your main research questions. Once you have these, you’ll have a pretty good idea what data you need to collect.
From here you can build a solid data collection plan to map out the path from question to answer.
Then you can start collecting data, and by following the 11 tips outlined in this post you’ll make far fewer mistakes and get to the sexier parts of your analysis sooner.
After all, that’s where we all want to end up, right?
Your Next Step
Let's face it - you didn't visit this blog post because you're passionate about learning the latest trending data collection techniques, did you?
You didn't leave college or university saying "now that I'm free to forge my own path, I'm going to be a ... a data collectionista!"
That sounds like a line straight out of a Monty Python sketch!
No, you're here because you need to collect data, or you've collected data and want to know what the next steps are. Well, we're here to help.
I hope you've found this blog post useful, and to help you take your career to the next level we've created a series of video courses dedicated to teaching you the best data collection techniques to get your data analysis-ready in double quick time.
You can get the first of these courses right here, which includes all of the essential data collection techniques you're going to need:
This course is completely FREE to get started, with no obligation to buy or even register. Just start learning and decide later.
There really is no reason not to give it a try!