If you’re collecting data to try to answer some questions then you need to understand the fundamentals of data collection and the benefits of data collection methods and techniques.
And it doesn’t matter whether you’re a scientist or an entrepreneur, in academia or in business, sooner or later, the issue of data collection (aka data gathering) and having a joined-up data collection plan will come up.
So if you're wondering what data collection is, why it is important, how to get started and what data to collect, this blog post will show you how to collect data the right way, including the 5 best data collection methods and 11 data collection techniques.
The Need for Data Collection Methods & Skills
Did you know that 20% of a data analyst’s time is spent on gathering data, but 60% is spent on cleaning and pre-processing data?
Worse still is that most of the data cleaning problems are caused by poor data gathering methods.
Here’s where good data collection methods come to your rescue.
Understanding why you’re collecting data and the impact that will have on the world will help you build a data collection plan. In turn that will help you to understand the different methods of data collection in research and how to collect data.
This blog post deals with the best data collection methods, techniques and tips, and good data management practices to protect the integrity of your data. It will give you some great tips on how to get organised and avoid making the kind of data gathering mistakes that will make you prematurely grey!
The textbooks tend not to dwell on the practical issues of gathering data because, well, to be honest, it can get quite messy. Nonetheless, these are vitally important steps you need to take when you're data gathering to get the most out of it, and if you want to collect data properly you really do need a good set of data collection methods in your armoury.
This blog post has two halves.
In the first half we’ll deal with the 'what' and the 'why' of data collection, such as ‘what is data collection’ and ‘why is data collection important’, and answer some of the most frequently asked questions about data collection methods in research.
In the second half we’ll move on to how to collect data and give you some of the best and most important data collection tips and techniques that you’ll need.
So let’s rewind to the beginning and see what we can do to get you off to a good start...
Why Is Data Collection Important?
If you’re in business, advertising or marketing and your answer to the question of why is data collection important is ‘to make more money’, then you’re on the wrong track.
It doesn’t matter which sector you’re in or what type of research you’re conducting, the biggest reason why data collection is important is to improve people’s lives.
Data is knowledge, and knowledge is power. By measuring and gathering data – and analysing them correctly – you can make informed decisions about what you want your future to look like, and then build strategies on how to get there.
When you’re collecting data and analysing them, the end goal is to make a real, measurable impact on the world, and to do that you go through an expanded DIKW hierarchy (Data, Information, Knowledge, Wisdom), a process of:
- 1. Data: Gathering data
- 2. Information: Analysing the data to convert them into information
- 3. Knowledge: Interpreting the information to gain knowledge
- 4. Insight: Make connections in the information to gain insights
- 5. Wisdom: Attain the wisdom to appreciate why the insights are important
- 6. Impact: Use the insights to make a bigger impact in the world
Data tells you what you’re doing well, so you can replicate your successes, and tells you what you’re not doing well so you can find solutions to your problems and make adjustments.
That is why data collection is important.
This is as true in business as it is in healthcare, engineering, social science or marketing.
If you’re not collecting data, you’re standing still. And since your competitors already have a data collection plan and are collecting data, standing still means going backwards.
What Is The Difference Between Data And Information?
Data is a collection of measurements or observations that represent the real world. On its own, data is useless. It just sits there and stares back at you, daring you to blink first. What you do with the data is important.
By having a data collection plan, interrogating the data and trying to understand it, you can transform the data into information that you can use to understand the world around you – and use it to predict the future.
Information is your interpretation of the underlying story of the data. If you have gathered data carefully with a view to accurately representing the real world, analysed it appropriately using the right tools and interpreted the results correctly, then the story you tell will likely have a strong basis in truth.
The world is yours and there is no stopping you!
Why Do We Collect Data?
If you’re a nose-to-the-grindstone type of person, you probably think that you are gathering data to:
These are all correct answers, but if you’re a bigger picture type, then you probably consider that you are gathering data to:
Whichever type of person you are, and whatever route you take, the data gathering process starts with a question and ends with the actual data collection part itself:
- What is the outcome you seek, and what does this ‘better world’ look like?
- What data do you need to test to reach that outcome?
- Are you currently collecting data – the correct data?
- If not, how do you plan to collect data that you need?
Once you understand the answer to the question 'what is data collection', then you can build a data collection plan and begin the data collection process.
After this you will still need to build a data collection policy and address issues such as data collection procedures, protocol standardisation, data quality assurance, data integrity, data storage and security, but that's another story.
And then comes data cleaning. Oh my, that’s a whole other ball game!
What Are The Benefits Of Data Collection?
So what are the benefits of data collection in research, and why do you need to collect data accurately?
If you want to improve the lives of your [customers/patients/users], *spoiler alert – you do!*, then you need to know what they want and need. If you haven’t asked them yet, now would be a really good time to start!
Collecting data based on what your end users need allows you to get a deeper understanding of them and of your market, and when you know what they want you can figure out how to give it to them.
Better still, the more you understand about the benefits of data collection for both you and your end users, the more you can personalise your offerings to them. In the marketing world this is called customer segmentation, whilst in healthcare it’s called personalised medicine.
- 1. Ask how you can help (collect data)
- 2. Personalise the solution (create the perfect product for each user)
- 3. Deliver what they need (delight your users)
Ultimately, it’s about meeting the end user’s expectations, and if you can do that you’re much more likely to close the deal. And get a repeat deal. And another, and another…
It’s a win for everybody!
What Is Data Collection In Research?
So what is data collection in research?
Data collection is the process by which you:
- 1. Gather Information: Gathering data and measuring information in a systematic fashion about specific variables allows you to:
- 2. Test Hypotheses: answer the stated research questions, test hypotheses and:
- 3. Evaluate Outcomes: evaluate outcomes to:
- 4. Answer Questions: get a complete and accurate picture of an area of interest.
For example, if you have a hypothesis that Weight is affected by Household Income, then you will need to collect data on Weight and Household Income from a small sample of the population.
Of course, that leads to many, many more questions, such as:
In other words, before you start collecting data you need a data collection plan!
3 Best Practices For Creating A Data Collection Plan
A data collection plan helps to ensure that your collected data are:
- 1. Useful
- 2. Sufficiently accurate
- 3. Fit-for-purpose
OK, what do we mean by useful?
Your collected data must be capable of answering your hypothesis. If you’re doing a study on Weight and Household Income, then all your collected data should be in service of that hypothesis. Collecting data on whether your study participants swim regularly might be useful, but collecting data on their shoe size probably isn’t. Data collection best practices dictate that you should only be gathering data you need to answer your hypothesis.
What about sufficiently accurate?
How should you collect data on your Weight variable? If you collect it in kilograms, to how many decimal places should you measure? If you decide to collect data on Weight in categories, then how many categories should you choose? And where should the boundary be between, for example, Overweight and Obese? Are there standards for such measurements? How many different standards are there? Do you know which ones are most appropriate for your study?
And are your collected data fit-for-purpose?
Just because you’ve collected data, it doesn’t mean they are immediately usable. Have your data been collected according to data collection best practices and consistent standards? Are your data sufficiently complete? Are your data biased? Do your data contain outliers?
All these things can get complicated really quickly, which is why you need to follow data collection best practices and build a data collection plan that is deeply thought through well in advance of collecting data.
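As a rough illustration of two of the fit-for-purpose questions (completeness and outliers), here is a minimal Python sketch. It assumes missing values are stored as `None`, and it uses the common 1.5 × IQR rule of thumb to flag outliers. The function names, the sample weights and the cut-off are all assumptions made for illustration, not a standard you have to follow.

```python
from statistics import median

def completeness(values):
    """Fraction of entries that are not missing (None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    data = sorted(v for v in values if v is not None)
    if len(data) < 4:
        return []  # too few points to estimate quartiles sensibly
    half = len(data) // 2
    q1 = median(data[:half])    # median of the lower half
    q3 = median(data[-half:])   # median of the upper half
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in data if v < low or v > high]

# Hypothetical Weight measurements in kilograms; None marks a missing entry.
weights_kg = [61.5, 70.2, None, 68.0, 74.3, 66.1, 650.0, 72.8]

print(completeness(weights_kg))   # 0.875
print(iqr_outliers(weights_kg))   # [650.0]
```

A flagged value is only a prompt to go back and check the source, since 650.0 here could be a typo for 65.0 or a genuine (if unlikely) measurement.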
How to Collect Data: 5 Data Collection Methods in Research
There are five methods of data collection in research - quantitative and qualitative:
Gathering Data using Surveys and Questionnaires
In surveys and questionnaires you ask carefully planned questions so that the responses are capable of answering your research hypothesis. In this method of data collection, you are gathering data by asking respondents to fill in a data collection questionnaire form. Their responses may be closed-ended (rating scales, check-boxes and multiple choice) or open-ended (free text).
Gathering Data using Interviews
With interviews, the researcher will typically have a list of pre-prepared questions to ask, but may also deviate and ask follow-up questions based on responses in real time. Interviews are more customisable than surveys and questionnaires, but can be much more expensive to conduct. Also, the interviewers need to have a lot of experience with collecting data, otherwise the data collected may end up being useless, with a lot of wasted time and money. Data collection skills are critical to success in the interview method of data collection!
Data Gathering by Direct Observation
Direct observation involves collecting information without asking questions. One of the most obvious examples of an observational study is a time-and-motion study, which typically evaluates the efficiency of operations. In this method of data collection, an observer will take a passive role and take notes on their observations.
Data Gathering using Focus Groups
The focus group method of data collection is a combination of interviewing, surveying and observing, with the difference that it is not one-to-one, but is instead a group discussion. Focus groups often use open-ended questions, such as “How did you feel about…” or “What did you like best about…”, and the responses are more about the shared experience than those of any individual.
Gathering Data from Existing Documents and Records
Existing documents and records are a treasure-trove of information, and you can collect a considerable amount of data without interviewing participants or building focus groups. This can be a very efficient and inexpensive data collection method because you’re using research that has already been completed.
What Are Primary And Secondary Data?
Primary data are real-time data collected by researchers directly from main sources in prospective studies, while secondary data relate to the past: they were previously collected and made available for researchers to use in their own retrospective research studies.
What is Primary Data Collection in Research?
Primary data are often collected in real time to address a particular research problem or hypothesis, and are collected by researchers directly from main sources in prospective studies.
Primary Data Examples
A primary data example is the national census collected by the government to give a head count of everyone in the country on a given day.
If you are collecting data using a survey, a questionnaire or a personal interview, these are all also primary data examples.
Advantages of Primary Data Collection
The main advantages of primary data collection in research are that the data are specific to the needs of the researcher and are usually more accurate and/or up-to-date than secondary data. The drawback is that they can be very time-consuming and expensive to collect.
What is Secondary Data Collection in Research?
Secondary data are data that already exist and are easily accessible to researchers, which means that they can check hypotheses quickly and cheaply.
Secondary Data Examples
A secondary data example is investigating census data from previous years.
If you are using data from a survey, a questionnaire or a personal interview that has already been conducted, these are all also secondary data examples.
Advantages of Secondary Data Collection
The main advantages of secondary data collection in research are that they yield data that are quicker and cheaper to access than primary data, but they can be less accurate and outdated.
11 Data Collection Techniques in Research (and Statistics, Business, Marketing, and...)
Now we’re getting to the business end of the post.
So how do you improve your data collection skills?
I've put together an infographic on the most important 11 data collection techniques that might help you a little before I go into detail:
And now you have a choice.
You can either go through the slides or follow the text below - or both! Either way will give you the info you need on the best 11 data collection techniques you can find...
Tip 1: Collect Data on Paper First…
So you’ve got your hypothesis (theory, idea or hunch). Once you’ve decided what data you need to collect, the first thing you should do is design a paper-based data collection form to store all your data (assuming that at least some of your data is going to be recorded by hand).
Keep it simple, print it out, then manually record your data with pen and paper. One form per case/patient/customer/test-tube, etc.
Tip 2: Then Transfer it to an Electronic Medium
We may be living in an electronic world, but ultimately you need a system where you (or anyone else) can follow the data trail from beginning to end and – more crucially – from end to beginning.
From time to time you WILL make a mistake with the data, so it is vitally important that you have a data collection method that will let you spot and rectify the mistake by going back through all the steps until you find the error.
Now that you have your data recorded on paper, you need to transfer it into an electronic system. More than likely this will be either Microsoft Excel or Access.
In general, Excel is more common and easier to use, and has the added advantage that you can manipulate the data and do some simple analyses right there without having to export your data.
Most data is stored in Excel (in 7 years as a medical statistician I was only once given data in Access – all the other times it was in Excel), so we’ll go with that from here on in…
Tip 3: Collect Data in a Single Excel Worksheet
Trying to sort your data when it is spread across multiple worksheets can lead to all sorts of problems, so try to avoid it whenever you can – keep your data on a single worksheet.
Excel 2003 has limits of 65,536 rows by 256 columns. That’s large enough for most datasets, but if you need higher limits you can use Excel 2007 or later (1,048,576 rows by 16,384 columns).
Tip 4: Use a Unique ID column
You’ll likely have to sort your data many times and by different columns, so you’re going to need a way of restoring the original order.
Use column A as a unique identifier to insert consecutive numbers starting from 1. It may be simple, but it’s very effective.
When you’ve put your Unique IDs into column A, go back to your original paper sheets and write the Unique ID there as well. Trust me – you’ll thank me for this tip later…
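If it helps to see why the Unique ID matters, here is a minimal Python sketch of the same idea outside Excel. The records and field names are made up for illustration. The point is simply that once every row carries its own consecutive ID, any sort can be undone.

```python
# Hypothetical records, listed in the order they were entered from paper.
records = [
    {"name": "Carol", "age": 52},
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 41},
]

# Assign consecutive Unique IDs starting from 1, in order of entry.
for unique_id, row in enumerate(records, start=1):
    row["id"] = unique_id

# Sort by another column for some analysis...
by_age = sorted(records, key=lambda r: r["age"])

# ...then restore the original order by sorting on the Unique ID.
restored = sorted(by_age, key=lambda r: r["id"])

print([r["name"] for r in by_age])     # ['Alice', 'Bob', 'Carol']
print([r["name"] for r in restored])   # ['Carol', 'Alice', 'Bob']
```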
Tip 5: One Column per Variable
Each variable should have… oh, hold on a minute, what’s a variable?
Well, simply put, these are the things that can change or can be changed as part of your study. In short, these are all the pieces of information that you are observing, measuring, counting and collecting, like age, gender, distance, temperature, etc.
You can find more information in my blog on different data types.
Where were we? Ah yes…
Each variable should have its own column, and each variable should correspond to just one piece of information.
If you’re entering the age of a patient, then just enter their age, don’t enter their date of birth in the same column or cell. If you want to record their age and DOB, then use 2 separate columns.
If you’re recording a composite variable made up of 2 or more constituent parts, like Body Mass Index – made up of Height and Weight – then record them in separate columns. You can always combine them into a single variable later.
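To show what "combine them later" can look like, here is a small Python sketch that derives Body Mass Index from separately recorded Height and Weight columns. It assumes height is recorded in metres and weight in kilograms. The column names and values are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

# Height and Weight are stored as separate variables (one column each).
rows = [
    {"id": 1, "height_m": 1.75, "weight_kg": 70.0},
    {"id": 2, "height_m": 1.60, "weight_kg": 82.0},
]

# The composite variable is calculated afterwards and stored in its own column.
for row in rows:
    row["bmi"] = round(bmi(row["weight_kg"], row["height_m"]), 1)

print([row["bmi"] for row in rows])   # [22.9, 32.0]
```

Because the raw Height and Weight are still there, you can recalculate the composite (or fix a mistyped height) at any time, which you can't do if only the BMI was recorded.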
Tip 6: Row 1 is the Variable Name
Eventually you’ll need to analyse your data and you may need to export it to a statistical program.
The standard for pretty much all commercial stats programs is for the first row to be reserved for the name of the variable and all other rows for the data. So don’t be tempted to use rows 2, 3 and 4 as well as row 1 for the variable name. It might keep everything looking nice and tidy in Excel, but it will only create more work for you later.
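Here is a short Python sketch of why the single header row pays off, assuming the worksheet has been exported as CSV text. The column names and values are invented. Python's standard `csv.DictReader` treats the first row as the variable names and every later row as data, which is the same convention the stats programs follow.

```python
import csv
import io

# A hypothetical export: row 1 holds the variable names, all other rows hold data.
raw = "id,age,height_cm\n1,34,172\n2,41,180\n"

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

print(reader.fieldnames)   # ['id', 'age', 'height_cm']
print(rows[0]["age"])      # 34 (read as text; convert to a number before analysis)
```

If the variable names were spread over rows 1 to 4, the extra rows would be read as data and would have to be stripped out by hand first.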
Tip 7: Every Cell Should Have Something In It
What do empty cells tell you?
An empty cell is just a great big question mark and tells you nothing.
Worse still, incomplete datasets give reviewers a reason to whack you about the head with a metaphorical stick (and believe me they will – I’ve been there many times…).
So make sure that something is entered in every cell.
It is quite common to use ‘illegal’ numbers as codes to give you information, so where the entries for a variable can only be positive values (like age or height), we can use negative codes such as -1, -2, -3, etc.
If negative numbers aren’t useful, then use letters a, b, c, etc..
If you’re not comfortable entering something in cells that strictly shouldn’t be there (after all, you are going to have to clean them up later before you can analyse your data), then use Excel’s Comment feature. I tend to use this sparingly, but that’s just me.
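As a sketch of the clean-up step mentioned above, here is how coded entries might be separated from real values in Python before any analysis. It assumes negative numbers are used as the codes, and the meanings given to them here are hypothetical (your own code book will differ).

```python
from statistics import mean

# Hypothetical code book: what each 'illegal' value means for the Age variable.
MISSING_CODES = {-1: "not recorded", -2: "participant declined"}

ages = [34, -1, 41, 29, -2, 52]

# Keep only genuine measurements for the calculation.
valid_ages = [a for a in ages if a not in MISSING_CODES]

# Keep a record of why the other entries are missing.
reasons = [MISSING_CODES[a] for a in ages if a in MISSING_CODES]

print(mean(valid_ages))   # 39
print(mean(ages))         # 25.5 (wrong: the codes were treated as real ages)
print(reasons)            # ['not recorded', 'participant declined']
```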
Tip 8: Keep Great Notes
When using codes you’ll need to keep notes to tell you what the codes mean. Keep the codes and notes in a different spreadsheet.
While we’re on the subject, it’s really important to
KEEP GREAT NOTES !!!
You’re likely not the only person that will ever work with this dataset, so get used to writing stuff down.
Explain what the project is all about, the question you’re trying to answer, why you’re collecting this data and how you’re going to get the answers you’re looking for. Explain how you measured things and under what conditions. If more than one person is collecting data, then explain who, what, where, when, why and how.
This will be the document that explains all the important stuff about your dataset, so write it down.
If there’s too much information to comfortably put into an Excel spreadsheet, then a Microsoft Word document will be just fine – and keep it in the same folder as the dataset.
Tip 9: Be Consistent With Data Entry in Excel
There’s nothing worse than getting a dataset that takes a fortnight to clean because data entry has not been consistent.
By that I mean make sure that if the entry for a variable should be ‘Positive’, then make it ‘Positive’ and not some other variation (pos, Pos, pos+, etc.).
It’s hard enough correcting speeling missteakes and typos without also having to correct things that were deliberately entered differently.
One of the biggest data collection challenges you will face is inconsistent entry, so restrict the number of people that can enter data to cut down on this kind of issue, and make it clear what your data collection procedures and data entry standards are.
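One way to back up a data entry standard is to check entries against it automatically. The following Python sketch assumes the agreed values for a test result are ‘Positive’ and ‘Negative’. The list of known variants is hypothetical and would need to match whatever has actually crept into your own dataset.

```python
# The agreed data entry standard for this variable.
ALLOWED = {"Positive", "Negative"}

# Hypothetical mapping of known variants (lower-cased) onto the standard.
VARIANTS = {
    "pos": "Positive", "pos+": "Positive", "positive": "Positive",
    "neg": "Negative", "neg-": "Negative", "negative": "Negative",
}

def standardise(entry):
    """Return the standard spelling, or None if the entry is not recognised."""
    cleaned = entry.strip()
    if cleaned in ALLOWED:
        return cleaned
    return VARIANTS.get(cleaned.lower())

entries = ["Positive", "pos", "Pos", "pos+", " negative ", "maybe"]
print([standardise(e) for e in entries])
# ['Positive', 'Positive', 'Positive', 'Positive', 'Negative', None]
```

Anything that comes back as `None` should be checked against the paper form rather than guessed at.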
Tip 10: Don’t Guess
Data should be entered as accurately as possible.
Don’t guess, approximate, round up or down.
Enter the value exactly as registered on paper.
If you need the data to be rounded up or down you can use Excel’s functions to achieve this, but if you’re doing calculations in your head, on paper or in a calculator you’ll make mistakes which can be difficult – if not impossible – to spot later.
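The same principle applies outside Excel: store the value exactly as recorded and let a function do the rounding. Here is a minimal Python sketch using the standard `decimal` module. The helper name `round_half_up` is my own, and I've assumed you want the familiar 'round half up' rule (Python's built-in `round` works on binary floats and rounds exact halves to the nearest even digit, which can surprise you).

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value_as_text, places):
    """Round a value, given exactly as written on paper, to `places` decimals."""
    quantum = Decimal(10) ** -places          # e.g. places=1 gives Decimal('0.1')
    return Decimal(value_as_text).quantize(quantum, rounding=ROUND_HALF_UP)

recorded = "72.45"                            # entered exactly as registered
print(round_half_up(recorded, 1))             # 72.5
print(round_half_up("72.44", 1))              # 72.4
```

The recorded value stays untouched, so if you later need two decimal places instead of one, nothing has been lost.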
Tip 11: Zero is a Real Number
Don’t enter the number Zero into a cell unless what has been measured, counted or calculated results in the answer Zero.
I’ve often received datasets with lots of zeros and when I asked, the zeros meant ‘I don’t have data for this’.
The problem is that if you want to calculate something, like the mean, then all the zeros will be used in the calculation and you will get an inaccurate answer – or one that is just plain wrong!
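A quick worked example in Python makes the size of the problem clear. The numbers are made up, and I've assumed missing entries are stored as `None` in the correctly entered version.

```python
from statistics import mean

# Correct entry: 0.0 is a genuine measurement, None means 'no data'.
correct = [12.0, 0.0, 15.0, None, 9.0]

# Incorrect entry: the missing value has been typed in as zero.
with_fake_zero = [12.0, 0.0, 15.0, 0.0, 9.0]

known = [v for v in correct if v is not None]

print(mean(known))            # 9.0  (36 / 4 real measurements)
print(mean(with_fake_zero))   # 7.2  (36 / 5, dragged down by the fake zero)
```

The genuine zero belongs in the calculation. The fake one does not, and once both are typed as 0 there is no way to tell them apart.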
Data Collection Methods & Techniques – Summary
The benefits of data collection are clear.
And while data collection is pretty simple, it’s also easy to make mistakes that can cost you huge amounts of time and money when it comes to the data cleaning and pre-processing stages.
Making sure you understand why data collection in research is important, and that you have a data collection plan and a data collection method you can follow, will help you minimise problems, as will our tips and data collection techniques. It really can make the difference between a smooth two-hour data cleaning stage and one that takes you weeks and leaves you feeling like you want to pull your eyes out.
When you know the reason why you’re gathering data and the impact that you envisage making on the world, that helps you to really focus on your main research questions. Once you have these, you’ll have a pretty good idea of 'what is data collection in research', how to collect data and what data you need to collect.
From here you can build a solid data collection plan to map out the path from question to answer.
Then you can start the data gathering process, and by following the 5 data collection methods and 11 tips outlined in this post you’ll make far fewer mistakes and get to the sexier parts of your analysis sooner.
After all, that’s where we all want to end up, right?
Your Next Data Collection Step
Let's face it - you didn't visit this blog post because you're passionate about learning the latest trending data collection methods in research, did you?
You didn't leave college or University saying "now that I'm free to forge my own path, I'm going to be a ... a data collectionista!"
That sounds like a line straight out of a Monty Python sketch!
No, you're here because you need to know how to collect data (or you've already started collecting data but realise you're making mistakes) and want to know what the next steps are. Well, we're here to help.
I hope you've found this blog post useful. To help you take your career to the next level we've created an exclusive video course dedicated to teaching you how to collect data efficiently and cleanly. It includes the best data collection methods in research, plus tips and techniques to get your data analysis-ready in double quick time.
You can get this course right here, which includes all of the essential data collection methods you're going to need:
UNIQUE VIDEO COURSE
Good data starts with good data collection.
Get this wrong and you're in a world of pain when it comes to data cleaning!
Our Dirty Data Dojo video course on data collection and data collection methods in research is completely FREE to get started, with no obligation to buy or even register. Just start learning and decide later.
There really is no reason not to give it a try!