Did you know that there are only 4 types of data in statistics? Do you know what they are and what you can do with them?
Have you ever looked at your data and wondered how and where to get started? If you don't know the difference between quantitative data and qualitative data then you're in the right place. Here is our guide to statistical data types and how to deal with them...
4 Types of Data in Statistics
Ultimately, there are just 2 classes of data in statistics that can be further sub-divided into 4 statistical data types. You may have heard phrases such as 'ordinal data', 'nominal data', 'discrete data' and so on. But have you ever wondered just what they are and what they've got to do with your research and your data?
Surely they are all just fancy words made up by mathematicians and statisticians to make them sound important, aren't they?
Well actually, they are pretty important, because if you know what types of data you have, then you know which maths and stats operations you're allowed to use on your data.
Get that wrong and you're skating on pretty thin ice - sooner or later you're going to make your boss rather unhappy, and nobody wants that, do they?
So take a deep breath and let's go.
I promise this will all be quite painless...
Types of Data in Statistics: Quantitative Data and Qualitative Data
Quantitative data are measured with some kind of measuring implement - ruler, jug, weighing scales, stop-watch, thermometer and so on. Qualitative data are observed and placed into categories - gender (male, female), health (healthy, sick), opinion (agree, neutral, disagree).
So to put it in simple terms:
Quantitative data are measured
Qualitative data are categorised
The following infographic might help you to understand the 4 types of data in statistics and to visualise what we're discussing. We'll refer to it throughout...
No doubt you've noticed that quantitative data and qualitative data can be sub-divided into 4 further classes of statistical data types: Ratio Data, Interval Data, Ordinal Data and Nominal Data.
You can figure out the difference between them by asking 3 questions:
If we can answer these 3 questions for each of our types of data then we can correctly determine its class.
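The three questions themselves aren't reproduced in this text, so here is a minimal Python sketch that assumes they are the ones implied by the sections that follow: can the values be ordered, are the intervals between them equal, and is there a real zero-point? The function name classify_scale and its parameter names are made up for illustration.

```python
def classify_scale(ordered: bool, equal_intervals: bool, true_zero: bool) -> str:
    """Return the statistical data type implied by three yes/no answers.

    The three questions are an assumption inferred from the rest of the
    guide, not a quotation from it.
    """
    if not ordered:
        return "nominal"   # names only, no meaningful order
    if not equal_intervals:
        return "ordinal"   # ordered, but the gaps are not consistent
    if not true_zero:
        return "interval"  # equal gaps, but the zero is arbitrary
    return "ratio"         # equal gaps and a real zero-point


# Examples taken from the guide itself
print(classify_scale(False, False, False))  # gender
print(classify_scale(True, False, False))   # opinion (agree, neutral, disagree)
print(classify_scale(True, True, False))    # temperature in degrees Celsius
print(classify_scale(True, True, True))     # age of a person
```

Each question is only worth asking if the previous answer was 'yes', which is why the checks run in this order.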
Types of Data in Statistics: Nominal Data
Nominal data have the following properties:
We can differentiate between categories based only on their names, hence the title 'nominal' (from the Latin nomen, meaning 'name').
Examples of nominal data include:
The only mathematical or logical operation we can perform on nominal data is to say that an observation is (or is not) the same as another (equality or inequality). We can determine the most common item by finding the mode (do you remember this from High School classes?).
Other ways of finding the middle of the class, such as the median or mean, make no sense because ranking is meaningless for nominal data.
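As a rough illustration of what is allowed on nominal data, the Python sketch below tests two observations for equality and finds the mode by counting. The list of observations is invented purely for the example.

```python
from collections import Counter

# Made-up nominal observations (names only, no order implied)
observations = ["female", "male", "female", "female", "male"]

# Equality / inequality is the only comparison that makes sense
first_two_match = observations[0] == observations[1]

# The mode is simply the most frequently occurring category
counts = Counter(observations)
mode_value, mode_count = counts.most_common(1)[0]

print(first_two_match)
print(mode_value, mode_count)
```

Note that nothing here sorts or averages the categories, because neither operation has any meaning for names.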
Types of Data in Statistics: Ordinal Data
Ordinal data have the following properties:
Their categories can be ordered (1st, 2nd, 3rd, etc. - hence the name 'ordinal'), but there is no consistency in the relative distances between adjacent categories.
Examples of ordinal data include:
Mathematically, we can make simple comparisons between the categories, such as more or less healthy/severe, agree more or less, etc. Since there is an order to the data we can rank them and compute the median (or mode, but not the mean) to find the central value.
It is interesting to note that in practice some ordinal data are treated as interval data; Tumour Grade is a classic example in healthcare. This is done because the statistical tests that can be used on interval data (which meet the requirement of equal intervals) are much more powerful than those used on ordinal data. This is OK as long as your data collection methods ensure that the equidistant rule isn't bent too much.
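Here is a small Python sketch of ranking ordinal data and finding the median. It assumes the opinion categories mentioned earlier, ordered from 'disagree' to 'agree', and the responses are made up. It uses statistics.median_low so that the answer is always one of the real categories rather than a point halfway between two of them.

```python
import statistics

# Assumed ordering of the categories, lowest to highest
levels = ["disagree", "neutral", "agree"]
rank = {label: position for position, label in enumerate(levels)}

# Made-up survey responses
responses = ["agree", "neutral", "disagree", "agree", "agree", "neutral", "disagree"]

# Convert each response to its rank so that it can be ordered
ranks = [rank[r] for r in responses]

# median_low always returns a value that is present in the data
median_response = levels[statistics.median_low(ranks)]
mode_response = levels[statistics.mode(ranks)]

print(median_response)
print(mode_response)
```

The ranks are used only for ordering. Averaging them would quietly assume that the gap between 'disagree' and 'neutral' is the same size as the gap between 'neutral' and 'agree', which is exactly what ordinal data do not promise.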
Types of Data in Statistics: Interval Data
Interval data have the following properties:
Interval data are ordered, can be continuous (able to take any value within a range) or discrete (restricted to distinct, separate values), and the degree of difference between items is meaningful (their intervals are equal), but not their ratio.
Examples of interval data include:
Although interval data can appear very similar to ratio data, the difference is in their defined zero-points. If the zero-point of the scale has been chosen arbitrarily (such as the freezing point of water for the Celsius scale, or an arbitrary epoch such as AD for calendar years) then the data cannot be on the ratio scale and must be interval.
Mathematically we may compare the degrees of the data (equality/inequality, more/less) and we may add/subtract the values, such as '20°C is 10 degrees hotter than 10°C' or '6pm is 3 hours later than 3pm'. However, we cannot multiply or divide the numbers because of the arbitrary zero, so we can't say '20°C is twice as hot as 10°C' or '6pm is twice as late as 3pm'.
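One way to convince yourself of this is to express the same two temperatures in a different unit. The Python sketch below uses the standard Celsius to Fahrenheit conversion (Fahrenheit is brought in here only for illustration; the helper name c_to_f is made up). The difference between the two temperatures survives the change of units, up to a constant scale factor, but the 'twice as hot' ratio does not.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


hot_c, cool_c = 20.0, 10.0
hot_f, cool_f = c_to_f(hot_c), c_to_f(cool_c)

# Differences are meaningful: the gap is the same physical amount in both
# units, just scaled by the constant factor 9/5
diff_c = hot_c - cool_c
diff_f = hot_f - cool_f

# Ratios are not meaningful: the answer depends on where zero was placed
ratio_c = hot_c / cool_c
ratio_f = hot_f / cool_f

print(diff_c, diff_f)
print(ratio_c, ratio_f)
```

If 20°C really were 'twice as hot' as 10°C, the ratio would have to come out the same in Fahrenheit, and it doesn't.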
The central value of interval data is typically the mean (but could be the median or mode). We can also express the spread or variability of the data using measures such as the range, standard deviation, variance and/or confidence intervals.
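As a quick sketch of these summaries, Python's built-in statistics module can compute the mean, variance and standard deviation directly. The temperature readings below are invented for the example, and the variance and standard deviation shown are the sample versions.

```python
import statistics

# Made-up temperature readings in degrees Celsius (interval data)
temps_c = [18.0, 20.0, 22.0, 19.0, 21.0]

mean_temp = statistics.mean(temps_c)
median_temp = statistics.median(temps_c)
temp_range = max(temps_c) - min(temps_c)
sample_variance = statistics.variance(temps_c)  # divides by n - 1
sample_sd = statistics.stdev(temps_c)           # square root of the variance

print(mean_temp, median_temp, temp_range)
print(sample_variance, sample_sd)
```

All of these rely only on adding and subtracting values, which is why they are safe to use on interval data.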
Types of Data in Statistics: Ratio Data
Ratio data have the following properties:
As with interval data, ratio data can be continuous or discrete, and differ from interval data in that there is a non-arbitrary zero-point to the data. Examples include:
For each of these examples there is a real, meaningful zero-point. The age of a person, temperature measured from absolute zero, and distance or time measured from a pre-determined starting point all have real zeros. With real zero-points we can multiply and divide the numbers - a 6 year old is half the age of a 12 year old, and 100K is half the absolute temperature of 200K. Similarly, the distance from Barcelona to Berlin is about half the distance from Barcelona to Moscow, and it takes me twice as long to run the 100m as Usain Bolt but only half the time it takes my Grandad.
Ratio data are the best to deal with mathematically (note that I didn't say easiest...) because all possibilities are on the table. We can find the central point of the data by using any of the mode, median or mean (arithmetic, geometric or harmonic) and use all of the most powerful statistical methods to analyse the data. As long as we choose correctly we can be really confident that we are not being misled by the data and our interpretations are likely to have merit.
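To illustrate the three kinds of mean, here is a short Python sketch using the built-in statistics module (geometric_mean needs Python 3.8 or later). The 100m times are made up; they just need to be positive values with a real zero-point.

```python
import statistics

# Made-up 100m times in seconds (ratio data: zero seconds is a real zero)
times_s = [10.0, 20.0, 40.0]

arithmetic = statistics.mean(times_s)            # sum divided by count
geometric = statistics.geometric_mean(times_s)   # nth root of the product
harmonic = statistics.harmonic_mean(times_s)     # count divided by sum of reciprocals

# Ratios are meaningful too: the slowest time is 4 times the fastest
slowest_to_fastest = max(times_s) / min(times_s)

print(arithmetic, geometric, harmonic)
print(slowest_to_fastest)
```

The geometric and harmonic means both involve multiplying or dividing the values, so they only make sense when the zero-point is real. For positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean.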