May 18

How to Convert Text Data to Numeric (Excel VLOOKUP)


In this blog post I'm going to show you a simple 3-step process for converting text categories to numerical categories using Excel's Remove Duplicates and VLOOKUP. You end up with categorical data that is analysis-ready in any statistics program, with absolutely no manual data manipulation whatsoever!


This post forms part of a series on cleaning and handling text and numeric data in Excel.


Remove Duplicates & VLOOKUP

Many of the functions and formulae in Microsoft's huge Excel library can be used in ways other than those Microsoft intended.

One of my favourite ways to use Remove Duplicates and VLOOKUP is to convert data from one format to another - an essential task for anyone who needs to move data backwards and forwards between various analysis platforms.

This blog post is all about the Remove Duplicates tool and the VLOOKUP function, and how they can be used together to convert data from text categories (such as [Small, Medium, Large]) into an analysis-ready integer format (i.e. [1, 2, 3]).


3 Steps to Converting Text Categories to Integer Categories

Before we get started with the details of this incredibly powerful (but quite simple) process, here are the 3 simple steps to converting text categories to integer categories:

  1. Use Remove Duplicates to Create a List
  2. Create a Translation Matrix
  3. Use VLOOKUP to Convert Your Data

So now, let's take a look at a sample of data to see the kinds of data you might typically need to convert.


Your Data

You have a categorical variable called ‘Size’ with possible category names of [Small, Medium, Large]. The original data had spelling mistakes in them, which we corrected in a previous step; the cleaned values are in column C:

How to Convert Text Data to Integers Using Excel VLOOKUP - Your Data

Now your data are clean, but they are not analysis-ready. Many statistics programs expect categories to be coded as numbers rather than text, so you’ll need to translate these data into integers.

You need a way to do this automatically, and for this you’re going to use Remove Duplicates and VLOOKUP.


How to Find 'Remove Duplicates' in Excel

The Remove Duplicates function in Excel is typically used to eliminate redundant data that you don't need.

In this case, however, you're going to use Remove Duplicates to get a list of all the different categories in your data.

First, though, you'll need to know where to find it in Excel.

Remove Duplicates can be found in the Data Tools section of the Data tab on the ribbon. In older versions of Excel it looks like this:

Remove Duplicates - Older Versions of Excel

The Remove Duplicates feature in newer versions of Excel will look like this:

Remove Duplicates - Newer Versions of Excel

Step 1:  Use 'Remove Duplicates' to Create a List

Your first task is to find out what categories there are in your Size variable. You don't know how many rows there are in your dataset and you definitely don’t want to scan it by eye, so you need to use Remove Duplicates to create a list of all the unique elements.

The first step is to make a copy of the cleaned 'Size' column. I’ve put it in column I, like this:

Make a Copy of Your Text Data

From here, you select the whole of column I containing the copy and click Remove Duplicates under the Data Tools section of the Data tab.

A window will pop up that looks like this:

Using Remove Duplicates to Create a Unique List of all Text Categories

Make sure that the ‘Cleaned’ column and ‘My data has headers’ are both ticked, then click OK. Column I will now list all the unique elements of the Size column:

Using Remove Duplicates to Discover How Many Text Categories Your Data Has

You now have a unique list of the names of all the text categories in your data.
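If it helps to see the logic outside Excel, here is a minimal Python sketch of what Step 1 achieves. It is only an illustration: the values in `sizes` are made up rather than taken from the worksheet above, and `unique_in_order` is a hypothetical helper name. Like Remove Duplicates, it keeps the first occurrence of each value and drops the repeats.

```python
# Hypothetical sample of an already-cleaned 'Size' column (not the post's actual data).
sizes = ["Small", "Large", "Medium", "Small", "Medium", "Large", "Small"]


def unique_in_order(values):
    """Return each distinct value once, in order of first appearance."""
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(values))


categories = unique_in_order(sizes)
print(categories)  # ['Small', 'Large', 'Medium']
```

Note that the list comes out in the order the categories first appear in the data, not in any logical order such as Small, Medium, Large.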

Step 2: Create Your Translation Matrix

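OK, so now that you have a list of all the unique elements, you’re going to translate the data from what you have into what you need. The list of what you have is in column I, so you put what you need into column J, typing the integer code you want next to each category (for example 1 next to Small, 2 next to Medium and 3 next to Large), like this: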

How VLOOKUP Can Help Convert Your Text Data to Integers

Columns I and J are now a ‘translation matrix’, as in ‘we want to translate all of these text categories (column I) to be like these numerical categories (column J)’.
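A translation matrix is just a two-column lookup table, so in code it could be sketched as a dictionary. The example below assumes the coding Small = 1, Medium = 2, Large = 3 used in this post; `missing_codes` is a hypothetical helper that checks that every category from Step 1 has been given a code before you move on.

```python
# Assumed coding scheme from the post's example: Small=1, Medium=2, Large=3.
translation = {"Small": 1, "Medium": 2, "Large": 3}


def missing_codes(categories, translation):
    """Return the categories that have no code in the translation table."""
    return [c for c in categories if c not in translation]


# Categories as they might come out of Step 1 (illustrative order).
print(missing_codes(["Small", "Large", "Medium"], translation))  # []
print(missing_codes(["Small", "Extra Large"], translation))      # ['Extra Large']
```

An empty list means the table is complete. Anything else is a category that VLOOKUP would fail to find in Step 3.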

How to Use Excel VLOOKUP

In case you haven’t used VLOOKUP before, you build it like this:

=VLOOKUP(Lookup_value, Table_array, Col_index_num, Range_lookup)

  • Lookup_value is the value you want to look up
  • Table_array is the range where you want to look up the value (the lookup value must be in the first column of this range)
  • Col_index_num is the column number in the range containing the return value
  • Range_lookup is a logical value that specifies whether you want VLOOKUP to find an approximate match (a value in a range) or an exact match (a specific value). Enter TRUE for an approximate match or FALSE for an exact match.
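To make the four arguments concrete, here is a simplified Python model of what VLOOKUP does. It is a sketch under stated assumptions, not a description of Excel's internals: the function name `vlookup` and the sample tables are hypothetical, `None` stands in for Excel's #N/A error, and text is compared case-sensitively here although Excel ignores case.

```python
from bisect import bisect_right


def vlookup(lookup_value, table_array, col_index_num, range_lookup=True):
    """Simplified model of VLOOKUP over a list of rows (tuples).

    Returns None where Excel would show #N/A.
    """
    keys = [row[0] for row in table_array]  # VLOOKUP always searches the first column
    if range_lookup:
        # Approximate match: assumes the first column is sorted ascending and
        # picks the last row whose key is less than or equal to the lookup value.
        pos = bisect_right(keys, lookup_value)
        if pos == 0:
            return None  # lookup value is smaller than every key
        return table_array[pos - 1][col_index_num - 1]
    # Exact match: return from the first row whose key equals the lookup value.
    for row in table_array:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return None


sizes_table = [("Small", 1), ("Medium", 2), ("Large", 3)]  # illustrative table
bands_table = [(0, "Low"), (50, "Mid"), (100, "High")]     # illustrative sorted bands

print(vlookup("Medium", sizes_table, 2, False))  # 2
print(vlookup("Huge", sizes_table, 2, False))    # None
print(vlookup(75, bands_table, 2, True))         # Mid
```

The exact-match branch is the one that matters for converting categories. The approximate branch only gives sensible answers when the first column is sorted, which is why it suits numeric bands rather than category names.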

It might look a bit complicated, but it's actually easier than you think - it'll all become clear when you see an example.

Hot Tip: Range_lookup

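The Range_lookup parameter is a bit of a strange beast, but these two rules cover most cases:

Use FALSE when you need an exact match, such as looking up text categories (this is what you want here)

Use TRUE only when you want an approximate match, such as placing a number into a band, and the first column of your table is sorted in ascending order

If you leave Range_lookup out, Excel assumes TRUE, so always type FALSE when you need an exact match.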

Step 3: Use VLOOKUP to Convert Your Text Categories to Integers

The next step is to put VLOOKUP to work.

For your data, in column D (titled ‘Coded’) you want to have the data that has been converted from text to integers, so in cell D2 you type the following:

=VLOOKUP(C2,I$2:J$4,2,FALSE)

  • This means ‘look up the value in C2 in the first column of the table in I2:J4, return the matching value from the 2nd column (column J), and show it in cell D2’. The $ signs in I$2:J$4 lock the row numbers, so the table reference stays fixed when the formula is copied down the column.

So Excel will go off and look up ‘Small’ (the value in C2) in the table, and when it finds it (in row 2) it will return the corresponding value from the 2nd column (column J), which is the number 1.

The next step is to fill in this formula to the bottom of the data. You can do this by double-clicking on the little square in the bottom right-hand corner of the selected cell (D2).

This will apply the formula to the whole column and your data will now look like this:

How to Convert Text Data to Integers Using Excel VLOOKUP - Data Converted

All your text categories have now been converted to numerical, integer categories, and there was no manual data manipulation!
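The same fill-down step can be sketched in Python, which is also a handy way to think about what happens to values that are not in the translation matrix. This is an illustration only: the sample column, the deliberate misspelling ‘Larje’ and the helper name `code_column` are all made up. In Excel, an unmatched value would show #N/A in column D; here it becomes `None`.

```python
# Assumed coding scheme from the post's example: Small=1, Medium=2, Large=3.
translation = {"Small": 1, "Medium": 2, "Large": 3}

# Hypothetical cleaned column with one value that slipped through cleaning.
cleaned = ["Small", "Large", "Medium", "Small", "Larje"]


def code_column(values, translation):
    """Translate every value and report the distinct values with no match."""
    coded = [translation.get(v) for v in values]  # None plays the role of #N/A
    unmatched = sorted({v for v in values if v not in translation})
    return coded, unmatched


coded, unmatched = code_column(cleaned, translation)
print(coded)      # [1, 3, 2, 1, None]
print(unmatched)  # ['Larje']
```

If anything turns up as unmatched, it usually means either the cleaning step missed a spelling or the translation matrix is missing a row.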

How To Convert Text Categories to Integer Categories - Summary

I hope you learnt something new in this blog post and that you start to use it in your own datasets.

You’ve learnt here that Microsoft Excel, at times, can be more than the sum of its parts – sometimes useful functions and formulae can be much more powerful when you find alternative ways of using them.

I hope you’ve also learnt that data cleaning and preparation don’t have to involve a lot of work. With careful use of Excel’s built-in tools you can make the process much easier and much quicker, whether you are converting categories, correcting spelling mistakes in text data or cleaning numerical data!



Tags

data cleaning, excel tips, text data

