May 18

# How to Convert Text Data to Numeric (Excel VLOOKUP)

In this blog post I'm going to show you a simple 3-step process to convert text categories to numerical categories using Excel's VLOOKUP and Remove Duplicates, so you end up with categorical data that is analysis-ready in any statistics program - with absolutely no manual data manipulation whatsoever!

This post forms part of a series on cleaning and handling text and numeric data in Excel.

## Remove Duplicates & VLOOKUP

Many of the functions and formulae in Microsoft's huge Excel library can be used in ways other than those Microsoft intended.

One of my favourite ways to use Remove Duplicates and VLOOKUP is to convert data from one format to another - an essential task for anyone who needs to move data backwards and forwards between various analysis platforms.

This blog post is all about using the function Remove Duplicates and the formula VLOOKUP, and how they can be used together to convert data from text categories (such as [Small, Medium, Large]) into an analysis-ready integer format (i.e. [1, 2, 3]).

## 3 Steps to Converting Text Categories to Integer Categories

Before we get started with the details of this incredibly powerful (but quite simple) process, here are the 3 simple steps to converting text categories to integer categories:

1. Use Remove Duplicates to Create a List
2. Create a Translation Matrix
3. Use VLOOKUP to Convert Your Data

So now, let's take a look at a sample of data to see the kinds of data you might typically need to convert.

You have a categorical variable called ‘Size’ and possible category names of [Small, Medium, Large]. These data had spelling mistakes in them, and we cleaned these spelling mistakes in a previous step (column C):

Now your data are clean, but they are not analysis-ready – your stats program won’t accept text data so you’ll need to translate these data into integers.

You need a way to do this automatically, and for this you’re going to use Remove Duplicates and VLOOKUP.

## How to Find 'Remove Duplicates' in Excel

The Remove Duplicates function in Excel is typically used to eliminate redundant data that you don't need.

In this case, however, you're going to use Remove Duplicates to get a list of all the different categories in your data.

First, though, you'll need to know where to find it in Excel.

Remove Duplicates can be found in the Data Tools section of the Data tab on the ribbon, and in older versions of Excel it looks like this:

The Remove Duplicates feature in newer versions of Excel will look like this:

## Step 1:  Use 'Remove Duplicates' to Create a List

Your first task is to find out what categories there are in your Size variable. You don't know how many rows there are in your dataset and you definitely don't want to scan your dataset by eye, so you need to use Remove Duplicates to create a list of all the unique elements.

The first step is to make a copy of the cleaned 'Size' column. I’ve put it in column I, like this:

From here, you select the whole of column I containing the copy and click Remove Duplicates under the Data Tools section of the Data tab.

A window will pop up that looks like this:

Make sure that 'Cleaned' and ‘My data has headers’ are both ticked, then click OK. Column I will now list all the unique elements of the Size column:

You now have a unique list of the names of all the text categories in your data.
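
If you also work outside Excel, it may help to see the same idea as a short Python sketch: keep the first occurrence of each value in a column and drop the repeats. The sample values and the function name `unique_in_order` are made up for illustration and are not taken from the dataset in this post. Note too that this sketch compares values exactly as typed.

```python
def unique_in_order(values):
    """Return each distinct value once, in order of first appearance."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical copy of the cleaned 'Size' column (illustrative data only)
cleaned = ["Small", "Large", "Medium", "Small", "Large", "Small"]
categories = unique_in_order(cleaned)
print(categories)  # ['Small', 'Large', 'Medium']
```

The order of the result follows the order in which each category first appears, so you may want to re-order the list before assigning codes.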

## Step 2: Create Your Translation Matrix

OK, so now you have a list of all the unique elements, you’re going to translate the data from what you have into what you need. The list of what you have is in column I, so you need to put what you need into column J, like this:

Columns I and J are now a ‘translation matrix’, as in ‘we want to translate all of these text categories (column I) to be like these numerical categories (column J)’.
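
As a rough analogy, a translation matrix is just a two-column lookup table, which in Python could be sketched as a dictionary. The codes 1, 2 and 3 below are an assumption that follows the [Small, Medium, Large] to [1, 2, 3] example at the top of this post; use whatever codes your analysis needs.

```python
# Hypothetical translation matrix: text category -> integer code
categories = ["Small", "Medium", "Large"]  # plays the role of column I
codes = [1, 2, 3]                          # plays the role of column J

# Pair each category with its code, row by row
translation = dict(zip(categories, codes))
print(translation)  # {'Small': 1, 'Medium': 2, 'Large': 3}
```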

## How to Use Excel VLOOKUP

In case you haven’t used VLOOKUP before, you build it like this:

=VLOOKUP(Lookup_value, Table_array, Col_index_num, Range_lookup)

• Lookup_value is the value you want to look up
• Table_array is the range where you want to look up the value (VLOOKUP searches the first column of this range)
• Col_index_num is the column number in the range containing the return value
• Range_lookup is a logical value that specifies whether you want VLOOKUP to find an approximate match (a value in a range) or an exact match (a specific value). Enter TRUE for an approximate match or FALSE for an exact match.

It might look a bit complicated, but it's actually easier than you think - it'll all become clear when you see an example.

Hot Tip: Range_lookup

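The Range_lookup parameter is a bit of a strange beast, but as a rule of thumb you can use it like this:

Use TRUE when you want to find where a number falls within a range of values (the first column of the table must be sorted in ascending order)

Use FALSE when you need an exact match, as you do with text categories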
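
To see why exact matching matters for text categories, here is a rough Python stand-in for the two matching modes. It is a simplified sketch, not Excel's actual implementation: the function name `vlookup`, the `bands` and `sizes` tables, and the use of `None` in place of `#N/A` are all assumptions made for illustration.

```python
from bisect import bisect_right


def vlookup(lookup_value, table, col_index, range_lookup=True):
    """Simplified stand-in for VLOOKUP over a list of rows.

    Returns None where Excel would show #N/A.
    """
    keys = [row[0] for row in table]
    if range_lookup:
        # Approximate match: assumes the first column is sorted ascending
        # and picks the last row whose key is <= lookup_value.
        pos = bisect_right(keys, lookup_value) - 1
        return table[pos][col_index - 1] if pos >= 0 else None
    # Exact match: first row whose key equals lookup_value.
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None


# Hypothetical numeric bands, sorted ascending: approximate match works
bands = [(0, "Low"), (50, "Mid"), (80, "High")]
print(vlookup(65, bands, 2, True))   # Mid
print(vlookup(65, bands, 2, False))  # None (65 is not in the first column)

# Hypothetical text categories, not sorted alphabetically
sizes = [("Small", 1), ("Medium", 2), ("Large", 3)]
print(vlookup("Medium", sizes, 2, False))  # 2
print(vlookup("Medium", sizes, 2, True))   # 3 in this sketch, which is wrong
```

The last line shows the risk: an approximate match on an unsorted list can quietly return the wrong code, so an exact match is the safer choice when translating categories.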

## Step 3: Use VLOOKUP to Convert Your Text Categories to Integers

The next step is to put VLOOKUP to work.

For your data, in column D (titled ‘Coded’) you want to have the data that has been converted from text to integers, so in cell D2 you type the following:

=VLOOKUP(C2,I$2:J$4,2,FALSE)

• This means ‘look up the value in C2 from the table in I2:J4 and return the matching value from the 2nd column (column J) in cell D2’

So Excel will go off and look up ‘Small’ (the value in C2) in the first column of the table, and when it finds it (in row 2) it will return the corresponding value from the 2nd column (column J), which is the number 1.

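The next step is to fill this formula down to the bottom of the data. You can do this by double-clicking on the little square in the bottom right-hand corner of the selected cell (D2). The $ signs in I$2:J$4 lock the rows of the translation matrix, so the table reference stays put while C2 becomes C3, C4 and so on as the formula is copied down.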

This will apply the formula to the whole column and your data will now look like this:

All your text categories have now been converted to numerical, integer categories, and there was no manual data manipulation!
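
For comparison, the same conversion can be sketched in a few lines of Python. This is only an illustration under assumed data: the `translation` dictionary and the `cleaned` values are hypothetical, and `None` marks any value that has no entry in the translation matrix (where VLOOKUP would return `#N/A`).

```python
# Hypothetical translation matrix and cleaned 'Size' column
translation = {"Small": 1, "Medium": 2, "Large": 3}
cleaned = ["Small", "Large", "Medium", "Small", "Tiny"]

# Look up every value; anything missing from the matrix becomes None
coded = [translation.get(size) for size in cleaned]
print(coded)  # [1, 3, 2, 1, None]

# List the values that failed to convert so they can be fixed at source
unmatched = [size for size in cleaned if size not in translation]
print(unmatched)  # ['Tiny']
```

Checking for unmatched values is worth doing in Excel too: any `#N/A` in the Coded column means a category is missing from the translation matrix or is still misspelt.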

## How To Convert Text Categories to Integer Categories - Summary

I hope you learnt something new in this blog post and that you start to use it in your own datasets.

You’ve learnt here that Microsoft Excel, at times, can be more than the sum of its parts – sometimes useful functions and formulae can be much more powerful when you find alternative ways of using them.

I hope you’ve also learnt that data cleaning and preparation don’t have to involve a lot of work – by the careful use of Excel’s built-in tools you can make the process much easier and very much quicker - like correcting spelling mistakes in text data and cleaning numerical data!
