The Hive - Learn, Help, Connect

Data Cleaning Bootcamp - in Excel, Python and R

Chapter 2 - Removing Unwanted Spaces

Lesson 18 - Practice Session - Excel, Python & R

Practicing in Excel

Here's where you get chance to put into practice what you've just learnt.

Hot Tip

Data cleaning isn't an academic exercise - you have to do it and practice it to gain experience.


Don't skip these exercises!


This is your chance to really take your data cleaning skills to the next level!

Before you get started, download the following Excel files to your computer:

Exercise 1: Practical Data Cleaning Excel Template

Go back to the Practical Data Cleaning Excel Template and create your own automated worksheet to remove unwanted spaces, case control, etc., as per the last few lessons.

Run through the process of cleaning your data using this tool several times - repeat, repeat, repeat!

Remember when I said it used to take me 2 weeks to clean a single dataset? Well, most of that time was spent cleaning spaces and case controlling. This step alone will save you DAYS - if not WEEKS - of work, so it is absolutely critical to master it.

Repeat it until you puke. Then repeat it again!!!

Exercise 2: Remove Unwanted Spaces & Case Control - BIG Style!

All the Excel code you need is contained within the Excel Practice File file.

Get familiar with the equation for removing unwanted spaces and case controlling your text entries – it's useful to understand how it works.

Practice removing unwanted spaces on the data in this file using the steps outlined in the video lecture, in particular, note the advantages and disadvantages of using it.

Under what circumstances would you use it, and when would you use some other technique?

This is a critically important technique – practice it and master it!

Coding in Python and/or R

In the last lesson you learnt how to case control your text data.

Investigate what alternatives exist for case control in your choice of Python and/or R.

Exercise 3: Case Standardisation

What similar functions exist as an alternative to case standardisation in Excel?

Python hint:

lower()

upper()

capitalize()

title()

R hint:

str_to_lower()

str_to_upper()

str_to_sentence()

str_to_upper()

Do these functions do what you expect them to?

Is there some other code you need to write to achieve what you want?

It is important to understand what your chosen function can and cannot do, so you can make any necessary adjustments.

Create an object in Python and/or R that replicates Excel's case control functions and submit it to the relevant forum page:

Exercise 4: Single Solution

Integrate your solution for case controlling your data into your solution for your alternative TRIM-CLEAN-SUBSTITUTE function so that you have an all-in-one solution.

Submit it to the relevant forum page:

Remember Me
Success message!
Warning message!
Error message!