The Hive - Learn, Help, Connect
Dirty Data Dojo - Grand Project




Data cleaning is about being organised and having a plan of action to deal with every type of data error.

Having data doesn't mean having analysis-ready data - you'll still have to clean, prepare and validate your data before you can analyse it.

In the first five courses of the Dirty Data Dojo series you learnt how to do all of these things. Now it's time to put it all together: you'll clean a large dataset, taking it from dirty to analysis-ready, and you'll do it all in a couple of hours!

The steps in this course are simple to follow but extremely effective, so you'll know you're getting to the true story of your data, and you'll save yourself weeks of misery!

  • Video lessons
  • Downloadable resources
  • Certificate of completion
  • Interactive experience
  • Perfect for beginners


In this project you will:

  • Remove all unwanted spaces and non-printing characters, and case-control all your data in ONE awesome ninja move!
  • Convert all your data into the required formats for analysis
  • Identify the correct data types of each variable in your dataset
  • Remove unwanted entries from your numerical and categorical data
  • Use descriptive statistics to check that your data fit real-life rules
  • Automatically identify and remove outliers
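To make the first bullet concrete, here is a minimal Python sketch of one way to clean a single text value in one pass. It is an illustration under assumptions, not the course's own method: the function name `clean_text`, its `case` option, and the choice to treat the ASCII control characters (codes 0 to 31 and 127) as the "non-printing" characters are all hypothetical.

```python
import re

# Assumption: "non-printing" characters are the ASCII control
# characters, codes 0-31 plus 127 (DEL).
NON_PRINTING = re.compile(r"[\x00-\x1f\x7f]")


def clean_text(value: str, case: str = "lower") -> str:
    """Hypothetical helper: clean one text value in a single pass."""
    # 1. Remove non-printing characters (similar in spirit to Excel's CLEAN).
    value = NON_PRINTING.sub("", value)
    # 2. Strip leading/trailing whitespace and collapse internal runs of
    #    whitespace to a single space (similar in spirit to Excel's TRIM).
    value = " ".join(value.split())
    # 3. Case-control, so differently-cased spellings become one category.
    if case == "lower":
        return value.lower()
    if case == "upper":
        return value.upper()
    if case == "title":
        return value.title()
    raise ValueError(f"unknown case option: {case!r}")


raw = ["London ", "LONDON", "\tlondon", "  new   york\x07"]
print([clean_text(v) for v in raw])
```

Applied to every value in a column, this turns "London ", "LONDON" and "\tlondon" into the single category "london", which is the point of cleaning and case-controlling before you count or group anything.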

Your Curriculum


Dirty Data Dojo Recap

A recap of everything that you learnt in courses 1-5 of the Dirty Data Dojo series

Open Access





Grand Project

In this chapter you'll learn how to remove unwanted text entries that contaminate your data



Data Cleaning:

Removing unwanted spaces and non-printing characters, and case-controlling

Data Preparation:

Converting your data into the required formats

Data Validation:

Checking that your data are sensible and fit real-life rules, including automatically removing outliers.


Bypassing everything you've just learnt to get your data clean and analysis-ready in seconds!
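The Data Validation chapter includes automatically removing outliers. This page doesn't say which rule the course uses, so the Python sketch below swaps in one common choice, Tukey's fences based on the interquartile range (IQR). The function name `remove_outliers_iqr` and the default multiplier `k=1.5` are assumptions for illustration only.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Hypothetical helper: split values into (kept, outliers) using Tukey's fences."""
    # Quartiles from the standard library; method="inclusive" treats the
    # minimum and maximum as the 0th and 100th percentiles.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Anything more than k * IQR beyond the quartiles counts as an outlier.
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


kept, outliers = remove_outliers_iqr([10, 12, 11, 13, 12, 11, 95])
print(kept, outliers)
```

With the values 10, 12, 11, 13, 12, 11 and 95, the quartiles come out as 11 and 12.5, so anything outside 8.75 to 14.75 is flagged and only 95 is removed. Returning the outliers separately, rather than silently dropping them, lets you check each one against real-life rules before deciding it really is an error.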


Chi-Squared Innovations