Chapter 02 - Grand Project

Lesson 03 - Grand Project

Part 3 - Data Validation

Stopwatch - Start Timing

Start your stopwatch!

Practicing in Excel (and Python/R - optional)

Hot Tip

Data preparation isn't an academic exercise - you have to do it and practice it to gain experience.


Don't skip these exercises!


This is your chance to really take your data prep skills to the next level!

Exercise 1: Descriptive Statistics

Calculate the Descriptive Statistics for all the variables, as you learnt to do.

Check through the results.

In the Age data, are there any values smaller than zero? Are any equal to zero? Are there any patients who are unusually old?

Identify all these erroneous datapoints, decide what you're going to do about them, then clean them up.

Do the same for all other variables.
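If you are taking the optional Python route, the checks in this exercise can be sketched roughly as follows. This is only a minimal illustration using Python's standard library: the ages list is made-up sample data, the names describe and flag_suspect_ages are hypothetical helpers, and the 120-year cut-off for "unusually old" is an assumption you should adjust to your own dataset.

```python
import statistics

# Made-up sample of patient ages (an assumption for illustration only);
# replace it with the Age column from your own dataset.
ages = [34, 45, -2, 0, 61, 150, 29, 52]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def flag_suspect_ages(values, max_plausible=120):
    """Return (row_index, value) pairs that need a closer look.

    Flags values that are negative, equal to zero, or above max_plausible.
    The max_plausible default of 120 is an assumed threshold, not a rule.
    """
    return [(i, v) for i, v in enumerate(values) if v <= 0 or v > max_plausible]


summary = describe(ages)
suspects = flag_suspect_ages(ages)

print(summary)
print(suspects)
```

The helper only flags rows for review; it does not change them. A zero age, for example, may be a genuine newborn rather than an error, so the decision about what to do with each flagged value stays with you.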

If you need a refresher, you'll find the relevant lesson here:

Exercise 2: Identify and Remove Outliers

Now use your Outlier worksheet to check whether any of the numerical variables contain statistical outliers.

Start with the Age data.

Are these outliers close to the accepted limits or far from them?

Identify all these outliers, decide what you're going to do about them, then clean them up.

Do the same for all other variables.
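For the optional Python route, one common way to check for statistical outliers is sketched below. It assumes Tukey's fences (1.5 times the interquartile range beyond the quartiles) as the "accepted limits"; your Outlier worksheet may use a different rule, in which case swap in its limits. The ages list is made-up sample data, and tukey_fences and find_outliers are hypothetical helper names.

```python
import statistics

# Made-up sample of patient ages (an assumption for illustration only).
ages = [34, 45, 38, 61, 29, 52, 47, 41, 95, 36]


def tukey_fences(values, k=1.5):
    """Return (lower, upper) limits as Q1 - k*IQR and Q3 + k*IQR.

    Assumes Tukey's rule; k=1.5 is the conventional multiplier.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return (row_index, value, distance_beyond_limit) for each outlier."""
    lower, upper = tukey_fences(values, k)
    outliers = []
    for i, v in enumerate(values):
        if v < lower:
            outliers.append((i, v, lower - v))
        elif v > upper:
            outliers.append((i, v, v - upper))
    return outliers


lower, upper = tukey_fences(ages)
outliers = find_outliers(ages)

print("limits:", lower, upper)
print("outliers:", outliers)
```

The third item in each result is how far the value sits beyond the limit, which helps answer whether an outlier is close to the accepted limits or far from them.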

If you need a refresher, you'll find the relevant lesson here:

Stopwatch - Stop Timing

So how did you do?

How long did it take you to go through each exercise and overall?

Make a note of your times, and add them to your times for the next set of exercises coming up - you'll end up with a total time for all your data operations!


Submit your timings to the relevant forum page:
