Chapter 02 - Grand Project

Lesson 02 - Grand Project:

Part 2 - Data Validation

Stopwatch - Start Timing

Don't forget to time all your operations!

Practicing in Excel (and Python/R - optional)

Hot Tip

Data preparation isn't an academic exercise - you have to do it and practice it to gain experience.


Don't skip these exercises!


This is your chance to really take your data prep skills to the next level!

Exercise 1: Calculate BMI

In your data, use Weight and Height to calculate Body Mass Index (BMI) according to the equation:

BMI = Weight/(Height x Height)

The Weight and Height in the worksheet are already in the correct units.

If you need a refresher, you'll find the relevant lesson here:

Exercise 2: Convert the Age Data

Now convert all the Age data into Decile Integers, so that patients in their 20s are coded as a 2, those in their 30s as a 3, 40s as 4, and so on.

Make a note in your Codes worksheet so you know what your codes mean.

If you need a refresher, you'll find the relevant lesson here:

Exercise 3: Convert the Tumour Grade Data

Now use your VLOOKUP worksheet to convert the Tumour Grade data into Integers, Tumour Grade 1 becomes a 1, Tumour Grade 2 is a 2 and Tumour Grade 3 is a 3.

Make a note in your Codes worksheet so you know what your codes mean.

If you need a refresher, you'll find the relevant lesson here:

Exercise 4: Convert Your Other Text Data

Now use your VLOOKUP worksheet to convert all the other text variables into integers:

Surgery (1=Yes, 0=No)

Chemotherapy (1=Yes, 0=No)

Tumour Size (1=Small, 2=Medium, 3=Large)

Tumour Side (1=Right, 2=Left)

ER (1=Positive, 0=Negative)

PGR (1=Positive, 0=Negative)

HER2 (1=Positive, 0=Negative)

Survival (1=Alive, 0=Dead)

Make a note in your Codes worksheet so you know what your codes mean.

If you need a refresher, you'll find the relevant lesson here:

Exercise 5: Calculate Triple Negative Status

Now that you have converted ER, PGR and HER2 into integers, you're going to calculate the Triple Negative Status.

Do you remember how to calculate Triple Negative Status? (Hint: we used a Helper column).

Well, use ER, PGR and HER2 to calculate the Triple Negative Status for each patient.

Make a note in your Codes worksheet so you know what your codes mean.

If you need a refresher, you'll find the relevant lesson here:

Exercise 6: Data Types

Now that you have converted all data into the correct format for analysis, assess each variable and decide to which data type they belong. (Hint: Ratio, Interval, Ordinal, Nominal).

Make a note in your Codes worksheet so you know what data type each of your variables are.

If you need a refresher, you'll find the relevant lesson here:

Stopwatch - Stop Timing

So how did you do?

How long did it take you to go through each exercise and overall?

Make a note, and add them to the next set of exercises coming up - you'll end up with a total time for all your data operations!


Submit your timings to the relevant forum page:

Remember Me
Success message!
Warning message!
Error message!