Why Coding Pipes in R Blew My Mind

Why learn Pipes in R for Data Science

Over a decade ago I tried (unsuccessfully) to learn how to program in R, but with help from a friend, Matt Dancho, I'm finally getting started.

If you didn't know, I'm writing up my findings and experiences as a series of blog posts, so you can follow my progress and maybe try out R for yourself.

This is the second post in a series of, oh, I've no idea how many, because I'm writing these as I go along.

I hope you'll join me on this journey, and if you'd like to start from the beginning, you can find the first post here:

Getting Started With R Programming For Data Science

Disclosure: we may earn an affiliate commission for purchases you make when using the links to courses on this page.

Catch Up

So far I've installed R and RStudio, but that setup didn't work on my PC, so I'm using RStudio Cloud instead. It works in exactly the same way, so nothing's lost there.

Then I installed the Tidyverse package, a collection of R packages designed for Data Science, imported three data tables, and learned how to join them together.

Then came pipes, which blew my mind. If you haven't heard of them before, I'm sure they're going to blow yours too!

Pipes are what this blog post is all about.

If you're in a hurry and want to learn how to program in R for Data Science in 7 weeks, Matt's course will teach you exactly how to do that.

And Matt's been kind enough to offer our readers a 15% discount.

What is the Pipe Operator in R

A pipe is an operator, written %>%, that chains a sequence of operations together: the result of each step is handed straight on to the next one. You can put all sorts of operations inside a pipe.

The %>% operator comes from the 'magrittr' package, which is installed as part of the tidyverse (and dplyr re-exports %>%, so loading the tidyverse makes it available). That means there's nothing new for me to import or install to use it.
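To see what the pipe actually does, here is a small sketch on a made-up vector using only base R functions. Writing `x %>% f()` is just another way of writing `f(x)`, and each extra `%>%` feeds the previous result into the next function. Since R 4.1 there is also a native pipe, `|>`, which behaves the same way for simple calls like these; the numbers are purely illustrative.

```r
library(magrittr)  # provides %>% (dplyr re-exports it, so the tidyverse works too)

x <- c(1, 4, 9)

# Nested calls read from the inside out: square roots first, then the sum
nested <- sum(sqrt(x))

# The same calculation with %>% reads left to right, in the order it happens
piped <- x %>% sqrt() %>% sum()

# The native pipe (needs R 4.1 or later) gives the same result
native <- x |> sqrt() |> sum()

print(c(nested, piped, native))  # 6 6 6
```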

Let's say that we want to find the average price of red T-shirts in each size category. Here's what the data might look like:

T-Shirt Dataset
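The original table isn't reproduced here, so if you want to code along you'll need a stand-in. The sketch below builds a tiny hypothetical dataset: the column names `Color`, `Size` and `Price` match the code in this post, but every value is invented.

```r
# Hypothetical T-shirt data: all values are made up for illustration
dataset <- data.frame(
  Color = c("Red", "Red", "Blue", "Red", "Green", "Red", "Blue", "Red"),
  Size  = c("S", "M", "S", "L", "M", "S", "L", "M"),
  Price = c(10, 12, 11, 15, 13, 14, 16, 18)
)

print(dataset)
```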

And here are the steps that we follow to analyse these data:

  1. Filter the T-shirt dataset to keep only certain observations (e.g. Red only)
  2. Group the filtered data into categories (e.g. by the variable 'Size')
  3. Summarise the grouped and filtered data by calculating the average price

Creating Intermediate Steps

One way of coding this is to save each step as a new variable, like this:


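library(tidyverse)  # loads dplyr, which provides filter(), group_by() and summarize()

myData_1 <- filter(dataset, Color == "Red")            # keep red T-shirts only
myData_2 <- group_by(myData_1, Size)                   # group them by size
myData_3 <- summarize(myData_2, price = mean(Price))   # average price per size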

The problem here is that we have to name each intermediate object. Should I just add a number to the end or come up with some imaginative way of naming them?

Another problem is what happens when we reference the wrong intermediate object, like this:


myData_1 <- filter(dataset, Color == "Red")
myData_2 <- group_by(myData_1, Size)
myData_3 <- summarize(myData_1, price = mean(Price))   # oops: should be myData_2

This gives us the wrong answer: a single average price for all red T-shirts instead of one per size. Worse, we might not spot it, because the code still runs and doesn't give us an error message.
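Here is a sketch of that silent mistake on a small made-up dataset (the values are hypothetical). Both calls to `summarize()` run without complaint, but only the one fed the grouped object returns a row per size.

```r
library(dplyr)

# Hypothetical T-shirt data: all values are made up for illustration
dataset <- data.frame(
  Color = c("Red", "Red", "Blue", "Red", "Green", "Red", "Blue", "Red"),
  Size  = c("S", "M", "S", "L", "M", "S", "L", "M"),
  Price = c(10, 12, 11, 15, 13, 14, 16, 18)
)

myData_1 <- filter(dataset, Color == "Red")
myData_2 <- group_by(myData_1, Size)

# Correct: summarise the grouped data, giving one average per size
right <- summarize(myData_2, price = mean(Price))

# Wrong intermediate: no error, but everything collapses into one overall average
wrong <- summarize(myData_1, price = mean(Price))

print(right)  # sizes L, M, S with averages 15, 15, 12
print(wrong)  # a single row with 13.8
```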


Overwriting The Original Variables

An alternative way of coding this is to overwrite the same variable at each step, like this:


Tshirts <- filter(dataset, Color == "Red")           # Tshirts holds the red T-shirts only
Tshirts <- group_by(Tshirts, Size)                   # ...then the grouped data
Tshirts <- summarize(Tshirts, price = mean(Price))   # ...and finally just the summary

Here, the problem is what happens if we make a mistake partway through. Each line overwrites the previous result, so there is no way to rewind to an earlier step, and we have to re-run the entire sequence from the beginning.
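This sketch, again on made-up data, shows what overwriting leaves behind. After the third line only the summary survives: the individual T-shirts and the `Color` column are gone, so changing the filter means starting again from `dataset`.

```r
library(dplyr)

# Hypothetical T-shirt data: all values are made up for illustration
dataset <- data.frame(
  Color = c("Red", "Red", "Blue", "Red", "Green", "Red", "Blue", "Red"),
  Size  = c("S", "M", "S", "L", "M", "S", "L", "M"),
  Price = c(10, 12, 11, 15, 13, 14, 16, 18)
)

Tshirts <- filter(dataset, Color == "Red")
Tshirts <- group_by(Tshirts, Size)
Tshirts <- summarize(Tshirts, price = mean(Price))

# Only the final summary is left; the earlier steps can't be recovered from Tshirts
print(names(Tshirts))  # "Size"  "price"
```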

Pipes To The Rescue

To code this with pipes, we use the pipe operator %>% between each operation in the sequence, like this:


dataset %>%
    filter(Color == "Red") %>%       # keep red T-shirts only
    group_by(Size) %>%               # one group per size
    summarize(price = mean(Price))   # average price in each group

Here, the pipe takes the output of each line and passes it to the next line as that function's first argument, and so on to the end of the sequence. That's why filter(), group_by() and summarize() no longer need the data named inside them. Adding a new operation is as simple as slotting it into the pipe at the appropriate place. Want to plot the average prices of the different sizes of red T-shirt as a bar chart? Add one line of code at the end of the pipe!
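A minimal sketch of that extra step, assuming ggplot2 (part of the tidyverse) and a made-up stand-in for the T-shirt data. The summary flows straight into `ggplot()`, and `geom_col()` draws one bar per size; note that the plot layers are joined with `+`, not `%>%`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical T-shirt data: all values are made up for illustration
dataset <- data.frame(
  Color = c("Red", "Red", "Blue", "Red", "Green", "Red", "Blue", "Red"),
  Size  = c("S", "M", "S", "L", "M", "S", "L", "M"),
  Price = c(10, 12, 11, 15, 13, 14, 16, 18)
)

# The same pipe, with the plot added as one more line at the end
p <- dataset %>%
  filter(Color == "Red") %>%
  group_by(Size) %>%
  summarize(price = mean(Price)) %>%
  ggplot(aes(x = Size, y = price)) + geom_col()

print(p)  # draws the bar chart
```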

Pipes are so intuitive that they make sense not only to a computer but to a human too, like this:


I got up %>% shaved %>% showered %>% showed up to work %>% goofed off all day
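Just for fun, here is that morning as runnable R. The functions are hypothetical helpers invented for this sketch: each one simply appends an activity to a character vector. The nested version has to be read from the inside out, while the piped version reads in the order the day actually happened, and both give the same result.

```r
library(magrittr)

# Hypothetical helpers: each appends one activity to the day's log
got_up            <- function(day) c(day, "got up")
shaved            <- function(day) c(day, "shaved")
showered          <- function(day) c(day, "showered")
showed_up_to_work <- function(day) c(day, "showed up to work")
goofed_off        <- function(day) c(day, "goofed off all day")

# Without pipes: the last thing that happened is written first
nested <- goofed_off(showed_up_to_work(showered(shaved(got_up(character(0))))))

# With pipes: the code follows the order of the day
piped <- character(0) %>%
  got_up() %>%
  shaved() %>%
  showered() %>%
  showed_up_to_work() %>%
  goofed_off()

print(piped)
```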

Now you can see why pipes blow my mind!

Check Out The Course

If you're interested in learning R programming for business and following along with me in this blog series, coding as you go, I highly recommend that you check out Matt's course. It's called Business Analysis With R, and you can check it out below.

Business Analysis in R

with Matt Dancho

All posts in the series:

Why learning to code R Pipes blew my mind
Lee Baker
Lee Baker is an award-winning software creator who lives behind a keyboard in a darkened room. Illuminated only by the light from his monitor, he aspires to find the light switch. With decades of experience in science, statistics and artificial intelligence, he has a passion for telling stories with data, yet despite explaining it a dozen times, his mother still doesn’t understand what he does for a living. Insisting that data analysis is much simpler than we think it is, he creates friendly, easy-to-understand video courses that teach the fundamentals of data analysis and statistics. As the CEO of Chi-Squared Innovations, one day he’d like to retire to do something simpler, like crocodile wrestling.