AnalyticsDojo

Introduction to R - Tidyverse

rpi.analyticsdojo.com

Overview

It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data. (Dasu and Johnson, 2003)

Thus before you can even get to doing any sort of sophisticated analysis or plotting, you’ll generally first need to:

  1. Manipulate data frames, e.g. filter, summarize, and conduct calculations across groups.
  2. Tidy data into the appropriate format.

What is the Tidyverse?

Tidyverse

  • “The tidyverse is a set of packages that work in harmony because they share common data representations and API design.” -Hadley Wickham
  • Its packages include dplyr, tibble, tidyr, readr, purrr (and more).

Schools of Thought

There are two competing schools of thought within the R community.

  • We should stick to the base R functions to do manipulating and tidying; tidyverse uses syntax that’s unlike base R and is superfluous.
  • We should start teaching students to manipulate data using tidyverse tools because they are straightforward to use, more readable than base R, and speed up the tidying process.

We’ll show you some of the tidyverse tools so you can make an informed decision about whether you want to use base R or these newfangled packages.

Dataframe Manipulation using Base R Functions

  • So far, you’ve seen the basics of manipulating data frames, e.g. subsetting, merging, and basic calculations.
  • For instance, we can use base R functions to calculate summary statistics across groups of observations, e.g. the mean GDP per capita within each continent. To begin, we load the gapminder data:
gapminder <- read.csv("../../input/gapminder-FiveYearData.csv",
          stringsAsFactors = TRUE)
head(gapminder)

country year pop continent lifeExp gdpPercap
Afghanistan 1952 8425333 Asia 28.801 779.4453
Afghanistan 1957 9240934 Asia 30.332 820.8530
Afghanistan 1962 10267083 Asia 31.997 853.1007
Afghanistan 1967 11537966 Asia 34.020 836.1971
Afghanistan 1972 13079460 Asia 36.088 739.9811
Afghanistan 1977 14880372 Asia 38.438 786.1134

But the base R approach isn’t ideal because it involves a fair bit of repetition: the same calculation has to be written out once per group. Repeating yourself will cost you time, both now and later, and potentially introduce some nasty bugs.

Dataframe Manipulation using dplyr

Here we’re going to cover 6 of the most commonly used functions as well as using pipes (%>%) to combine them.

  1. select()
  2. filter()
  3. group_by()
  4. summarize()
  5. mutate()
  6. arrange()

If you have not installed this package earlier, please do so now:

install.packages('dplyr')

Luckily, the dplyr package provides a number of very useful functions for manipulating dataframes. These functions will save you time by reducing repetition. As an added bonus, you might even find the dplyr grammar easier to read.

# Now let's load some packages:
library(dplyr)
library(ggplot2)
library(tidyverse)  # also attaches dplyr and ggplot2, so the two calls above are optional

dplyr select

Imagine that we just received the gapminder dataset, but are only interested in a few variables in it. We could use the select() function to keep only the columns corresponding to the variables we select. For comparison, here is the base R way to keep two columns by name:

# Base R: subset columns by name
year_country <- gapminder[, c("year", "country")]
year_country

year country
1952 Afghanistan
1957 Afghanistan
1962 Afghanistan
1967 Afghanistan
1972 Afghanistan
1977 Afghanistan
1982 Afghanistan
1987 Afghanistan
1992 Afghanistan
1997 Afghanistan
2002 Afghanistan
2007 Afghanistan
1952 Albania
1957 Albania
1962 Albania
1967 Albania
1972 Albania
1977 Albania
1982 Albania
1987 Albania
1992 Albania
1997 Albania
2002 Albania
2007 Albania
1952 Algeria
1957 Algeria
1962 Algeria
1967 Algeria
1972 Algeria
1977 Algeria
1982 Yemen Rep.
1987 Yemen Rep.
1992 Yemen Rep.
1997 Yemen Rep.
2002 Yemen Rep.
2007 Yemen Rep.
1952 Zambia
1957 Zambia
1962 Zambia
1967 Zambia
1972 Zambia
1977 Zambia
1982 Zambia
1987 Zambia
1992 Zambia
1997 Zambia
2002 Zambia
2007 Zambia
1952 Zimbabwe
1957 Zimbabwe
1962 Zimbabwe
1967 Zimbabwe
1972 Zimbabwe
1977 Zimbabwe
1982 Zimbabwe
1987 Zimbabwe
1992 Zimbabwe
1997 Zimbabwe
2002 Zimbabwe
2007 Zimbabwe
year_country_gdp <- select(gapminder, year, country, gdpPercap)
head(year_country_gdp)

year country gdpPercap
1952 Afghanistan 779.4453
1957 Afghanistan 820.8530
1962 Afghanistan 853.1007
1967 Afghanistan 836.1971
1972 Afghanistan 739.9811
1977 Afghanistan 786.1134
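
A minimal sketch of select() in isolation, using a small made-up data frame (the name toy and its values are assumptions for illustration, not part of the gapminder file). It shows keeping columns by name and dropping a column with a minus sign.

```r
library(dplyr)

# Toy data frame with made-up values (illustration only)
toy <- data.frame(
  country   = c("A", "B", "C"),
  year      = c(1952, 1957, 1962),
  pop       = c(100, 200, 300),
  gdpPercap = c(1.5, 2.5, 3.5)
)

# Keep only the named columns, in the order given
kept <- select(toy, year, country, gdpPercap)

# Drop a single column by prefixing its name with a minus sign
dropped <- select(toy, -pop)

names(kept)
names(dropped)
```

Both calls return a new data frame; toy itself is left unchanged.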

dplyr Piping

  • %>% (the pipe) helps you write cleaner code.
  • It is loaded by default with the tidyverse, but it comes from the magrittr package.
  • The output of one command is piped in as the input to the next, without saving it in an intermediate throwaway variable.

Since the pipe grammar is unlike anything we’ve seen in R before, let’s repeat what we’ve done above using pipes.
year_country_gdp <- gapminder %>% select(year,country,gdpPercap)
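
To make the pipe less mysterious, the sketch below writes the same two-step operation as a nested call and as a piped call. The data frame toy is made up for illustration; the point is only that x %>% f(y) is evaluated as f(x, y).

```r
library(dplyr)

# Toy data frame with made-up values (illustration only)
toy <- data.frame(
  country   = c("A", "B", "C"),
  year      = c(1952, 1957, 1962),
  gdpPercap = c(1.5, 2.5, 3.5)
)

# Nested form: read from the inside out
nested <- head(select(toy, country, gdpPercap), 2)

# Piped form: read from top to bottom
piped <- toy %>%
  select(country, gdpPercap) %>%
  head(2)

# Both forms produce the same data frame
identical(nested, piped)
```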


dplyr filter

Now let’s say we’re only interested in African countries. We can combine select and filter to select only the observations where continent is Africa.

As with last time, first we pass the gapminder dataframe to the filter() function, then we pass the filtered version of the gapminder dataframe to the select() function.

To clarify, both the select and filter functions subset the data frame. The difference is that select extracts certain columns, while filter extracts certain rows.

Note: The order of operations is very important in this case. If we used select() first, filter() would not be able to find the variable continent, since we would have removed it in the previous step.

year_country_gdp_africa <- gapminder %>%
    filter(continent == "Africa") %>%
    select(year,country,gdpPercap)
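
The note about ordering can be checked directly. In this sketch (toy is a made-up data frame, and a fresh session with no other object named continent is assumed) the first pipeline filters before selecting and works, while the second selects first and fails because continent is no longer available.

```r
library(dplyr)

# Toy data frame with made-up values (illustration only)
toy <- data.frame(
  continent = c("Africa", "Asia", "Africa"),
  country   = c("A", "B", "C"),
  gdpPercap = c(1.5, 2.5, 3.5)
)

# filter() first, while the continent column still exists
ok <- toy %>%
  filter(continent == "Africa") %>%
  select(country, gdpPercap)

# select() first drops continent, so filter() raises an error;
# tryCatch() records whether that happened instead of stopping the script
failed <- tryCatch(
  {
    toy %>% select(country, gdpPercap) %>% filter(continent == "Africa")
    FALSE
  },
  error = function(e) TRUE
)

nrow(ok)
failed
```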

dplyr Calculations Across Groups

A common task you’ll encounter when working with data is running calculations on different groups within the data. For instance, what if we wanted to calculate the mean GDP per capita for each continent?

In base R, you would have to run the mean() function for each subset of data:

mean(gapminder[gapminder$continent == "Africa", "gdpPercap"])
mean(gapminder[gapminder$continent == "Americas", "gdpPercap"])
mean(gapminder[gapminder$continent == "Asia", "gdpPercap"])


2193.75457828574
7136.11035559
7902.15042805328

dplyr split-apply-combine

The abstract problem we’re encountering here is known as “split-apply-combine”:

We want to split our data into groups (in this case continents), apply some calculations on each group, then combine the results together afterwards.

Module 4 gave some ways to do split-apply-combine operations using the apply family of functions, but those can be error-prone and messy.

Luckily, dplyr offers a much cleaner, more straightforward solution to this problem.
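
As a rough sketch of the three steps, the example below performs split-apply-combine by hand with base R and then with dplyr, on a made-up data frame (toy and its values are assumptions for illustration). Both routes should give the same group means.

```r
library(dplyr)

# Toy data frame with made-up values (illustration only)
toy <- data.frame(
  continent = c("Africa", "Asia", "Africa", "Asia"),
  gdpPercap = c(1, 10, 3, 30)
)

# Split: one numeric vector per continent
pieces <- split(toy$gdpPercap, toy$continent)

# Apply and combine: the mean of each piece, collected into a named vector
base_means <- sapply(pieces, mean)

# The same three steps expressed with dplyr
dplyr_means <- toy %>%
  group_by(continent) %>%
  summarize(mean_gdpPercap = mean(gdpPercap))

base_means
dplyr_means
```

The dplyr version keeps the group labels and the results together in one data frame, which is usually easier to carry into later steps.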

dplyr group_by

We’ve already seen how filter() can help us select observations that meet certain criteria (in the above: continent == "Africa"). More helpful, however, is the group_by() function, which essentially splits the data on every unique value that we could have used in filter().

A grouped_df can be thought of as a list where each item is a data.frame containing only the rows that correspond to a particular value of continent (at least in this example).
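
The "list of data frames" picture can be inspected directly. This sketch uses a made-up data frame (toy is an assumption for illustration) and the dplyr helpers n_groups(), group_keys(), and group_split(); the last one is marked experimental in the dplyr documentation, so treat it as an exploration aid rather than something to build on.

```r
library(dplyr)

# Toy data frame with made-up values (illustration only)
toy <- data.frame(
  continent = c("Africa", "Asia", "Africa", "Asia", "Europe"),
  gdpPercap = c(1, 10, 3, 30, 50)
)

grouped <- group_by(toy, continent)

class(grouped)       # the class vector includes "grouped_df"
n_groups(grouped)    # number of distinct continents
group_keys(grouped)  # one row per group

# group_split() returns one data frame per group
pieces <- group_split(grouped)
length(pieces)
```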

#Summarize returns a dataframe. 
gdp_bycontinents <- gapminder %>%
    group_by(continent) %>%
    summarize(mean_gdpPercap = mean(gdpPercap))
head(gdp_bycontinents)

continent mean_gdpPercap
Africa 2193.755
Americas 7136.110
Asia 7902.150
Europe 14469.476
Oceania 18621.609

That allowed us to calculate the mean gdpPercap for each continent. But it gets even better – the function group_by() allows us to group by multiple variables. Let’s group by year and continent.

gdp_bycontinents_byyear <- gapminder %>%
    group_by(continent, year) %>%
    summarize(mean_gdpPercap = mean(gdpPercap))
gdp_bycontinents_byyear

continent year mean_gdpPercap
Africa 1952 1252.572
Africa 1957 1385.236
Africa 1962 1598.079
Africa 1967 2050.364
Africa 1972 2339.616
Africa 1977 2585.939
Africa 1982 2481.593
Africa 1987 2282.669
Africa 1992 2281.810
Africa 1997 2378.760
Africa 2002 2599.385
Africa 2007 3089.033
Americas 1952 4079.063
Americas 1957 4616.044
Americas 1962 4901.542
Americas 1967 5668.253
Americas 1972 6491.334
Americas 1977 7352.007
Americas 1982 7506.737
Americas 1987 7793.400
Americas 1992 8044.934
Americas 1997 8889.301
Americas 2002 9287.677
Americas 2007 11003.032
Asia 1952 5195.484
Asia 1957 5787.733
Asia 1962 5729.370
Asia 1967 5971.173
Asia 1972 8187.469
Asia 1977 7791.314
Asia 1982 7434.135
Asia 1987 7608.227
Asia 1992 8639.690
Asia 1997 9834.093
Asia 2002 10174.090
Asia 2007 12473.027
Europe 1952 5661.057
Europe 1957 6963.013
Europe 1962 8365.487
Europe 1967 10143.824
Europe 1972 12479.575
Europe 1977 14283.979
Europe 1982 15617.897
Europe 1987 17214.311
Europe 1992 17061.568
Europe 1997 19076.782
Europe 2002 21711.732
Europe 2007 25054.482
Oceania 1952 10298.086
Oceania 1957 11598.522
Oceania 1962 12696.452
Oceania 1967 14495.022
Oceania 1972 16417.333
Oceania 1977 17283.958
Oceania 1982 18554.710
Oceania 1987 20448.040
Oceania 1992 20894.046
Oceania 1997 24024.175
Oceania 2002 26938.778
Oceania 2007 29810.188

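# Copy the mpg dataset (shipped with ggplot2) into the workspace and inspect its structure
mpg <- ggplot2::mpg
str(mpg)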


Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	234 obs. of  11 variables:
 $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
 $ model       : chr  "a4" "a4" "a4" "a4" ...
 $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr  "f" "f" "f" "f" ...
 $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr  "p" "p" "p" "p" ...
 $ class       : chr  "compact" "compact" "compact" "compact" ...

That is already quite powerful, but it gets even better! You’re not limited to defining 1 new variable in summarize().

gdp_pop_bycontinents_byyear <- gapminder %>%
    group_by(continent, year) %>%
    summarize(mean_gdpPercap = mean(gdpPercap),
              sd_gdpPercap = sd(gdpPercap),
              mean_pop = mean(pop),
              sd_pop = sd(pop))
head(gdp_pop_bycontinents_byyear)

continent year mean_gdpPercap sd_gdpPercap mean_pop sd_pop
Africa 1952 1252.572 982.9521 4570010 6317450
Africa 1957 1385.236 1134.5089 5093033 7076042
Africa 1962 1598.079 1461.8392 5702247 7957545
Africa 1967 2050.364 2847.7176 6447875 8985505
Africa 1972 2339.616 3286.8539 7305376 10130833
Africa 1977 2585.939 4142.3987 8328097 11585184
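
Summaries are not limited to mean() and sd(). As a small sketch on made-up data (toy is an assumption for illustration), n() counts the rows in each group and can be combined with other summaries, for example to compute a standard error.

```r
library(dplyr)

# Toy data frame with made-up values (illustration only)
toy <- data.frame(
  continent = c("Africa", "Asia", "Africa", "Asia"),
  gdpPercap = c(1, 10, 3, 30)
)

stats <- toy %>%
  group_by(continent) %>%
  summarize(
    n_obs    = n(),                          # rows per group
    mean_gdp = mean(gdpPercap),
    se_gdp   = sd(gdpPercap) / sqrt(n())     # standard error of the mean
  )

stats
```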

Basics

  • Use the mpg dataset to create summaries by manufacturer/year for 8 cyl vehicles.
mpg <- ggplot2::mpg  # copy the dataset shipped with ggplot2 into the workspace
head(mpg)

manufacturer model displ year cyl trans drv cty hwy fl class
audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
audi a4 2.0 2008 4 auto(av) f 21 30 p compact
audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
#This just gives a dataframe with 70 obs, only 8 cylinder cars 
mpg.8cyl<-mpg %>% 
  filter(cyl == 8)
mpg.8cyl


manufacturer model displ year cyl trans drv cty hwy fl class
audi a6 quattro 4.2 2008 8 auto(s6) 4 16 23 p midsize
chevrolet c1500 suburban 2wd 5.3 2008 8 auto(l4) r 14 20 r suv
chevrolet c1500 suburban 2wd 5.3 2008 8 auto(l4) r 11 15 e suv
chevrolet c1500 suburban 2wd 5.3 2008 8 auto(l4) r 14 20 r suv
chevrolet c1500 suburban 2wd 5.7 1999 8 auto(l4) r 13 17 r suv
chevrolet c1500 suburban 2wd 6.0 2008 8 auto(l4) r 12 17 r suv
chevrolet corvette 5.7 1999 8 manual(m6) r 16 26 p 2seater
chevrolet corvette 5.7 1999 8 auto(l4) r 15 23 p 2seater
chevrolet corvette 6.2 2008 8 manual(m6) r 16 26 p 2seater
chevrolet corvette 6.2 2008 8 auto(s6) r 15 25 p 2seater
chevrolet corvette 7.0 2008 8 manual(m6) r 15 24 p 2seater
chevrolet k1500 tahoe 4wd 5.3 2008 8 auto(l4) 4 14 19 r suv
chevrolet k1500 tahoe 4wd 5.3 2008 8 auto(l4) 4 11 14 e suv
chevrolet k1500 tahoe 4wd 5.7 1999 8 auto(l4) 4 11 15 r suv
chevrolet k1500 tahoe 4wd 6.5 1999 8 auto(l4) 4 14 17 d suv
dodge dakota pickup 4wd 4.7 2008 8 auto(l5) 4 14 19 r pickup
dodge dakota pickup 4wd 4.7 2008 8 auto(l5) 4 14 19 r pickup
dodge dakota pickup 4wd 4.7 2008 8 auto(l5) 4 9 12 e pickup
dodge dakota pickup 4wd 5.2 1999 8 manual(m5) 4 11 17 r pickup
dodge dakota pickup 4wd 5.2 1999 8 auto(l4) 4 11 15 r pickup
dodge durango 4wd 4.7 2008 8 auto(l5) 4 13 17 r suv
dodge durango 4wd 4.7 2008 8 auto(l5) 4 9 12 e suv
dodge durango 4wd 4.7 2008 8 auto(l5) 4 13 17 r suv
dodge durango 4wd 5.2 1999 8 auto(l4) 4 11 16 r suv
dodge durango 4wd 5.7 2008 8 auto(l5) 4 13 18 r suv
dodge durango 4wd 5.9 1999 8 auto(l4) 4 11 15 r suv
dodge ram 1500 pickup 4wd 4.7 2008 8 manual(m6) 4 12 16 r pickup
dodge ram 1500 pickup 4wd 4.7 2008 8 auto(l5) 4 9 12 e pickup
dodge ram 1500 pickup 4wd 4.7 2008 8 auto(l5) 4 13 17 r pickup
dodge ram 1500 pickup 4wd 4.7 2008 8 auto(l5) 4 13 17 r pickup
ford explorer 4wd 5.0 1999 8 auto(l4) 4 13 17 r suv
ford f150 pickup 4wd 4.6 1999 8 manual(m5) 4 13 16 r pickup
ford f150 pickup 4wd 4.6 1999 8 auto(l4) 4 13 16 r pickup
ford f150 pickup 4wd 4.6 2008 8 auto(l4) 4 13 17 r pickup
ford f150 pickup 4wd 5.4 1999 8 auto(l4) 4 11 15 r pickup
ford f150 pickup 4wd 5.4 2008 8 auto(l4) 4 13 17 r pickup
ford mustang 4.6 1999 8 auto(l4) r 15 21 r subcompact
ford mustang 4.6 1999 8 manual(m5) r 15 22 r subcompact
ford mustang 4.6 2008 8 manual(m5) r 15 23 r subcompact
ford mustang 4.6 2008 8 auto(l5) r 15 22 r subcompact
ford mustang 5.4 2008 8 manual(m6) r 14 20 p subcompact
jeep grand cherokee 4wd 4.7 1999 8 auto(l4) 4 14 17 r suv
jeep grand cherokee 4wd 4.7 2008 8 auto(l5) 4 9 12 e suv
jeep grand cherokee 4wd 4.7 2008 8 auto(l5) 4 14 19 r suv
jeep grand cherokee 4wd 5.7 2008 8 auto(l5) 4 13 18 r suv
jeep grand cherokee 4wd 6.1 2008 8 auto(l5) 4 11 14 p suv
land rover range rover 4.0 1999 8 auto(l4) 4 11 15 p suv
land rover range rover 4.2 2008 8 auto(s6) 4 12 18 r suv
land rover range rover 4.4 2008 8 auto(s6) 4 12 18 r suv
land rover range rover 4.6 1999 8 auto(l4) 4 11 15 p suv
lincoln navigator 2wd 5.4 1999 8 auto(l4) r 11 17 r suv
lincoln navigator 2wd 5.4 1999 8 auto(l4) r 11 16 p suv
lincoln navigator 2wd 5.4 2008 8 auto(l6) r 12 18 r suv
mercury mountaineer 4wd 4.6 2008 8 auto(l6) 4 13 19 r suv
mercury mountaineer 4wd 5.0 1999 8 auto(l4) 4 13 17 r suv
nissan pathfinder 4wd 5.6 2008 8 auto(s5) 4 12 18 p suv
pontiac grand prix 5.3 2008 8 auto(s4) f 16 25 p midsize
toyota 4runner 4wd 4.7 2008 8 auto(l5) 4 14 17 r suv
toyota land cruiser wagon 4wd 4.7 1999 8 auto(l4) 4 11 15 r suv
toyota land cruiser wagon 4wd 5.7 2008 8 auto(s6) 4 13 18 r suv
# Filter to only those cars that have 8 cylinders
mpg.8cyl<-mpg %>% 
  filter(cyl == 8)

#Alt Syntax
mpg.8cyl<-filter(mpg, cyl == 8)

mpg.8cyl

#Sort cars by MPG highway(hwy) then city(cty)
mpgsort<-arrange(mpg, hwy, cty)
mpgsort

manufacturer model displ year cyl trans drv cty hwy fl class
dodge dakota pickup 4wd 4.7 2008 8 auto(l5) 4 9 12 e pickup
dodge durango 4wd 4.7 2008 8 auto(l5) 4 9 12 e suv
dodge ram 1500 pickup 4wd 4.7 2008 8 auto(l5) 4 9 12 e pickup
dodge ram 1500 pickup 4wd 4.7 2008 8 manual(m6) 4 9 12 e pickup
jeep grand cherokee 4wd 4.7 2008 8 auto(l5) 4 9 12 e suv
chevrolet k1500 tahoe 4wd 5.3 2008 8 auto(l4) 4 11 14 e suv
jeep grand cherokee 4wd 6.1 2008 8 auto(l5) 4 11 14 p suv
chevrolet c1500 suburban 2wd 5.3 2008 8 auto(l4) r 11 15 e suv
chevrolet k1500 tahoe 4wd 5.7 1999 8 auto(l4) 4 11 15 r suv
dodge dakota pickup 4wd 5.2 1999 8 auto(l4) 4 11 15 r pickup
dodge durango 4wd 5.9 1999 8 auto(l4) 4 11 15 r suv
dodge ram 1500 pickup 4wd 5.2 1999 8 auto(l4) 4 11 15 r pickup
dodge ram 1500 pickup 4wd 5.9 1999 8 auto(l4) 4 11 15 r pickup
ford f150 pickup 4wd 5.4 1999 8 auto(l4) 4 11 15 r pickup
land rover range rover 4.0 1999 8 auto(l4) 4 11 15 p suv
land rover range rover 4.6 1999 8 auto(l4) 4 11 15 p suv
toyota land cruiser wagon 4wd 4.7 1999 8 auto(l4) 4 11 15 r suv
dodge durango 4wd 5.2 1999 8 auto(l4) 4 11 16 r suv
dodge ram 1500 pickup 4wd 5.2 1999 8 manual(m5) 4 11 16 r pickup
lincoln navigator 2wd 5.4 1999 8 auto(l4) r 11 16 p suv
dodge ram 1500 pickup 4wd 4.7 2008 8 manual(m6) 4 12 16 r pickup
dodge ram 1500 pickup 4wd 4.7 2008 8 manual(m6) 4 12 16 r pickup
ford f150 pickup 4wd 4.6 1999 8 manual(m5) 4 13 16 r pickup
ford f150 pickup 4wd 4.6 1999 8 auto(l4) 4 13 16 r pickup
dodge caravan 2wd 3.3 2008 6 auto(l4) f 11 17 e minivan
dodge dakota pickup 4wd 5.2 1999 8 manual(m5) 4 11 17 r pickup
ford expedition 2wd 4.6 1999 8 auto(l4) r 11 17 r suv
ford expedition 2wd 5.4 1999 8 auto(l4) r 11 17 r suv
lincoln navigator 2wd 5.4 1999 8 auto(l4) r 11 17 r suv
chevrolet c1500 suburban 2wd 6.0 2008 8 auto(l4) r 12 17 r suv
volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p midsize
volkswagen gti 2.0 2008 4 auto(s6) f 22 29 p compact
volkswagen jetta 2.0 2008 4 auto(s6) f 22 29 p compact
honda civic 1.6 1999 4 manual(m5) f 23 29 p subcompact
audi a4 2.0 2008 4 auto(av) f 21 30 p compact
hyundai sonata 2.4 2008 4 auto(l4) f 21 30 r midsize
chevrolet malibu 2.4 2008 4 auto(l4) f 22 30 r midsize
toyota corolla 1.8 1999 4 auto(l3) f 24 30 r compact
audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
hyundai sonata 2.4 2008 4 manual(m5) f 21 31 r midsize
toyota camry 2.4 2008 4 manual(m5) f 21 31 r midsize
toyota camry 2.4 2008 4 auto(l5) f 21 31 r midsize
toyota camry solara 2.4 2008 4 manual(m5) f 21 31 r compact
toyota camry solara 2.4 2008 4 auto(s5) f 22 31 r compact
nissan altima 2.5 2008 4 auto(av) f 23 31 r midsize
nissan altima 2.5 2008 4 manual(m6) f 23 32 r midsize
honda civic 1.6 1999 4 auto(l4) f 24 32 r subcompact
honda civic 1.6 1999 4 auto(l4) f 24 32 r subcompact
honda civic 1.6 1999 4 manual(m5) f 25 32 r subcompact
toyota corolla 1.8 1999 4 auto(l4) f 24 33 r compact
honda civic 1.6 1999 4 manual(m5) f 28 33 r subcompact
honda civic 1.8 2008 4 manual(m5) f 26 34 r subcompact
toyota corolla 1.8 1999 4 manual(m5) f 26 35 r compact
toyota corolla 1.8 2008 4 auto(l4) f 26 35 r compact
honda civic 1.8 2008 4 auto(l5) f 24 36 c subcompact
honda civic 1.8 2008 4 auto(l5) f 25 36 r subcompact
toyota corolla 1.8 2008 4 manual(m5) f 28 37 r compact
volkswagen new beetle 1.9 1999 4 auto(l4) f 29 41 d subcompact
volkswagen jetta 1.9 1999 4 manual(m5) f 33 44 d compact
volkswagen new beetle 1.9 1999 4 manual(m5) f 35 44 d subcompact
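
arrange() sorts in ascending order by default. As a short sketch on made-up data (toy and its values are assumptions for illustration), wrapping a column in desc() reverses the order for that column only.

```r
library(dplyr)

# Toy data frame with made-up values (illustration only)
toy <- data.frame(
  model = c("a", "b", "c", "d"),
  hwy   = c(20, 30, 20, 30),
  cty   = c(15, 22, 12, 25)
)

# Ascending by hwy, ties broken by ascending cty
asc <- arrange(toy, hwy, cty)

# Descending by hwy, ties still broken by ascending cty
desc_hwy <- arrange(toy, desc(hwy), cty)

asc$model       # "c" "a" "b" "d"
desc_hwy$model  # "b" "d" "c" "a"
```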
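# From the documentation: https://cran.r-project.org/web/packages/dplyr/dplyr.pdf
select(iris, starts_with("petal")) # returns columns that start with "Petal" (matching ignores case by default)
select(iris, ends_with("width"))   # returns columns that end with "Width"
select(iris, contains("etal"))     # returns columns whose names contain "etal"
select(iris, matches(".t."))       # returns columns whose names match the regular expression ".t."
select(iris, Petal.Length, Petal.Width)
vars <- c("Petal.Length", "Petal.Width")
select(iris, one_of(vars))         # one_of() is superseded by any_of()/all_of() in recent dplyr releases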

Petal.Length Petal.Width
1.4 0.2
1.4 0.2
1.3 0.2
1.5 0.2
1.4 0.2
1.7 0.4
1.4 0.3
1.5 0.2
1.4 0.2
1.5 0.1
1.5 0.2
1.6 0.2
1.4 0.1
1.1 0.1
1.2 0.2
1.5 0.4
1.3 0.4
1.4 0.3
1.7 0.3
1.5 0.3
1.7 0.2
1.5 0.4
1.0 0.2
1.7 0.5
1.9 0.2
1.6 0.2
1.6 0.4
1.5 0.2
1.4 0.2
1.6 0.2
5.7 2.3
4.9 2.0
6.7 2.0
4.9 1.8
5.7 2.1
6.0 1.8
4.8 1.8
4.9 1.8
5.6 2.1
5.8 1.6
6.1 1.9
6.4 2.0
5.6 2.2
5.1 1.5
5.6 1.4
6.1 2.3
5.6 2.4
5.5 1.8
4.8 1.8
5.4 2.1
5.6 2.4
5.1 2.3
5.1 1.9
5.9 2.3
5.7 2.5
5.2 2.3
5.0 1.9
5.2 2.0
5.4 2.3
5.1 1.8
Sepal.Width Petal.Width
3.5 0.2
3.0 0.2
3.2 0.2
3.1 0.2
3.6 0.2
3.9 0.4
3.4 0.3
3.4 0.2
2.9 0.2
3.1 0.1
3.7 0.2
3.4 0.2
3.0 0.1
3.0 0.1
4.0 0.2
4.4 0.4
3.9 0.4
3.5 0.3
3.8 0.3
3.8 0.3
3.4 0.2
3.7 0.4
3.6 0.2
3.3 0.5
3.4 0.2
3.0 0.2
3.4 0.4
3.5 0.2
3.4 0.2
3.2 0.2
3.2 2.3
2.8 2.0
2.8 2.0
2.7 1.8
3.3 2.1
3.2 1.8
2.8 1.8
3.0 1.8
2.8 2.1
3.0 1.6
2.8 1.9
3.8 2.0
2.8 2.2
2.8 1.5
2.6 1.4
3.0 2.3
3.4 2.4
3.1 1.8
3.0 1.8
3.1 2.1
3.1 2.4
3.1 2.3
2.7 1.9
3.2 2.3
3.3 2.5
3.0 2.3
2.5 1.9
3.0 2.0
3.4 2.3
3.0 1.8
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4
4.6 3.4 1.4 0.3
5.0 3.4 1.5 0.2
4.4 2.9 1.4 0.2
4.9 3.1 1.5 0.1
5.4 3.7 1.5 0.2
4.8 3.4 1.6 0.2
4.8 3.0 1.4 0.1
4.3 3.0 1.1 0.1
5.8 4.0 1.2 0.2
5.7 4.4 1.5 0.4
5.4 3.9 1.3 0.4
5.1 3.5 1.4 0.3
5.7 3.8 1.7 0.3
5.1 3.8 1.5 0.3
5.4 3.4 1.7 0.2
5.1 3.7 1.5 0.4
4.6 3.6 1.0 0.2
5.1 3.3 1.7 0.5
4.8 3.4 1.9 0.2
5.0 3.0 1.6 0.2
5.0 3.4 1.6 0.4
5.2 3.5 1.5 0.2
5.2 3.4 1.4 0.2
4.7 3.2 1.6 0.2
6.9 3.2 5.7 2.3
5.6 2.8 4.9 2.0
7.7 2.8 6.7 2.0
6.3 2.7 4.9 1.8
6.7 3.3 5.7 2.1
7.2 3.2 6.0 1.8
6.2 2.8 4.8 1.8
6.1 3.0 4.9 1.8
6.4 2.8 5.6 2.1
7.2 3.0 5.8 1.6
7.4 2.8 6.1 1.9
7.9 3.8 6.4 2.0
6.4 2.8 5.6 2.2
6.3 2.8 5.1 1.5
6.1 2.6 5.6 1.4
7.7 3.0 6.1 2.3
6.3 3.4 5.6 2.4
6.4 3.1 5.5 1.8
6.0 3.0 4.8 1.8
6.9 3.1 5.4 2.1
6.7 3.1 5.6 2.4
6.9 3.1 5.1 2.3
5.8 2.7 5.1 1.9
6.8 3.2 5.9 2.3
6.7 3.3 5.7 2.5
6.7 3.0 5.2 2.3
6.3 2.5 5.0 1.9
6.5 3.0 5.2 2.0
6.2 3.4 5.4 2.3
5.9 3.0 5.1 1.8
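# Recoding Data
# See "Creating new variables with mutate and ifelse":
# https://rstudio-pubs-static.s3.amazonaws.com/116317_e6922e81e72e4e3f83995485ce686c14.html
# Note: 61.0237 is the number of cubic inches in a litre. mpg$displ is already
# recorded in litres, so displ_l here only illustrates the mutate() syntax.
mutate(mpg, displ_l = displ / 61.0237)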


manufacturer model displ year cyl trans drv cty hwy fl class displ_l
audi a4 1.8 1999 4 auto(l5) f 18 29 p compact 0.02949674
audi a4 1.8 1999 4 manual(m5) f 21 29 p compact 0.02949674
audi a4 2.0 2008 4 manual(m6) f 20 31 p compact 0.03277415
audi a4 2.0 2008 4 auto(av) f 21 30 p compact 0.03277415
audi a4 2.8 1999 6 auto(l5) f 16 26 p compact 0.04588381
audi a4 2.8 1999 6 manual(m5) f 18 26 p compact 0.04588381
audi a4 3.1 2008 6 auto(av) f 18 27 p compact 0.05079994
audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p compact 0.02949674
audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p compact 0.02949674
audi a4 quattro 2.0 2008 4 manual(m6) 4 20 28 p compact 0.03277415
audi a4 quattro 2.0 2008 4 auto(s6) 4 19 27 p compact 0.03277415
audi a4 quattro 2.8 1999 6 auto(l5) 4 15 25 p compact 0.04588381
audi a4 quattro 2.8 1999 6 manual(m5) 4 17 25 p compact 0.04588381
audi a4 quattro 3.1 2008 6 auto(s6) 4 17 25 p compact 0.05079994
audi a4 quattro 3.1 2008 6 manual(m6) 4 15 25 p compact 0.05079994
audi a6 quattro 2.8 1999 6 auto(l5) 4 15 24 p midsize 0.04588381
audi a6 quattro 3.1 2008 6 auto(s6) 4 17 25 p midsize 0.05079994
audi a6 quattro 4.2 2008 8 auto(s6) 4 16 23 p midsize 0.06882572
chevrolet c1500 suburban 2wd 5.3 2008 8 auto(l4) r 14 20 r suv 0.08685150
chevrolet c1500 suburban 2wd 5.3 2008 8 auto(l4) r 11 15 e suv 0.08685150
chevrolet c1500 suburban 2wd 5.3 2008 8 auto(l4) r 14 20 r suv 0.08685150
chevrolet c1500 suburban 2wd 5.7 1999 8 auto(l4) r 13 17 r suv 0.09340633
chevrolet c1500 suburban 2wd 6.0 2008 8 auto(l4) r 12 17 r suv 0.09832246
chevrolet corvette 5.7 1999 8 manual(m6) r 16 26 p 2seater 0.09340633
chevrolet corvette 5.7 1999 8 auto(l4) r 15 23 p 2seater 0.09340633
chevrolet corvette 6.2 2008 8 manual(m6) r 16 26 p 2seater 0.10159987
chevrolet corvette 6.2 2008 8 auto(s6) r 15 25 p 2seater 0.10159987
chevrolet corvette 7.0 2008 8 manual(m6) r 15 24 p 2seater 0.11470953
chevrolet k1500 tahoe 4wd 5.3 2008 8 auto(l4) 4 14 19 r suv 0.08685150
chevrolet k1500 tahoe 4wd 5.3 2008 8 auto(l4) 4 11 14 e suv 0.08685150
toyota toyota tacoma 4wd 3.4 1999 6 auto(l4) 4 15 19 r pickup 0.05571606
toyota toyota tacoma 4wd 4.0 2008 6 manual(m6) 4 15 18 r pickup 0.06554830
toyota toyota tacoma 4wd 4.0 2008 6 auto(l5) 4 16 20 r pickup 0.06554830
volkswagen gti 2.0 1999 4 manual(m5) f 21 29 r compact 0.03277415
volkswagen gti 2.0 1999 4 auto(l4) f 19 26 r compact 0.03277415
volkswagen gti 2.0 2008 4 manual(m6) f 21 29 p compact 0.03277415
volkswagen gti 2.0 2008 4 auto(s6) f 22 29 p compact 0.03277415
volkswagen gti 2.8 1999 6 manual(m5) f 17 24 r compact 0.04588381
volkswagen jetta 1.9 1999 4 manual(m5) f 33 44 d compact 0.03113544
volkswagen jetta 2.0 1999 4 manual(m5) f 21 29 r compact 0.03277415
volkswagen jetta 2.0 1999 4 auto(l4) f 19 26 r compact 0.03277415
volkswagen jetta 2.0 2008 4 auto(s6) f 22 29 p compact 0.03277415
volkswagen jetta 2.0 2008 4 manual(m6) f 21 29 p compact 0.03277415
volkswagen jetta 2.5 2008 5 auto(s6) f 21 29 r compact 0.04096769
volkswagen jetta 2.5 2008 5 manual(m5) f 21 29 r compact 0.04096769
volkswagen jetta 2.8 1999 6 auto(l4) f 16 23 r compact 0.04588381
volkswagen jetta 2.8 1999 6 manual(m5) f 17 24 r compact 0.04588381
volkswagen new beetle 1.9 1999 4 manual(m5) f 35 44 d subcompact 0.03113544
volkswagen new beetle 1.9 1999 4 auto(l4) f 29 41 d subcompact 0.03113544
volkswagen new beetle 2.0 1999 4 manual(m5) f 21 29 r subcompact 0.03277415
volkswagen new beetle 2.0 1999 4 auto(l4) f 19 26 r subcompact 0.03277415
volkswagen new beetle 2.5 2008 5 manual(m5) f 20 28 r subcompact 0.04096769
volkswagen new beetle 2.5 2008 5 auto(s6) f 20 29 r subcompact 0.04096769
volkswagen passat 1.8 1999 4 manual(m5) f 21 29 p midsize 0.02949674
volkswagen passat 1.8 1999 4 auto(l5) f 18 29 p midsize 0.02949674
volkswagen passat 2.0 2008 4 auto(s6) f 19 28 p midsize 0.03277415
volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p midsize 0.03277415
volkswagen passat 2.8 1999 6 auto(l5) f 16 26 p midsize 0.04588381
volkswagen passat 2.8 1999 6 manual(m5) f 18 26 p midsize 0.04588381
volkswagen passat 3.6 2008 6 auto(s6) f 17 26 p midsize 0.05899347
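
A more typical use of mutate() derives a new column from existing ones. The sketch below uses a made-up data frame (toy and the column names gdp and gdp_thousands are assumptions for illustration) and also shows that a column created earlier in the same mutate() call can be used right away.

```r
library(dplyr)

# Toy data frame with made-up values (illustration only)
toy <- data.frame(
  country   = c("A", "B"),
  pop       = c(100, 200),
  gdpPercap = c(1.5, 2.5)
)

with_gdp <- toy %>%
  mutate(
    gdp           = pop * gdpPercap,  # total GDP from population and GDP per capita
    gdp_thousands = gdp / 1000        # uses the gdp column created on the line above
  )

with_gdp
```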
# Example taken from David Ranzolin
# https://rstudio-pubs-static.s3.amazonaws.com/116317_e6922e81e72e4e3f83995485ce686c14.html#/9 
section <- c("MATH111", "MATH111", "ENG111")
grade <- c(78, 93, 56)
student <- c("David", "Kristina", "Mycroft")
gradebook <- data.frame(section, grade, student)

# mutate() returns a new data frame rather than modifying gradebook in place, so here we save each intermediate version.
gradebook2<-mutate(gradebook, Pass.Fail = ifelse(grade > 60, "Pass", "Fail"))  

gradebook3<-mutate(gradebook2, letter = ifelse(grade %in% 60:69, "D",
                                               ifelse(grade %in% 70:79, "C",
                                                      ifelse(grade %in% 80:89, "B",
                                                             ifelse(grade %in% 90:99, "A", "F")))))

gradebook3

section grade student Pass.Fail letter
MATH111 78 David Pass C
MATH111 93 Kristina Pass A
ENG111 56 Mycroft Fail F
# Here we use piping to do the same thing without intermediate variables.
gradebook4 <- gradebook %>%
  mutate(Pass.Fail = ifelse(grade > 60, "Pass", "Fail")) %>%
  mutate(letter = ifelse(grade %in% 60:69, "D",
                  ifelse(grade %in% 70:79, "C",
                  ifelse(grade %in% 80:89, "B",
                  ifelse(grade %in% 90:99, "A", "F")))))


gradebook4

section grade student Pass.Fail letter
MATH111 78 David Pass C
MATH111 93 Kristina Pass A
ENG111 56 Mycroft Fail F
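
Deeply nested ifelse() calls are hard to read. One possible alternative, sketched here and not part of the original example, is dplyr's case_when(), which tests its conditions in order and uses the first one that matches. Note that this version uses >= thresholds, so unlike the %in% ranges above it would also handle non-integer grades and a grade of 100.

```r
library(dplyr)

# Same students and grades as the gradebook example
gradebook <- data.frame(
  student = c("David", "Kristina", "Mycroft"),
  grade   = c(78, 93, 56)
)

graded <- gradebook %>%
  mutate(letter = case_when(
    grade >= 90 ~ "A",
    grade >= 80 ~ "B",
    grade >= 70 ~ "C",
    grade >= 60 ~ "D",
    TRUE        ~ "F"   # fallback when no earlier condition matched
  ))

graded$letter  # "C" "A" "F"
```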
#find the average city and highway mpg
summarise(mpg, mean(cty), mean(hwy))
#find the average city and highway mpg by cylinder
summarise(group_by(mpg, cyl), mean(cty), mean(hwy))
summarise(group_by(mtcars, cyl), m = mean(disp), sd = sd(disp))

# With data frames, you can create and immediately use summaries
by_cyl <- mtcars %>% group_by(cyl)
by_cyl %>% summarise(a = n(), b = a + 1)

mean(cty) mean(hwy)
16.85897 23.44017

cyl mean(cty) mean(hwy)
4 21.01235 28.80247
5 20.50000 28.75000
6 16.21519 22.82278
8 12.57143 17.62857

cyl m sd
4 105.1364 26.87159
6 183.3143 41.56246
8 353.1000 67.77132

cyl a b
4 11 12
6 7 8
8 14 15
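
When the same summary is needed for several columns, writing each one out gets repetitive. A hedged sketch, assuming dplyr 1.0 or later where across() is available, applies mean() to two columns of the built-in mtcars data at once and adds a row count.

```r
library(dplyr)

by_cyl_means <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    across(c(mpg, hp), mean),  # mean of each listed column, per group
    n = n()                    # number of cars in each group
  )

by_cyl_means
```

The result has one row per value of cyl, and the summarised columns keep their original names (mpg and hp).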

# This was adapted from the Berkeley R Bootcamp.