Data Cleaning in R

With the Tidyverse

Josh Allen

Department of Political Science at Georgia State University


In our last workshop we covered

  • Assignment

  • Indexing

  • Generating Descriptive Statistics

  • Some data cleaning (subsetting, generating new variables)

  • A Bit of Graphing

Today we will cover:

  • The Tidyverse (minus ggplot2)

Cleaning Data in the Tidyverse

Like families, tidy datasets are all alike but every messy dataset is messy in its own way.
Hadley Wickham

Packages You Will Need

library(tidyverse) # loads readr, which provides read_csv()
starwars <- read_csv("starwars.csv")
penguins <- read_csv("penguins.csv")

What is in the Tidyverse?

tidyverse_packages()
 [1] "broom"         "cli"           "crayon"        "dbplyr"       
 [5] "dplyr"         "dtplyr"        "forcats"       "ggplot2"      
 [9] "googledrive"   "googlesheets4" "haven"         "hms"          
[13] "httr"          "jsonlite"      "lubridate"     "magrittr"     
[17] "modelr"        "pillar"        "purrr"         "readr"        
[21] "readxl"        "reprex"        "rlang"         "rstudioapi"   
[25] "rvest"         "stringr"       "tibble"        "tidyr"        
[29] "xml2"          "tidyverse"    

What is loaded?

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Namespace Conflicts

  • Since R is open source, package authors can name their functions just about anything

    • this results in lots of packages having functions with similar or identical names
  • dplyr is just warning us that if we use filter() or lag(), R will use dplyr’s version of the function

  • Whenever R runs into a namespace conflict, it defaults to the package that was loaded last

  • That is why it is generally best practice to load the most important package last

  • You can also write packagename::function() to say exactly which version you want.

    • This is called a namespace call
    • it explicitly tells R which function you are using
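
A small sketch of a namespace call, assuming dplyr is installed. The data frame df is made up for illustration and is not one of the workshop files.

```r
library(dplyr)  # attaching dplyr masks stats::filter() and stats::lag()

# Hypothetical toy data
df <- data.frame(x = 1:5)

# After library(dplyr), a bare filter() call resolves to dplyr's version
kept <- filter(df, x > 2)

# A namespace call reaches a specific package regardless of load order
kept_explicit <- dplyr::filter(df, x > 2)
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))  # the time-series filter from stats

nrow(kept)
identical(kept, kept_explicit)
```

The bare call and the dplyr:: call should give the same rows, while stats::filter() is a completely different function that returns a time series.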



Extract rows with filter()
Extract columns with select()
Arrange/sort rows with arrange()
Make new columns with mutate()
Make group summaries with group_by() %>% summarize()

How A Dplyr Verb Works

verb(.data, ...)
  • All dplyr verbs follow this pattern

  • verb is one of the dplyr verbs, e.g. select()

  • .data is the dataset you want to manipulate

    • this is true for the rest of the tidyverse
  • ... is the set of things you want the verb to do.


There have been some significant additions to dplyr 1.1.0. This workshop was written under dplyr 1.0.10


 select(.data = penguins, species)

You can give select() a range of columns with firstcolumn:thirdcolumn, or omit columns with -columnwedontwant
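
A minimal sketch of both forms, assuming dplyr is installed. The birds data frame is a hypothetical stand-in for the penguins data.

```r
library(dplyr)

# Hypothetical stand-in for the penguins data
birds <- data.frame(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe"),
  bill_length_mm = c(39.1, 46.1),
  year = c(2007, 2007)
)

first_three <- select(birds, species:bill_length_mm)  # a range of adjacent columns
no_year <- select(birds, -year)                        # everything except year

names(first_three)
names(no_year)
```

In this toy table the two calls happen to keep the same three columns, which makes it easy to check that a range and a negative selection are two routes to the same subset.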


Subsetting by Rows


  filter(.data = penguins,
         species == "Adelie",
         body_mass_g < 6200,
         bill_length_mm < 59.6)
species  body_mass_g  bill_length_mm
Adelie   3750         39.1
Adelie   3800         39.5
Adelie   3250         40.3
Adelie   3450         36.7
Adelie   3650         39.3
Adelie   3625         38.9

What Kind of Tests Can I Do?

Test      Meaning                     Test        Meaning
x < y     Less than                   x %in% y    In (group membership)
x > y     Greater than                is.na(x)    Is missing
x == y    Equal to                    !is.na(x)   Is not missing
x <= y    Less than or equal to
x >= y    Greater than or equal to
x != y    Not equal to

The Default

filter(starwars, homeworld == "Naboo",
                 homeworld == "Coruscant")
name height homeworld
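
The empty result above comes from how filter() combines its conditions. A short sketch, assuming dplyr is installed (the starwars data ships with it):

```r
library(dplyr)

# Commas mean AND: no character has two homeworlds, so nothing matches
both <- filter(starwars, homeworld == "Naboo", homeworld == "Coruscant")

# | means OR: characters from either planet
either <- filter(starwars, homeworld == "Naboo" | homeworld == "Coruscant")

nrow(both)
nrow(either) > 0
```

So separating tests with commas narrows the result, and you need | when you want rows that pass any one of the tests.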

These Do The Same Thing

filter(starwars, homeworld == "Naboo",
                 mass < 84.5)
filter(starwars, homeworld == "Naboo" &
                 mass < 84.5)
name mass homeworld
R2-D2 32 Naboo
Palpatine 75 Naboo
name mass homeworld
R2-D2 32 Naboo
Palpatine 75 Naboo

Getting Multiple Things From the Same Column

  • To get several values of homeworld we can chain multiple tests together with |
filter(starwars, homeworld == "Naboo" |
                 homeworld == "Coruscant" |
                 homeworld == "Tatooine")
name homeworld
Luke Skywalker Tatooine
C-3PO Tatooine
R2-D2 Naboo
Darth Vader Tatooine
Owen Lars Tatooine
Beru Whitesun lars Tatooine
R5-D4 Tatooine
... ...

Getting Multiple Things From the Same Column (cont)

filter(starwars,
  homeworld %in% c("Naboo", "Coruscant", "Tatooine"))
name homeworld
Luke Skywalker Tatooine
C-3PO Tatooine
R2-D2 Naboo
Darth Vader Tatooine
Owen Lars Tatooine
Beru Whitesun lars Tatooine
R5-D4 Tatooine
Biggs Darklighter Tatooine
Anakin Skywalker Tatooine
Palpatine Naboo
Finis Valorum Coruscant
... ...

filter() mistakes we all make

filter(penguins, species = "Gentoo")
Error in `filter()`:
! We detected a named input.
ℹ This usually means that you've used `=` instead of `==`.
ℹ Did you mean `species == "Gentoo"`?
filter(starwars, homeworld == "Nabo")
name height mass hair_color skin_color eye_color birth_year sex gender homeworld species films vehicles starships

Your Turn

  • Try removing the missing values from bill_length_mm (hint: use ! and is.na())

  • Return a dataset that only has data for Dream island

  • Using either the starwars or penguins data use %in% to get a set of homeworlds or islands

  • Bonus: return a set of islands or homeworlds not in that set

  • Subset the penguin data where body mass is less than 4202 and not on the Dream Island.




mutate(penguins,
   log_bill_length = round(log(bill_length_mm)
    , digits = 2))
species bill_length_mm log_bill_length
Adelie 39.1 3.67
Adelie 39.5 3.68
Adelie 40.3 3.7
Adelie 36.7 3.6
Adelie 39.3 3.67
Adelie 38.9 3.66
... ... ...


mutate(penguins,
      bill_length_mm = bill_length_mm / bill_depth_mm) 
bill_length_mm bill_depth_mm
2.09090909090909 18.7
2.27011494252874 17.4
2.23888888888889 18
1.90155440414508 19.3
... ...

Logical Tests

  • There are various ways to do this in mutate() but they all follow the same logic: ifelse(Test, Value_if_True, Value_if_False)
  • Test is a logical test: species == "Chinstrap", species == "Wookie", etc.

  • Value_if_True is what it returns if the test is TRUE

  • Value_if_False is what it returns if the test is FALSE
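
A minimal sketch of the three pieces in action, assuming dplyr is installed. The birds data frame is hypothetical toy data, with a missing species added to show what happens when the test cannot be evaluated.

```r
library(dplyr)

# Hypothetical toy data
birds <- data.frame(species = c("Adelie", "Chinstrap", NA))

labelled <- mutate(
  birds,
  is_chinstrap = ifelse(species == "Chinstrap",  # test
                        "yes",                   # value if TRUE
                        "no")                    # value if FALSE
)

labelled$is_chinstrap  # a missing test result stays missing
```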


mutate(penguins,
  big_penguin = ifelse(
  body_mass_g >= 4750,  # a computed cutoff such as median(body_mass_g, na.rm = TRUE) also works
  TRUE, FALSE))
body_mass_g big_penguin
3750 FALSE
3800 FALSE
3250 FALSE
3450 FALSE
3650 FALSE
3625 FALSE
4675 FALSE
3475 FALSE
4250 FALSE


  • mutate() is order aware, so you don’t need a new mutate() for each new variable and later variables can use earlier ones
  mutate(penguins, long_bill = bill_length_mm * 2,
         long_bill_logical =
           ifelse(long_bill >= 100
                  & long_bill <= 119.20,
                  TRUE, FALSE))
bill_length_mm long_bill long_bill_logical species
46.1 92.2 FALSE NA
50 100 TRUE NA
48.7 97.4 FALSE NA
50 100 TRUE NA
47.6 95.2 FALSE NA
46.5 93 FALSE NA
... ... ... ...

Your Turn

  • Write code to

  • Add a column in your dataset that is TRUE if a penguin is an Adelie penguin

  • Add a column in the starwars dataset that says Naboo or Tatooine, and Not Naboo or Tatooine if the character is not from there

  • Add a column in your dataset that squares the body mass (hint: use ^)


Doing More Than One Thing at a Time

Data Cleaning

  • Often requires lots of intermediary steps

    • replacing missing values
    • subsetting data
    • creating new variables
  • We would ideally like to do this without assigning a new object each time

  • If things start going wrong it may be difficult to spot where they went wrong

  • Enter dplyr and the pipe

Remember How A Dplyr Verb Works

verb(.data, ...)
  • All dplyr verbs follow this pattern

  • verb is one of the dplyr verbs

  • .data is the dataset you want to manipulate

    • this is true for the rest of the tidyverse
  • ... is the set of things you want the verb to do.

Side By Side

filter(.data = penguins, species == "Gentoo")
# A tibble: 124 × 8
   species island bill_length_mm bill_depth_mm flipper_len…¹ body_…² sex    year
   <fct>   <fct>           <dbl>         <dbl>         <int>   <int> <fct> <int>
 1 Gentoo  Biscoe           46.1          13.2           211    4500 fema…  2007
 2 Gentoo  Biscoe           50            16.3           230    5700 male   2007
 3 Gentoo  Biscoe           48.7          14.1           210    4450 fema…  2007
 4 Gentoo  Biscoe           50            15.2           218    5700 male   2007
 5 Gentoo  Biscoe           47.6          14.5           215    5400 male   2007
 6 Gentoo  Biscoe           46.5          13.5           210    4550 fema…  2007
 7 Gentoo  Biscoe           45.4          14.6           211    4800 fema…  2007
 8 Gentoo  Biscoe           46.7          15.3           219    5200 male   2007
 9 Gentoo  Biscoe           43.3          13.4           209    4400 fema…  2007
10 Gentoo  Biscoe           46.8          15.4           215    5150 male   2007
# … with 114 more rows, and abbreviated variable names ¹​flipper_length_mm,
#   ²​body_mass_g
select(.data = penguins, species:bill_length_mm)
# A tibble: 344 × 3
   species island    bill_length_mm
   <fct>   <fct>              <dbl>
 1 Adelie  Torgersen           39.1
 2 Adelie  Torgersen           39.5
 3 Adelie  Torgersen           40.3
 4 Adelie  Torgersen           NA  
 5 Adelie  Torgersen           36.7
 6 Adelie  Torgersen           39.3
 7 Adelie  Torgersen           38.9
 8 Adelie  Torgersen           39.2
 9 Adelie  Torgersen           34.1
10 Adelie  Torgersen           42  
# … with 334 more rows
mutate(.data = penguins, bill_length_mm_sq = bill_length_mm^2)
# A tibble: 344 × 9
   species island    bill_length_mm bill_d…¹ flipp…² body_…³ sex    year bill_…⁴
   <fct>   <fct>              <dbl>    <dbl>   <int>   <int> <fct> <int>   <dbl>
 1 Adelie  Torgersen           39.1     18.7     181    3750 male   2007   1529.
 2 Adelie  Torgersen           39.5     17.4     186    3800 fema…  2007   1560.
 3 Adelie  Torgersen           40.3     18       195    3250 fema…  2007   1624.
 4 Adelie  Torgersen           NA       NA        NA      NA <NA>   2007     NA 
 5 Adelie  Torgersen           36.7     19.3     193    3450 fema…  2007   1347.
 6 Adelie  Torgersen           39.3     20.6     190    3650 male   2007   1544.
 7 Adelie  Torgersen           38.9     17.8     181    3625 fema…  2007   1513.
 8 Adelie  Torgersen           39.2     19.6     195    4675 male   2007   1537.
 9 Adelie  Torgersen           34.1     18.1     193    3475 <NA>   2007   1163.
10 Adelie  Torgersen           42       20.2     190    4250 <NA>   2007   1764 
# … with 334 more rows, and abbreviated variable names ¹​bill_depth_mm,
#   ²​flipper_length_mm, ³​body_mass_g, ⁴​bill_length_mm_sq


  • A pipe takes what’s on its left hand side and passes it as the first argument to the function on its right hand side

  • this takes advantage of the shared verb(.data, ...) syntax

    • this is not limited to just dplyr functions
  • If you look behind the curtain of the workshop slides you will see pipes everywhere!

Magrittr Piping

  • The tidyverse has its own pipe, %>%, which used to be the only game in town.
    • %>% is just % followed by > followed by %
  • You can see the advantages of the pipe when we need to do multiple things to a dataset
## These do the same thing
# without the pipe
filter(mutate(penguins,
  female = ifelse(sex == "female",
    TRUE, FALSE)),
     species == "Adelie")
# with the pipe
penguins %>%  
filter(species == "Adelie") %>% 
mutate(female = ifelse(sex == "female", TRUE, FALSE))

This example is adapted from one provided by Grant McDermott

Magrittr Piping(cont)

  • When you use the pipe, it is easiest to read it as saying “and then”
I %>%
    wake_up(time = "8.00am") %>%
    get_out_of_bed(side = "correct") %>%
    get_dressed(pants = TRUE, shirt = TRUE) %>% 
    leave_house(car = TRUE, bike = FALSE, MARTA = FALSE) %>%
    am_late(traffic = TRUE)

Base R Pipe

  • The pipe caught on and the team behind R added a native pipe |>

    • this is just | followed by >
  • If you have R 4.1.0 or later, the native pipe is built in

  • The base and magrittr pipes differ slightly, and it is worth knowing some of the differences

  • The base R pipe is pretty flexible and supports some cool computer sciency stuff; for more, check out this page
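
For the simple case the two pipes are interchangeable. A quick sketch, assuming dplyr is installed (it re-exports %>% and ships the starwars data) and that you are on R 4.1.0 or later for |>:

```r
library(dplyr)

# Nested call: read from the inside out
nested <- select(filter(starwars, homeworld == "Naboo"), name, mass)

# magrittr pipe
with_magrittr <- starwars %>%
  filter(homeworld == "Naboo") %>%
  select(name, mass)

# native pipe (needs R 4.1.0 or later)
with_native <- starwars |>
  filter(homeworld == "Naboo") |>
  select(name, mass)

identical(nested, with_magrittr)
identical(nested, with_native)
```

All three versions should return the same tibble. The piped versions just read top to bottom instead of inside out.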

Group_by() and Summarize()


  • group_by() simply puts rows into groups based on values of a column

  • The data stay grouped until you call ungroup()

  • On its own it is not the most useful function, because nothing visibly happens when you call it by itself

  • Unless you combine it with summarize()

    • Note: you will also see summarise() in people’s code because dplyr’s creator and maintainer is from New Zealand; both spellings work
penguins |> 
  group_by(species) |>
  select(bill_length_mm, island) |> 
  head(5)
# A tibble: 5 × 3
# Groups:   species [1]
  species bill_length_mm island   
  <fct>            <dbl> <fct>    
1 Adelie            39.1 Torgersen
2 Adelie            39.5 Torgersen
3 Adelie            40.3 Torgersen
4 Adelie            NA   Torgersen
5 Adelie            36.7 Torgersen
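
To see that the grouping sticks around until ungroup(), here is a small sketch, assuming dplyr is installed. The birds data frame is hypothetical toy data.

```r
library(dplyr)

# Hypothetical toy data
birds <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  body_mass_g = c(3000, 4000, 5000, 6000)
)

grouped <- birds |>
  group_by(species) |>
  mutate(species_mean = mean(body_mass_g))  # mean within each species

still_grouped <- is_grouped_df(grouped)      # the grouping is still attached

overall <- grouped |>
  ungroup() |>
  mutate(overall_mean = mean(body_mass_g))  # mean over every row

grouped$species_mean
overall$overall_mean
```

The same mean() call gives a per-species value while the data are grouped and a single overall value once they are ungrouped.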

Find the Number of Each Species

penguins |> 
  group_by(species) |> 
  summarize(n())
species n()
Adelie 152
Chinstrap 68
Gentoo 124

Multiple Summary Statistics

penguins |> 
  group_by(species) |> 
  summarize(
        mean_bill_length = mean(bill_length_mm,
                                na.rm = TRUE),
        n())
species mean_bill_length n()
Adelie 38.79 152
Chinstrap 48.83 68
Gentoo 47.5 124

Recent Features

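  • Since dplyr 1.1.0 you can group for a single operation with the .by argument, and the result comes back ungrouped
penguins |> 
  summarize(mean_bill_length =
            mean(bill_length_mm, na.rm = TRUE),
            n(),
         .by = species)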
species mean_bill_length n()
Adelie 38.79 152
Gentoo 47.5 124
Chinstrap 48.83 68

Your Turn

  • Calculate the minimum, maximum, and median body_mass_g for each species of penguin

  • What happens if you remove group_by()?

  • Calculate the number of distinct penguin species per island

    • hint: type n_

Other useful dplyr stuff


  • Oftentimes we need to get data from another dataset

  • In dplyr we use join operations

  • inner_join(df1, df2)

  • left_join(df1, df2)

  • right_join(df1, df2)

  • full_join(df1, df2)

  • semi_join(df1, df2)

  • anti_join(df1, df2)


  • The basic syntax for each join is the same: _join(df1, df2, by = "var I want to join on")

  • The by argument can take a vector of variable names, or you can just let dplyr guess (bad idea)

  • Each join does something different and some are more cautious than others

  • I tend to use left_join() the most; it is handy when you are trying to fill in gaps in panel data

data1 = data.frame(ID = 1:2,                      ## Create first example data frame
                    X1 = c("a1", "a2"),
                    stringsAsFactors = FALSE)
ID X1
1 a1
2 a2
data2 = data.frame(ID = 2:3,                      ## Create second example data frame
                    X2 = c("b1", "b2"),
                    stringsAsFactors = FALSE)
ID X2
2 b1
3 b2


left_join(data1, data2, join_by(ID))
ID X1 X2
1 a1 NA
2 a2 b1
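
To compare a few of the other joins on the same two toy data frames, here is a short sketch, assuming dplyr is installed. It uses by = "ID" rather than join_by() so it also runs on older dplyr versions.

```r
library(dplyr)

data1 <- data.frame(ID = 1:2, X1 = c("a1", "a2"))
data2 <- data.frame(ID = 2:3, X2 = c("b1", "b2"))

inner <- inner_join(data1, data2, by = "ID")  # only IDs found in both tables
full  <- full_join(data1, data2, by = "ID")   # every ID from either table
anti  <- anti_join(data1, data2, by = "ID")   # rows of data1 with no match in data2

inner$ID
full$ID
anti$ID
```

inner_join() keeps only ID 2, full_join() keeps IDs 1 through 3 and fills the gaps with NA, and anti_join() returns the unmatched ID 1, which is handy for checking what a join would drop.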

Using “Real” Data

state year unemployment inflation population
GA 2018 5.0 2.0 100
GA 2019 5.3 1.8 200
GA 2020 5.2 2.5 300
NC 2018 6.1 1.8 350
NC 2019 5.9 1.6 375
NC 2020 5.3 1.8 400
CO 2018 4.7 2.7 200
CO 2019 4.4 2.6 300
CO 2020 5.1 2.5 400

state year libraries schools
CO 2018 230 470
CO 2019 240 440
CO 2020 270 510
NC 2018 200 610
NC 2019 210 590
NC 2020 220 530


national_combined = left_join(national_data, national_libraries, 
                                    join_by(state, year)) 

state year unemployment inflation population libraries schools
GA 2018 5.0 2.0 100 NA NA
GA 2019 5.3 1.8 200 NA NA
GA 2020 5.2 2.5 300 NA NA
NC 2018 6.1 1.8 350 200 610
NC 2019 5.9 1.6 375 210 590
NC 2020 5.3 1.8 400 220 530
CO 2018 4.7 2.7 200 230 470
CO 2019 4.4 2.6 300 240 440
CO 2020 5.1 2.5 400 270 510

Combined Data

national_combined = national_data |> 
  left_join(national_libraries, join_by(state, year))

state year unemployment inflation population libraries schools
GA 2018 5.0 2.0 100 NA NA
GA 2019 5.3 1.8 200 NA NA
GA 2020 5.2 2.5 300 NA NA
NC 2018 6.1 1.8 350 200 610
NC 2019 5.9 1.6 375 210 590
NC 2020 5.3 1.8 400 220 530
CO 2018 4.7 2.7 200 230 470
CO 2019 4.4 2.6 300 240 440
CO 2020 5.1 2.5 400 270 510

What if our Columns Have Different Names?

state year unemployment inflation population
GA 2018 5.0 2.0 100
GA 2019 5.3 1.8 200
GA 2020 5.2 2.5 300
NC 2018 6.1 1.8 350
NC 2019 5.9 1.6 375
NC 2020 5.3 1.8 400
CO 2018 4.7 2.7 200
CO 2019 4.4 2.6 300
CO 2020 5.1 2.5 400

statename year libraries schools
CO 2018 230 470
CO 2019 240 440
CO 2020 270 510
NC 2018 200 610
NC 2019 210 590
NC 2020 220 530

Renaming Columns

  • Renaming columns in dplyr is easy

  • we use dplyr::rename()

  • rename(newvarname = oldvarname)

  • or we can leave the names alone and tell join_by() which columns match

national_data |> 
  left_join(national_libraries, join_by(state == statename, year))
state year unemployment inflation population libraries schools
GA 2018 5.0 2.0 100 NA NA
GA 2019 5.3 1.8 200 NA NA
GA 2020 5.2 2.5 300 NA NA
NC 2018 6.1 1.8 350 200 610
NC 2019 5.9 1.6 375 210 590
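
Both routes can be sketched side by side, assuming dplyr is installed. The two data frames below are hypothetical miniature versions of the tables above, and the named by vector is used instead of join_by() so the sketch also runs on older dplyr versions.

```r
library(dplyr)

# Hypothetical miniature versions of the two tables
national_data <- data.frame(state = c("GA", "NC"), year = 2018,
                            unemployment = c(5.0, 6.1))
national_libraries <- data.frame(statename = "NC", year = 2018, libraries = 200)

# Option 1: rename first so the key columns match
renamed <- national_libraries |>
  rename(state = statename)
combined_a <- left_join(national_data, renamed, by = c("state", "year"))

# Option 2: leave the names alone and map them in `by`
combined_b <- left_join(national_data, national_libraries,
                        by = c("state" = "statename", "year"))

identical(combined_a, combined_b)
```

Either way the joined table keeps the key name from the left-hand data frame (state), and GA gets NA for libraries because it has no match.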



Reshaping Data

What does this look like in practice?

religion <$10k $10-20k $20-30k $30-40k $40-50k $50-75k $75-100k $100-150k >150k Don't know/refused
Agnostic 27 34 60 81 76 137 122 109 84 96
Atheist 12 27 37 52 35 70 73 59 74 76
Buddhist 27 21 30 34 33 58 62 39 53 54
Catholic 418 617 732 670 638 1116 949 792 633 1489
Don’t know/refused 15 14 15 11 10 35 21 17 18 116
Evangelical Prot 575 869 1064 982 881 1486 949 723 414 1529
Hindu 1 9 7 9 11 34 47 48 54 37
Historically Black Prot 228 244 236 238 197 223 131 81 78 339
Jehovah's Witness 20 27 24 24 21 30 15 11 6 37
Jewish 19 19 25 25 30 95 69 87 151 162

Making Data Longer

relig_income |> 
  pivot_longer(!religion, names_to = "income", values_to = "count" )
religion income count
Agnostic <$10k 27
Agnostic $10-20k 34
Agnostic $20-30k 60
Agnostic $30-40k 81
Agnostic $40-50k 76
Agnostic $50-75k 137
Agnostic $75-100k 122
Agnostic $100-150k 109
Agnostic >150k 84
Agnostic Don't know/refused 96
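
The same reshaping on a table small enough to check by eye, assuming tidyr is installed. The wide data frame is hypothetical, and pivot_wider() is shown only as the reverse operation.

```r
library(tidyr)

# Hypothetical toy table with one column per income bracket
wide <- data.frame(
  religion = c("A", "B"),
  low = c(10, 20),
  high = c(30, 40)
)

# Every column except religion becomes a name/value pair
long <- pivot_longer(wide, !religion, names_to = "income", values_to = "count")

# pivot_wider() undoes it
back <- pivot_wider(long, names_from = income, values_from = count)

dim(long)   # 2 religions x 2 brackets = 4 rows, 3 columns
names(back)
```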


  • Sometimes we need to split one variable into several variables
  • Enter separate()
library(lubridate) # for working with dates 
athlete_data =   tibble(forename = c("Lewis", "Tom", "Michael", "Joshua"),
                      surname = c("Hamilton", "Brady", "Jordan", "Allen"),
                      dob = ymd(c("1985-01-07", "1977-08-03","1963-02-17", "1996-05-21")))
athlete_data |>
separate(dob, c("year", "month", "day"))
# A tibble: 4 × 5
  forename surname  year  month day  
  <chr>    <chr>    <chr> <chr> <chr>
1 Lewis    Hamilton 1985  01    07   
2 Tom      Brady    1977  08    03   
3 Michael  Jordan   1963  02    17   
4 Joshua   Allen    1996  05    21   


  • Other times we need to combine multiple columns into one column
  • Enter unite()
athlete_data |>
unite(name, c("forename", "surname"), sep = " ")
# A tibble: 4 × 2
  name           dob       
  <chr>          <date>    
1 Lewis Hamilton 1985-01-07
2 Tom Brady      1977-08-03
3 Michael Jordan 1963-02-17
4 Joshua Allen   1996-05-21

Getting up and Running Using helpers

Helpful Helpers

  • dplyr comes with really useful helper functions for when there are common patterns in your variable names

  • the syntax usually goes

  • select(contains("pattern"))

  • select(starts_with("pattern"))

  • select(ends_with("pattern"))

  select(starwars, name,
   ends_with("color"),  -eye_color)
name hair_color skin_color
Luke Skywalker blond fair
C-3PO NA gold
R2-D2 NA white, blue
Darth Vader none white
... ... ...

Another Helper Example

penguins |>
  select(starts_with("bill"))  # assumed helper; it returns the two bill_ columns shown
bill_length_mm bill_depth_mm
39.1 18.7
39.5 17.4
40.3 18.0
36.7 19.3

filter() with regular expressions

  • Regular expressions also work and can be pretty handy.
# base R regex
filter(starwars, grepl("Sky", name)|  # base r version
           grepl("Palp", name) |
           grepl("Obi", name))
# tidyverse regex
filter(starwars, str_detect(name, "Sky") | 
         str_detect(name, "Palp") |
        str_detect(name, "Obi"))
name height mass homeworld
Luke Skywalker 172 77 Tatooine
Obi-Wan Kenobi 182 77 Stewjon
Anakin Skywalker 188 84 Tatooine
Palpatine 170 75 Naboo
Shmi Skywalker 163 NA Tatooine
... ... ... ...
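
Since these are regular expressions, the three tests can also be folded into one pattern using | inside the regex. A small sketch, assuming dplyr and stringr are installed:

```r
library(dplyr)
library(stringr)

# One pattern with | (regex "or") instead of three separate tests
one_pattern <- filter(starwars, str_detect(name, "Sky|Palp|Obi"))

three_tests <- filter(starwars, str_detect(name, "Sky") |
                        str_detect(name, "Palp") |
                        str_detect(name, "Obi"))

identical(one_pattern, three_tests)
```

Both versions should pick out the same characters. The single pattern is shorter, while the separate tests can be easier to read when the patterns get complicated.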


Regular ifelse

penguins |>
  mutate(
  big_penguin = ifelse(
  body_mass_g >= 4750,
   "Dats a big penguin",
   "SMOL penguin")) |>
  select(body_mass_g, big_penguin) |>
  head()
# A tibble: 6 × 2
  body_mass_g big_penguin 
        <int> <chr>       
1        3750 SMOL penguin
2        3800 SMOL penguin
3        3250 SMOL penguin
4          NA <NA>        
5        3450 SMOL penguin
6        3650 SMOL penguin

Fancy ifelse

jedi = c("Luke Skywalker",
 "Yoda", "Obi-Wan Kenobi",
  "Mace Windu")

sith = c("Palpatine",
 "Darth Maul",
   "Darth Vader")

hero_villains <- filter(starwars, name %in% jedi |
 name %in% sith)  
mutate(hero_villains,
  what_are_they = case_when(
  name %in% jedi ~ "Hero",
  name %in% sith ~ "Evil Dooer")) 
# A tibble: 5 × 2
  name           what_are_they
  <chr>          <chr>        
1 Luke Skywalker Hero         
2 Darth Vader    Evil Dooer   
3 Obi-Wan Kenobi Hero         
4 Yoda           Hero         
5 Palpatine      Evil Dooer   
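
One thing to watch with case_when() is rows that match none of the tests: they come back as NA unless you add a catch-all. A minimal sketch, assuming dplyr is installed. The shortened jedi and sith vectors and the extra "R2-D2" row are chosen only for illustration.

```r
library(dplyr)

jedi <- c("Luke Skywalker", "Yoda")
sith <- c("Palpatine", "Darth Vader")

labelled <- starwars |>
  filter(name %in% c(jedi, sith, "R2-D2")) |>
  mutate(what_are_they = case_when(
    name %in% jedi ~ "Hero",
    name %in% sith ~ "Evil Dooer",
    TRUE ~ "Neither"            # fallback for rows no earlier test matched
  )) |>
  select(name, what_are_they)

labelled
```

The tests are checked in order, so the TRUE ~ line only applies to rows that fell through everything above it.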

Generating Summary statistics

penguins |> 
select(-year) |>
group_by(species) |>
summarise(across(where(is.numeric),  # assumed selection: every numeric column
 c(Mean = mean, Min = min, Max = max), na.rm = TRUE,
  .names = "{.col}_{.fn}"))
# A tibble: 3 × 13
  species   bill_lengt…¹ bill_…² bill_…³ bill_…⁴ bill_…⁵ bill_…⁶ flipp…⁷ flipp…⁸
  <fct>            <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <int>
1 Adelie            38.8    32.1    46      18.3    15.5    21.5    190.     172
2 Chinstrap         48.8    40.9    58      18.4    16.4    20.8    196.     178
3 Gentoo            47.5    40.9    59.6    15.0    13.1    17.3    217.     203
# … with 4 more variables: flipper_length_mm_Max <int>, body_mass_g_Mean <dbl>,
#   body_mass_g_Min <int>, body_mass_g_Max <int>, and abbreviated
#   variable names ¹​bill_length_mm_Mean, ²​bill_length_mm_Min,
#   ³​bill_length_mm_Max, ⁴​bill_depth_mm_Mean, ⁵​bill_depth_mm_Min,
#   ⁶​bill_depth_mm_Max, ⁷​flipper_length_mm_Mean, ⁸​flipper_length_mm_Min

Pivoting With LOTS of columns

Using Your Helpers

billboard |> 
  pivot_longer(
    cols = starts_with("wk"), 
    names_to = "week",
    names_prefix = "wk",
    values_to = "rank",
    values_drop_na = TRUE
  ) |>
  head(4)
# A tibble: 4 × 5
  artist track                   date.entered week   rank
  <chr>  <chr>                   <date>       <chr> <dbl>
1 2 Pac  Baby Don't Cry (Keep... 2000-02-26   1        87
2 2 Pac  Baby Don't Cry (Keep... 2000-02-26   2        82
3 2 Pac  Baby Don't Cry (Keep... 2000-02-26   3        72
4 2 Pac  Baby Don't Cry (Keep... 2000-02-26   4        77

