Data Cleaning in R

With data.table

Josh Allen

Department of Political Science at Georgia State University

8/29/22

Packages You Will Need

install.packages("pacman")

pacman::p_load("palmerpenguins","data.table", install = TRUE)


data("penguins")

data("starwars", package = "dplyr" )
## Or 
starwars = read_csv("data/starwars.csv")
penguins = read_csv("data/penguins.csv")


## Or 
starwars = fread("data/starwars.csv")
penguins = fread("data/penguins.csv")

data.table pros

  • more concise(as we will see)

  • less dependencies

  • faster(as our data gets bigger)

A comparison of tasks

## tidyverse 
penguins |>
filter(species == "Chinstrap") |>
group_by(island) |>
summarise(mean_weight = mean(body_mass_g, na.rm = TRUE))

## data.table
penguins_dt = as.data.table(penguins)

penguins_dt[species == "Chinstrap",
 mean(body_mass_g, na.rm = TRUE),
  by = island]
# A tibble: 1 × 2
  island mean_weight
  <fct>        <dbl>
1 Dream        3733.
   island       V1
1:  Dream 3733.088

A Statement from Josh

  • I really wanted to learn data.table which is the motivation for this slide show

  • This is my attempt to do it

  • Both packages are awesome

  • I have zero stakes in either direction

  • Both suit different data cleaning tasks well

  • Their are compelling reasons to favor one over the other

  • If one is more intuitive to you that is great use the one you feel more comfortable with

  • Knowing both is awesome

    • I don’t promote evangelism of one over the other

What is the difference?

  • Both work with an “enhanced” version of a data.frame

    • For dplyr this is called a tibble
    • For data.table this is called a data.table
  • The main difference(from what I understand) is that data.table tries to modify by reference as much as possible

  • What does that mean?

  • Think about about what the pipe.

  • new_object <- object |> dplyr_verb(thing we want to do) |> dplyr_verb(next thing we want to do)

  • This relies on a Copy-on-modify approach

  1. Copy-on-modify: Creates a copy of your data. Implies extra computational overhead(slower)
  2. Modify-in-place: Avoids creating copies and simply cahnges the data where it sits in memory(faster)

The Common Syntax of data.table

DT[i, j, by]

  • i = The rows you want to work with

  • j = The columns you want to work with

  • by = a grouping variable

The Dplyr equivalents

  • filter;slice;arrange

  • select, mutate, summarise

  • group_by

Subsetting By Row with i

Using tests

penguins_dt[species %in% c("Chinstrap","Gentoo") | body_mass_g <= 4050]
   species body_mass_g
1:  Adelie        3750
2:  Adelie        3800
3:  Adelie        3250
4:  Adelie        3450
5:  Adelie        3650

Using regular expressions

starwars_dt = as.data.table(starwars)

starwars_dt[grepl("Skywalker", name)]
               name
1:   Luke Skywalker
2: Anakin Skywalker
3:   Shmi Skywalker

Finding Missing Data

penguins_dt[is.na(sex)][]
    species  sex
 1:  Adelie <NA>
 2:  Adelie <NA>
 3:  Adelie <NA>
 4:  Adelie <NA>
 5:  Adelie <NA>
 6:  Adelie <NA>
 7:  Gentoo <NA>
 8:  Gentoo <NA>
 9:  Gentoo <NA>
10:  Gentoo <NA>
11:  Gentoo <NA>

Finiding Non-missing Data

penguins_dt[!is.na(sex)][]
       species    sex
  1:    Adelie   male
  2:    Adelie female
  3:    Adelie female
  4:    Adelie female
  5:    Adelie   male
 ---                 
329: Chinstrap   male
330: Chinstrap female
331: Chinstrap   male
332: Chinstrap   male
333: Chinstrap female

Doing Stuff with j

The basics

  • Like Dplyr we can use column postion or name
penguins_dt[, 1:3]
   species    island bill_length_mm
1:  Adelie Torgersen           39.1
2:  Adelie Torgersen           39.5
penguins_dt[, c("body_mass_g", "species")]
   body_mass_g species
1:        3750  Adelie
2:        3800  Adelie

Excluding Using -

penguins_dt[,-c(1:3)]
   bill_depth_mm flipper_length_mm body_mass_g    sex year
1:          18.7               181        3750   male 2007
2:          17.4               186        3800 female 2007
3:          18.0               195        3250 female 2007
penguins_dt[,-c("body_mass_g","species")]
      island bill_length_mm bill_depth_mm flipper_length_mm    sex year
1: Torgersen           39.1          18.7               181   male 2007
2: Torgersen           39.5          17.4               186 female 2007

. in data.table

  • The . serves as a shorthand for c and list

  • So we can put . in the j postition letting us use less characters

penguins_dt[, .(body_mass_g, island)]
   body_mass_g    island
1:        3750 Torgersen
2:        3800 Torgersen
penguins_dt[, .(body_mass_g, island)]

Regular Expressions

We can still use regular expressions - They just look a little different

starwars_dt[, .SD , .SDcols = patterns("color")][] 
   hair_color  skin_color eye_color
1:      blond        fair      blue
2:       <NA>        gold    yellow
3:       <NA> white, blue       red
4:       none       white    yellow
5:      brown       light     brown

Regular Expressions(cont)

  • What is going on with .SD is that data.table needs a place holder

  • .SD stands for Subset of Data.table

If we tried to do this:

starwars_dt[, .SDcols = patterns("color")]

It would return

Warning in `[.data.table`(starwars_dt, , .SDcols = patterns("color")): i and j
are both missing so ignoring the other arguments. This warning will be upgraded
to error in future.
                     name height   mass    hair_color          skin_color
 1:        Luke Skywalker    172   77.0         blond                fair
 2:                 C-3PO    167   75.0          <NA>                gold
 3:                 R2-D2     96   32.0          <NA>         white, blue
 4:           Darth Vader    202  136.0          none               white
 5:           Leia Organa    150   49.0         brown               light
 6:             Owen Lars    178  120.0   brown, grey               light
 7:    Beru Whitesun lars    165   75.0         brown               light
 8:                 R5-D4     97   32.0          <NA>          white, red
 9:     Biggs Darklighter    183   84.0         black               light
10:        Obi-Wan Kenobi    182   77.0 auburn, white                fair
11:      Anakin Skywalker    188   84.0         blond                fair
12:        Wilhuff Tarkin    180     NA  auburn, grey                fair
13:             Chewbacca    228  112.0         brown             unknown
14:              Han Solo    180   80.0         brown                fair
15:                Greedo    173   74.0          <NA>               green
16: Jabba Desilijic Tiure    175 1358.0          <NA>    green-tan, brown
17:        Wedge Antilles    170   77.0         brown                fair
18:      Jek Tono Porkins    180  110.0         brown                fair
19:                  Yoda     66   17.0         white               green
20:             Palpatine    170   75.0          grey                pale
21:             Boba Fett    183   78.2         black                fair
22:                 IG-88    200  140.0          none               metal
23:                 Bossk    190  113.0          none               green
24:      Lando Calrissian    177   79.0         black                dark
25:                 Lobot    175   79.0          none               light
26:                Ackbar    180   83.0          none        brown mottle
27:            Mon Mothma    150     NA        auburn                fair
28:          Arvel Crynyd     NA     NA         brown                fair
29: Wicket Systri Warrick     88   20.0         brown               brown
30:             Nien Nunb    160   68.0          none                grey
31:          Qui-Gon Jinn    193   89.0         brown                fair
32:           Nute Gunray    191   90.0          none       mottled green
33:         Finis Valorum    170     NA         blond                fair
34:         Jar Jar Binks    196   66.0          none              orange
35:          Roos Tarpals    224   82.0          none                grey
36:            Rugor Nass    206     NA          none               green
37:              Ric Olié    183     NA         brown                fair
38:                 Watto    137     NA         black          blue, grey
39:               Sebulba    112   40.0          none           grey, red
40:         Quarsh Panaka    183     NA         black                dark
41:        Shmi Skywalker    163     NA         black                fair
42:            Darth Maul    175   80.0          none                 red
43:           Bib Fortuna    180     NA          none                pale
44:           Ayla Secura    178   55.0          none                blue
45:              Dud Bolt     94   45.0          none          blue, grey
46:               Gasgano    122     NA          none         white, blue
47:        Ben Quadinaros    163   65.0          none grey, green, yellow
48:            Mace Windu    188   84.0          none                dark
49:          Ki-Adi-Mundi    198   82.0         white                pale
50:             Kit Fisto    196   87.0          none               green
51:             Eeth Koth    171     NA         black               brown
52:            Adi Gallia    184   50.0          none                dark
53:           Saesee Tiin    188     NA          none                pale
54:           Yarael Poof    264     NA          none               white
55:              Plo Koon    188   80.0          none              orange
56:            Mas Amedda    196     NA          none                blue
57:          Gregar Typho    185   85.0         black                dark
58:                 Cordé    157     NA         brown               light
59:           Cliegg Lars    183     NA         brown                fair
60:     Poggle the Lesser    183   80.0          none               green
61:       Luminara Unduli    170   56.2         black              yellow
62:         Barriss Offee    166   50.0         black              yellow
63:                 Dormé    165     NA         brown               light
64:                 Dooku    193   80.0         white                fair
65:   Bail Prestor Organa    191     NA         black                 tan
66:            Jango Fett    183   79.0         black                 tan
67:            Zam Wesell    168   55.0        blonde fair, green, yellow
68:       Dexter Jettster    198  102.0          none               brown
69:               Lama Su    229   88.0          none                grey
70:               Taun We    213     NA          none                grey
71:            Jocasta Nu    167     NA         white                fair
72:         Ratts Tyerell     79   15.0          none          grey, blue
73:                R4-P17     96     NA          none         silver, red
74:            Wat Tambor    193   48.0          none         green, grey
75:              San Hill    191     NA          none                grey
76:              Shaak Ti    178   57.0          none    red, blue, white
77:              Grievous    216  159.0          none        brown, white
78:               Tarfful    234  136.0         brown               brown
79:       Raymus Antilles    188   79.0         brown               light
80:             Sly Moore    178   48.0          none                pale
81:            Tion Medon    206   80.0          none                grey
82:                  Finn     NA     NA         black                dark
83:                   Rey     NA     NA         brown               light
84:           Poe Dameron     NA     NA         brown               light
85:                   BB8     NA     NA          none                none
86:        Captain Phasma     NA     NA       unknown             unknown
87:         Padmé Amidala    165   45.0         brown               light
                     name height   mass    hair_color          skin_color
        eye_color birth_year            sex    gender      homeworld
 1:          blue       19.0           male masculine       Tatooine
 2:        yellow      112.0           none masculine       Tatooine
 3:           red       33.0           none masculine          Naboo
 4:        yellow       41.9           male masculine       Tatooine
 5:         brown       19.0         female  feminine       Alderaan
 6:          blue       52.0           male masculine       Tatooine
 7:          blue       47.0         female  feminine       Tatooine
 8:           red         NA           none masculine       Tatooine
 9:         brown       24.0           male masculine       Tatooine
10:     blue-gray       57.0           male masculine        Stewjon
11:          blue       41.9           male masculine       Tatooine
12:          blue       64.0           male masculine         Eriadu
13:          blue      200.0           male masculine       Kashyyyk
14:         brown       29.0           male masculine       Corellia
15:         black       44.0           male masculine          Rodia
16:        orange      600.0 hermaphroditic masculine      Nal Hutta
17:         hazel       21.0           male masculine       Corellia
18:          blue         NA           male masculine     Bestine IV
19:         brown      896.0           male masculine           <NA>
20:        yellow       82.0           male masculine          Naboo
21:         brown       31.5           male masculine         Kamino
22:           red       15.0           none masculine           <NA>
23:           red       53.0           male masculine      Trandosha
24:         brown       31.0           male masculine        Socorro
25:          blue       37.0           male masculine         Bespin
26:        orange       41.0           male masculine       Mon Cala
27:          blue       48.0         female  feminine      Chandrila
28:         brown         NA           male masculine           <NA>
29:         brown        8.0           male masculine          Endor
30:         black         NA           male masculine        Sullust
31:          blue       92.0           male masculine           <NA>
32:           red         NA           male masculine Cato Neimoidia
33:          blue       91.0           male masculine      Coruscant
34:        orange       52.0           male masculine          Naboo
35:        orange         NA           male masculine          Naboo
36:        orange         NA           male masculine          Naboo
37:          blue         NA           <NA>      <NA>          Naboo
38:        yellow         NA           male masculine       Toydaria
39:        orange         NA           male masculine      Malastare
40:         brown       62.0           <NA>      <NA>          Naboo
41:         brown       72.0         female  feminine       Tatooine
42:        yellow       54.0           male masculine       Dathomir
43:          pink         NA           male masculine         Ryloth
44:         hazel       48.0         female  feminine         Ryloth
45:        yellow         NA           male masculine        Vulpter
46:         black         NA           male masculine        Troiken
47:        orange         NA           male masculine           Tund
48:         brown       72.0           male masculine     Haruun Kal
49:        yellow       92.0           male masculine          Cerea
50:         black         NA           male masculine    Glee Anselm
51:         brown         NA           male masculine       Iridonia
52:          blue         NA         female  feminine      Coruscant
53:        orange         NA           male masculine        Iktotch
54:        yellow         NA           male masculine        Quermia
55:         black       22.0           male masculine          Dorin
56:          blue         NA           male masculine       Champala
57:         brown         NA           male masculine          Naboo
58:         brown         NA         female  feminine          Naboo
59:          blue       82.0           male masculine       Tatooine
60:        yellow         NA           male masculine       Geonosis
61:          blue       58.0         female  feminine         Mirial
62:          blue       40.0         female  feminine         Mirial
63:         brown         NA         female  feminine          Naboo
64:         brown      102.0           male masculine        Serenno
65:         brown       67.0           male masculine       Alderaan
66:         brown       66.0           male masculine   Concord Dawn
67:        yellow         NA         female  feminine          Zolan
68:        yellow         NA           male masculine           Ojom
69:         black         NA           male masculine         Kamino
70:         black         NA         female  feminine         Kamino
71:          blue         NA         female  feminine      Coruscant
72:       unknown         NA           male masculine    Aleen Minor
73:     red, blue         NA           none  feminine           <NA>
74:       unknown         NA           male masculine          Skako
75:          gold         NA           male masculine     Muunilinst
76:         black         NA         female  feminine          Shili
77: green, yellow         NA           male masculine          Kalee
78:          blue         NA           male masculine       Kashyyyk
79:         brown         NA           male masculine       Alderaan
80:         white         NA           <NA>      <NA>         Umbara
81:         black         NA           male masculine         Utapau
82:          dark         NA           male masculine           <NA>
83:         hazel         NA         female  feminine           <NA>
84:         brown         NA           male masculine           <NA>
85:         black         NA           none masculine           <NA>
86:       unknown         NA           <NA>      <NA>           <NA>
87:         brown       46.0         female  feminine          Naboo
        eye_color birth_year            sex    gender      homeworld
           species
 1:          Human
 2:          Droid
 3:          Droid
 4:          Human
 5:          Human
 6:          Human
 7:          Human
 8:          Droid
 9:          Human
10:          Human
11:          Human
12:          Human
13:        Wookiee
14:          Human
15:         Rodian
16:           Hutt
17:          Human
18:          Human
19: Yoda's species
20:          Human
21:          Human
22:          Droid
23:     Trandoshan
24:          Human
25:          Human
26:   Mon Calamari
27:          Human
28:          Human
29:           Ewok
30:      Sullustan
31:          Human
32:      Neimodian
33:          Human
34:         Gungan
35:         Gungan
36:         Gungan
37:           <NA>
38:      Toydarian
39:            Dug
40:           <NA>
41:          Human
42:         Zabrak
43:        Twi'lek
44:        Twi'lek
45:     Vulptereen
46:          Xexto
47:          Toong
48:          Human
49:         Cerean
50:       Nautolan
51:         Zabrak
52:     Tholothian
53:       Iktotchi
54:       Quermian
55:        Kel Dor
56:       Chagrian
57:          Human
58:          Human
59:          Human
60:      Geonosian
61:       Mirialan
62:       Mirialan
63:          Human
64:          Human
65:          Human
66:          Human
67:       Clawdite
68:       Besalisk
69:       Kaminoan
70:       Kaminoan
71:          Human
72:         Aleena
73:          Droid
74:        Skakoan
75:           Muun
76:        Togruta
77:        Kaleesh
78:        Wookiee
79:          Human
80:           <NA>
81:         Pau'an
82:          Human
83:          Human
84:          Human
85:          Droid
86:           <NA>
87:          Human
           species
                                                                                                                    films
 1:                           The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope,The Force Awakens
 2:     The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope
 3: The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope,...
 4:                                             The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope
 5:                           The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope,The Force Awakens
 6:                                                                   Attack of the Clones,Revenge of the Sith,A New Hope
 7:                                                                   Attack of the Clones,Revenge of the Sith,A New Hope
 8:                                                                                                            A New Hope
 9:                                                                                                            A New Hope
10:     The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope
11:                                                           Attack of the Clones,The Phantom Menace,Revenge of the Sith
12:                                                                                        Revenge of the Sith,A New Hope
13:                           The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope,The Force Awakens
14:                                               The Empire Strikes Back,Return of the Jedi,A New Hope,The Force Awakens
15:                                                                                                            A New Hope
16:                                                                      The Phantom Menace,Return of the Jedi,A New Hope
17:                                                                 The Empire Strikes Back,Return of the Jedi,A New Hope
18:                                                                                                            A New Hope
19:                The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi
20:                The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi
21:                                                       The Empire Strikes Back,Attack of the Clones,Return of the Jedi
22:                                                                                               The Empire Strikes Back
23:                                                                                               The Empire Strikes Back
24:                                                                            The Empire Strikes Back,Return of the Jedi
25:                                                                                               The Empire Strikes Back
26:                                                                                  Return of the Jedi,The Force Awakens
27:                                                                                                    Return of the Jedi
28:                                                                                                    Return of the Jedi
29:                                                                                                    Return of the Jedi
30:                                                                                                    Return of the Jedi
31:                                                                                                    The Phantom Menace
32:                                                           Attack of the Clones,The Phantom Menace,Revenge of the Sith
33:                                                                                                    The Phantom Menace
34:                                                                               Attack of the Clones,The Phantom Menace
35:                                                                                                    The Phantom Menace
36:                                                                                                    The Phantom Menace
37:                                                                                                    The Phantom Menace
38:                                                                               Attack of the Clones,The Phantom Menace
39:                                                                                                    The Phantom Menace
40:                                                                                                    The Phantom Menace
41:                                                                               Attack of the Clones,The Phantom Menace
42:                                                                                                    The Phantom Menace
43:                                                                                                    Return of the Jedi
44:                                                           Attack of the Clones,The Phantom Menace,Revenge of the Sith
45:                                                                                                    The Phantom Menace
46:                                                                                                    The Phantom Menace
47:                                                                                                    The Phantom Menace
48:                                                           Attack of the Clones,The Phantom Menace,Revenge of the Sith
49:                                                           Attack of the Clones,The Phantom Menace,Revenge of the Sith
50:                                                           Attack of the Clones,The Phantom Menace,Revenge of the Sith
51:                                                                                The Phantom Menace,Revenge of the Sith
52:                                                                                The Phantom Menace,Revenge of the Sith
53:                                                                                The Phantom Menace,Revenge of the Sith
54:                                                                                                    The Phantom Menace
55:                                                           Attack of the Clones,The Phantom Menace,Revenge of the Sith
56:                                                                               Attack of the Clones,The Phantom Menace
57:                                                                                                  Attack of the Clones
58:                                                                                                  Attack of the Clones
59:                                                                                                  Attack of the Clones
60:                                                                              Attack of the Clones,Revenge of the Sith
61:                                                                              Attack of the Clones,Revenge of the Sith
62:                                                                                                  Attack of the Clones
63:                                                                                                  Attack of the Clones
64:                                                                              Attack of the Clones,Revenge of the Sith
65:                                                                              Attack of the Clones,Revenge of the Sith
66:                                                                                                  Attack of the Clones
67:                                                                                                  Attack of the Clones
68:                                                                                                  Attack of the Clones
69:                                                                                                  Attack of the Clones
70:                                                                                                  Attack of the Clones
71:                                                                                                  Attack of the Clones
72:                                                                                                    The Phantom Menace
73:                                                                              Attack of the Clones,Revenge of the Sith
74:                                                                                                  Attack of the Clones
75:                                                                                                  Attack of the Clones
76:                                                                              Attack of the Clones,Revenge of the Sith
77:                                                                                                   Revenge of the Sith
78:                                                                                                   Revenge of the Sith
79:                                                                                        Revenge of the Sith,A New Hope
80:                                                                              Attack of the Clones,Revenge of the Sith
81:                                                                                                   Revenge of the Sith
82:                                                                                                     The Force Awakens
83:                                                                                                     The Force Awakens
84:                                                                                                     The Force Awakens
85:                                                                                                     The Force Awakens
86:                                                                                                     The Force Awakens
87:                                                           Attack of the Clones,The Phantom Menace,Revenge of the Sith
                                                                                                                    films
                               vehicles
 1:   Snowspeeder,Imperial Speeder Bike
 2:                                    
 3:                                    
 4:                                    
 5:               Imperial Speeder Bike
 6:                                    
 7:                                    
 8:                                    
 9:                                    
10:                     Tribubble bongo
11: Zephyr-G swoop bike,XJ-6 airspeeder
12:                                    
13:                               AT-ST
14:                                    
15:                                    
16:                                    
17:                         Snowspeeder
18:                                    
19:                                    
20:                                    
21:                                    
22:                                    
23:                                    
24:                                    
25:                                    
26:                                    
27:                                    
28:                                    
29:                                    
30:                                    
31:                     Tribubble bongo
32:                                    
33:                                    
34:                                    
35:                                    
36:                                    
37:                                    
38:                                    
39:                                    
40:                                    
41:                                    
42:                        Sith speeder
43:                                    
44:                                    
45:                                    
46:                                    
47:                                    
48:                                    
49:                                    
50:                                    
51:                                    
52:                                    
53:                                    
54:                                    
55:                                    
56:                                    
57:                                    
58:                                    
59:                                    
60:                                    
61:                                    
62:                                    
63:                                    
64:                    Flitknot speeder
65:                                    
66:                                    
67:          Koro-2 Exodrive airspeeder
68:                                    
69:                                    
70:                                    
71:                                    
72:                                    
73:                                    
74:                                    
75:                                    
76:                                    
77:         Tsmeu-6 personal wheel bike
78:                                    
79:                                    
80:                                    
81:                                    
82:                                    
83:                                    
84:                                    
85:                                    
86:                                    
87:                                    
                               vehicles
                                                                                               starships
 1:                                                                              X-wing,Imperial shuttle
 2:                                                                                                     
 3:                                                                                                     
 4:                                                                                      TIE Advanced x1
 5:                                                                                                     
 6:                                                                                                     
 7:                                                                                                     
 8:                                                                                                     
 9:                                                                                               X-wing
10: Jedi starfighter,Trade Federation cruiser,Naboo star skiff,Jedi Interceptor,Belbullab-22 starfighter
11:                                              Trade Federation cruiser,Jedi Interceptor,Naboo fighter
12:                                                                                                     
13:                                                                   Millennium Falcon,Imperial shuttle
14:                                                                   Millennium Falcon,Imperial shuttle
15:                                                                                                     
16:                                                                                                     
17:                                                                                               X-wing
18:                                                                                               X-wing
19:                                                                                                     
20:                                                                                                     
21:                                                                                              Slave 1
22:                                                                                                     
23:                                                                                                     
24:                                                                                    Millennium Falcon
25:                                                                                                     
26:                                                                                                     
27:                                                                                                     
28:                                                                                               A-wing
29:                                                                                                     
30:                                                                                    Millennium Falcon
31:                                                                                                     
32:                                                                                                     
33:                                                                                                     
34:                                                                                                     
35:                                                                                                     
36:                                                                                                     
37:                                                                                 Naboo Royal Starship
38:                                                                                                     
39:                                                                                                     
40:                                                                                                     
41:                                                                                                     
42:                                                                                             Scimitar
43:                                                                                                     
44:                                                                                                     
45:                                                                                                     
46:                                                                                                     
47:                                                                                                     
48:                                                                                                     
49:                                                                                                     
50:                                                                                                     
51:                                                                                                     
52:                                                                                                     
53:                                                                                                     
54:                                                                                                     
55:                                                                                     Jedi starfighter
56:                                                                                                     
57:                                                                                        Naboo fighter
58:                                                                                                     
59:                                                                                                     
60:                                                                                                     
61:                                                                                                     
62:                                                                                                     
63:                                                                                                     
64:                                                                                                     
65:                                                                                                     
66:                                                                                                     
67:                                                                                                     
68:                                                                                                     
69:                                                                                                     
70:                                                                                                     
71:                                                                                                     
72:                                                                                                     
73:                                                                                                     
74:                                                                                                     
75:                                                                                                     
76:                                                                                                     
77:                                                                             Belbullab-22 starfighter
78:                                                                                                     
79:                                                                                                     
80:                                                                                                     
81:                                                                                                     
82:                                                                                                     
83:                                                                                                     
84:                                                                                  T-70 X-wing fighter
85:                                                                                                     
86:                                                                                                     
87:                                                   H-type Nubian yacht,Naboo star skiff,Naboo fighter
                                                                                               starships

Creating New Variables

One Cool thing about data.table is that you do not need to reassign the object once you modify it

This creates a new variable

penguins_dt[, body_mass_g_sq := body_mass_g^2]

This will just square body_mass_g in place

You will not see these changes unless you ask data.table to show you with an additional set of []

starwars_dt[,mass := mass^2][]
             name height mass hair_color  skin_color eye_color birth_year  sex
1: Luke Skywalker    172 5929      blond        fair      blue         19 male
2:          C-3PO    167 5625       <NA>        gold    yellow        112 none
3:          R2-D2     96 1024       <NA> white, blue       red         33 none
      gender homeworld species
1: masculine  Tatooine   Human
2: masculine  Tatooine   Droid
3: masculine     Naboo   Droid
                                                                                                                   films
1:                           The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope,The Force Awakens
2:     The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope
3: The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope,...
                            vehicles               starships
1: Snowspeeder,Imperial Speeder Bike X-wing,Imperial shuttle
2:                                                          
3:                                                          

Multiple Operations

  • As we saw with dplyr we can do this
penguins |>
mutate(square_mass = body_mass_g^2,
       big_penguin = ifelse(square_mass >= median(square_mass, na.rm = TRUE) , TRUE, FALSE)) |>
       select(species, square_mass, big_penguin) |>
       head(5)
# A tibble: 5 × 3
  species square_mass big_penguin
  <fct>         <dbl> <lgl>      
1 Adelie     14062500 FALSE      
2 Adelie     14440000 FALSE      
3 Adelie     10562500 FALSE      
4 Adelie           NA NA         
5 Adelie     11902500 FALSE      
  • It would be nice if we can do this with data.table!

Multiple operations

  • Good news. We can! But there are some caveats

  • We can do this by chaining [] together

  • Or using the magrittr pipe

  • We cannot do this by doing lots of stuff in our J column

Multiple Operations with []

penguins_dt[, square_mass := body_mass_g^2][, big_peng := ifelse(square_mass >= median(square_mass, na.rm = TRUE) , TRUE, FALSE)]
   species    island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
1:  Adelie Torgersen           39.1          18.7               181        3750
2:  Adelie Torgersen           39.5          17.4               186        3800
3:  Adelie Torgersen           40.3          18.0               195        3250
4:  Adelie Torgersen             NA            NA                NA          NA
5:  Adelie Torgersen           36.7          19.3               193        3450
      sex year body_mass_g_sq square_mass big_peng
1:   male 2007       14062500    14062500    FALSE
2: female 2007       14440000    14440000    FALSE
3: female 2007       10562500    10562500    FALSE
4:   <NA> 2007             NA          NA       NA
5: female 2007       11902500    11902500    FALSE

Multiple Operations with the pipe

penguins_dt[, square_mass := body_mass_g^2] %>%
.[, big_peng := ifelse(square_mass >= median(square_mass, na.rm = TRUE) , TRUE, FALSE)]
   species    island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
1:  Adelie Torgersen           39.1          18.7               181        3750
2:  Adelie Torgersen           39.5          17.4               186        3800
3:  Adelie Torgersen           40.3          18.0               195        3250
4:  Adelie Torgersen             NA            NA                NA          NA
5:  Adelie Torgersen           36.7          19.3               193        3450
6:  Adelie Torgersen           39.3          20.6               190        3650
      sex year body_mass_g_sq square_mass big_peng
1:   male 2007       14062500    14062500    FALSE
2: female 2007       14440000    14440000    FALSE
3: female 2007       10562500    10562500    FALSE
4:   <NA> 2007             NA          NA       NA
5: female 2007       11902500    11902500    FALSE
6:   male 2007       13322500    13322500    FALSE

By slot

Our last slot is by

  • This works broadly the same as our group_by argument
penguins_dt[, mean(bill_length_mm, na.rm = TRUE), by = species]
species V1
Adelie 38.79139
Gentoo 47.50488
Chinstrap 48.83382

Summarising across multiple columns

  • In dplyr this is a bit more intuitive
penguins |>
group_by(species) |>
summarise(across(where(is.numeric),
 c(Mean = mean, Min = min, Max = max ),
  na.rm = TRUE,
 .names = "{.col}_{.fn}"))
species bill_length_mm_Mean bill_length_mm_Min bill_length_mm_Max bill_depth_mm_Mean bill_depth_mm_Min bill_depth_mm_Max flipper_length_mm_Mean flipper_length_mm_Min flipper_length_mm_Max body_mass_g_Mean body_mass_g_Min body_mass_g_Max year_Mean year_Min year_Max
Adelie 38.79139 32.1 46.0 18.34636 15.5 21.5 189.9536 172 210 3700.662 2850 4775 2008.013 2007 2009
Chinstrap 48.83382 40.9 58.0 18.42059 16.4 20.8 195.8235 178 212 3733.088 2700 4800 2007.971 2007 2009
Gentoo 47.50488 40.9 59.6 14.98211 13.1 17.3 217.1870 203 231 5076.016 3950 6300 2008.081 2007 2009

Summarising Multiple Columns

penguins_dt = as.data.table(penguins)

summary_fun = function(x) list(Mean = mean(x, na.rm = TRUE),
             Min = min(x, na.rm = TRUE),
              Max = max(x, na.rm = TRUE))

penguins_dt[, lapply(.SD, summary_fun), .SDcols = is.numeric, by = species]
species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
Adelie 38.79139 18.34636 189.9536 3700.662 2008.013
Adelie 32.1 15.5 172 2850 2007
Adelie 46 21.5 210 4775 2009
Gentoo 47.50488 14.98211 217.187 5076.016 2008.081
Gentoo 40.9 13.1 203 3950 2007
Gentoo 59.6 17.3 231 6300 2009
Chinstrap 48.83382 18.42059 195.8235 3733.088 2007.971
Chinstrap 40.9 16.4 178 2700 2007
Chinstrap 58 20.8 212 4800 2009