With data.table
Department of Political Science at Georgia State University
8/29/22
install.packages("pacman")
pacman::p_load("palmerpenguins","data.table", install = TRUE)
data("penguins")
data("starwars", package = "dplyr" )
## Or
starwars = read_csv("data/starwars.csv")
penguins = read_csv("data/penguins.csv")
## Or
starwars = fread("data/starwars.csv")
penguins = fread("data/penguins.csv")
more concise(as we will see)
less dependencies
faster(as our data gets bigger)
# A tibble: 1 × 2
island mean_weight
<fct> <dbl>
1 Dream 3733.
island V1
1: Dream 3733.088
I really wanted to learn data.table which is the motivation for this slide show
This is my attempt to do it
Both packages are awesome
I have zero stakes in either direction
Both suit different data cleaning tasks well
Their are compelling reasons to favor one over the other
If one is more intuitive to you that is great use the one you feel more comfortable with
Knowing both is awesome
Both work with an “enhanced” version of a data.frame
The main difference(from what I understand) is that data.table tries to modify by reference as much as possible
What does that mean?
Think about about what the pipe.
new_object <- object |> dplyr_verb(thing we want to do) |> dplyr_verb(next thing we want to do)
This relies on a Copy-on-modify approach
DT[i, j, by]
i = The rows you want to work with
j = The columns you want to work with
by = a grouping variable
filter;slice;arrange
select, mutate, summarise
group_by
i
species body_mass_g
1: Adelie 3750
2: Adelie 3800
3: Adelie 3250
4: Adelie 3450
5: Adelie 3650
name
1: Luke Skywalker
2: Anakin Skywalker
3: Shmi Skywalker
species sex
1: Adelie <NA>
2: Adelie <NA>
3: Adelie <NA>
4: Adelie <NA>
5: Adelie <NA>
6: Adelie <NA>
7: Gentoo <NA>
8: Gentoo <NA>
9: Gentoo <NA>
10: Gentoo <NA>
11: Gentoo <NA>
species sex
1: Adelie male
2: Adelie female
3: Adelie female
4: Adelie female
5: Adelie male
---
329: Chinstrap male
330: Chinstrap female
331: Chinstrap male
332: Chinstrap male
333: Chinstrap female
j
-
bill_depth_mm flipper_length_mm body_mass_g sex year
1: 18.7 181 3750 male 2007
2: 17.4 186 3800 female 2007
3: 18.0 195 3250 female 2007
island bill_length_mm bill_depth_mm flipper_length_mm sex year
1: Torgersen 39.1 18.7 181 male 2007
2: Torgersen 39.5 17.4 186 female 2007
.
in data.tableThe .
serves as a shorthand for c
and list
So we can put .
in the j
postition letting us use less characters
body_mass_g island
1: 3750 Torgersen
2: 3800 Torgersen
We can still use regular expressions - They just look a little different
hair_color skin_color eye_color
1: blond fair blue
2: <NA> gold yellow
3: <NA> white, blue red
4: none white yellow
5: brown light brown
What is going on with .SD
is that data.table needs a place holder
.SD stands for S
ubset of D
ata.table
If we tried to do this:
It would return
Warning in `[.data.table`(starwars_dt, , .SDcols = patterns("color")): i and j
are both missing so ignoring the other arguments. This warning will be upgraded
to error in future.
name height mass hair_color skin_color
1: Luke Skywalker 172 77.0 blond fair
2: C-3PO 167 75.0 <NA> gold
3: R2-D2 96 32.0 <NA> white, blue
4: Darth Vader 202 136.0 none white
5: Leia Organa 150 49.0 brown light
6: Owen Lars 178 120.0 brown, grey light
7: Beru Whitesun lars 165 75.0 brown light
8: R5-D4 97 32.0 <NA> white, red
9: Biggs Darklighter 183 84.0 black light
10: Obi-Wan Kenobi 182 77.0 auburn, white fair
11: Anakin Skywalker 188 84.0 blond fair
12: Wilhuff Tarkin 180 NA auburn, grey fair
13: Chewbacca 228 112.0 brown unknown
14: Han Solo 180 80.0 brown fair
15: Greedo 173 74.0 <NA> green
16: Jabba Desilijic Tiure 175 1358.0 <NA> green-tan, brown
17: Wedge Antilles 170 77.0 brown fair
18: Jek Tono Porkins 180 110.0 brown fair
19: Yoda 66 17.0 white green
20: Palpatine 170 75.0 grey pale
21: Boba Fett 183 78.2 black fair
22: IG-88 200 140.0 none metal
23: Bossk 190 113.0 none green
24: Lando Calrissian 177 79.0 black dark
25: Lobot 175 79.0 none light
26: Ackbar 180 83.0 none brown mottle
27: Mon Mothma 150 NA auburn fair
28: Arvel Crynyd NA NA brown fair
29: Wicket Systri Warrick 88 20.0 brown brown
30: Nien Nunb 160 68.0 none grey
31: Qui-Gon Jinn 193 89.0 brown fair
32: Nute Gunray 191 90.0 none mottled green
33: Finis Valorum 170 NA blond fair
34: Jar Jar Binks 196 66.0 none orange
35: Roos Tarpals 224 82.0 none grey
36: Rugor Nass 206 NA none green
37: Ric Olié 183 NA brown fair
38: Watto 137 NA black blue, grey
39: Sebulba 112 40.0 none grey, red
40: Quarsh Panaka 183 NA black dark
41: Shmi Skywalker 163 NA black fair
42: Darth Maul 175 80.0 none red
43: Bib Fortuna 180 NA none pale
44: Ayla Secura 178 55.0 none blue
45: Dud Bolt 94 45.0 none blue, grey
46: Gasgano 122 NA none white, blue
47: Ben Quadinaros 163 65.0 none grey, green, yellow
48: Mace Windu 188 84.0 none dark
49: Ki-Adi-Mundi 198 82.0 white pale
50: Kit Fisto 196 87.0 none green
51: Eeth Koth 171 NA black brown
52: Adi Gallia 184 50.0 none dark
53: Saesee Tiin 188 NA none pale
54: Yarael Poof 264 NA none white
55: Plo Koon 188 80.0 none orange
56: Mas Amedda 196 NA none blue
57: Gregar Typho 185 85.0 black dark
58: Cordé 157 NA brown light
59: Cliegg Lars 183 NA brown fair
60: Poggle the Lesser 183 80.0 none green
61: Luminara Unduli 170 56.2 black yellow
62: Barriss Offee 166 50.0 black yellow
63: Dormé 165 NA brown light
64: Dooku 193 80.0 white fair
65: Bail Prestor Organa 191 NA black tan
66: Jango Fett 183 79.0 black tan
67: Zam Wesell 168 55.0 blonde fair, green, yellow
68: Dexter Jettster 198 102.0 none brown
69: Lama Su 229 88.0 none grey
70: Taun We 213 NA none grey
71: Jocasta Nu 167 NA white fair
72: Ratts Tyerell 79 15.0 none grey, blue
73: R4-P17 96 NA none silver, red
74: Wat Tambor 193 48.0 none green, grey
75: San Hill 191 NA none grey
76: Shaak Ti 178 57.0 none red, blue, white
77: Grievous 216 159.0 none brown, white
78: Tarfful 234 136.0 brown brown
79: Raymus Antilles 188 79.0 brown light
80: Sly Moore 178 48.0 none pale
81: Tion Medon 206 80.0 none grey
82: Finn NA NA black dark
83: Rey NA NA brown light
84: Poe Dameron NA NA brown light
85: BB8 NA NA none none
86: Captain Phasma NA NA unknown unknown
87: Padmé Amidala 165 45.0 brown light
name height mass hair_color skin_color
eye_color birth_year sex gender homeworld
1: blue 19.0 male masculine Tatooine
2: yellow 112.0 none masculine Tatooine
3: red 33.0 none masculine Naboo
4: yellow 41.9 male masculine Tatooine
5: brown 19.0 female feminine Alderaan
6: blue 52.0 male masculine Tatooine
7: blue 47.0 female feminine Tatooine
8: red NA none masculine Tatooine
9: brown 24.0 male masculine Tatooine
10: blue-gray 57.0 male masculine Stewjon
11: blue 41.9 male masculine Tatooine
12: blue 64.0 male masculine Eriadu
13: blue 200.0 male masculine Kashyyyk
14: brown 29.0 male masculine Corellia
15: black 44.0 male masculine Rodia
16: orange 600.0 hermaphroditic masculine Nal Hutta
17: hazel 21.0 male masculine Corellia
18: blue NA male masculine Bestine IV
19: brown 896.0 male masculine <NA>
20: yellow 82.0 male masculine Naboo
21: brown 31.5 male masculine Kamino
22: red 15.0 none masculine <NA>
23: red 53.0 male masculine Trandosha
24: brown 31.0 male masculine Socorro
25: blue 37.0 male masculine Bespin
26: orange 41.0 male masculine Mon Cala
27: blue 48.0 female feminine Chandrila
28: brown NA male masculine <NA>
29: brown 8.0 male masculine Endor
30: black NA male masculine Sullust
31: blue 92.0 male masculine <NA>
32: red NA male masculine Cato Neimoidia
33: blue 91.0 male masculine Coruscant
34: orange 52.0 male masculine Naboo
35: orange NA male masculine Naboo
36: orange NA male masculine Naboo
37: blue NA <NA> <NA> Naboo
38: yellow NA male masculine Toydaria
39: orange NA male masculine Malastare
40: brown 62.0 <NA> <NA> Naboo
41: brown 72.0 female feminine Tatooine
42: yellow 54.0 male masculine Dathomir
43: pink NA male masculine Ryloth
44: hazel 48.0 female feminine Ryloth
45: yellow NA male masculine Vulpter
46: black NA male masculine Troiken
47: orange NA male masculine Tund
48: brown 72.0 male masculine Haruun Kal
49: yellow 92.0 male masculine Cerea
50: black NA male masculine Glee Anselm
51: brown NA male masculine Iridonia
52: blue NA female feminine Coruscant
53: orange NA male masculine Iktotch
54: yellow NA male masculine Quermia
55: black 22.0 male masculine Dorin
56: blue NA male masculine Champala
57: brown NA male masculine Naboo
58: brown NA female feminine Naboo
59: blue 82.0 male masculine Tatooine
60: yellow NA male masculine Geonosis
61: blue 58.0 female feminine Mirial
62: blue 40.0 female feminine Mirial
63: brown NA female feminine Naboo
64: brown 102.0 male masculine Serenno
65: brown 67.0 male masculine Alderaan
66: brown 66.0 male masculine Concord Dawn
67: yellow NA female feminine Zolan
68: yellow NA male masculine Ojom
69: black NA male masculine Kamino
70: black NA female feminine Kamino
71: blue NA female feminine Coruscant
72: unknown NA male masculine Aleen Minor
73: red, blue NA none feminine <NA>
74: unknown NA male masculine Skako
75: gold NA male masculine Muunilinst
76: black NA female feminine Shili
77: green, yellow NA male masculine Kalee
78: blue NA male masculine Kashyyyk
79: brown NA male masculine Alderaan
80: white NA <NA> <NA> Umbara
81: black NA male masculine Utapau
82: dark NA male masculine <NA>
83: hazel NA female feminine <NA>
84: brown NA male masculine <NA>
85: black NA none masculine <NA>
86: unknown NA <NA> <NA> <NA>
87: brown 46.0 female feminine Naboo
eye_color birth_year sex gender homeworld
species
1: Human
2: Droid
3: Droid
4: Human
5: Human
6: Human
7: Human
8: Droid
9: Human
10: Human
11: Human
12: Human
13: Wookiee
14: Human
15: Rodian
16: Hutt
17: Human
18: Human
19: Yoda's species
20: Human
21: Human
22: Droid
23: Trandoshan
24: Human
25: Human
26: Mon Calamari
27: Human
28: Human
29: Ewok
30: Sullustan
31: Human
32: Neimodian
33: Human
34: Gungan
35: Gungan
36: Gungan
37: <NA>
38: Toydarian
39: Dug
40: <NA>
41: Human
42: Zabrak
43: Twi'lek
44: Twi'lek
45: Vulptereen
46: Xexto
47: Toong
48: Human
49: Cerean
50: Nautolan
51: Zabrak
52: Tholothian
53: Iktotchi
54: Quermian
55: Kel Dor
56: Chagrian
57: Human
58: Human
59: Human
60: Geonosian
61: Mirialan
62: Mirialan
63: Human
64: Human
65: Human
66: Human
67: Clawdite
68: Besalisk
69: Kaminoan
70: Kaminoan
71: Human
72: Aleena
73: Droid
74: Skakoan
75: Muun
76: Togruta
77: Kaleesh
78: Wookiee
79: Human
80: <NA>
81: Pau'an
82: Human
83: Human
84: Human
85: Droid
86: <NA>
87: Human
species
films
1: The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope,The Force Awakens
2: The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope
3: The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope,...
4: The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope
5: The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope,The Force Awakens
6: Attack of the Clones,Revenge of the Sith,A New Hope
7: Attack of the Clones,Revenge of the Sith,A New Hope
8: A New Hope
9: A New Hope
10: The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope
11: Attack of the Clones,The Phantom Menace,Revenge of the Sith
12: Revenge of the Sith,A New Hope
13: The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope,The Force Awakens
14: The Empire Strikes Back,Return of the Jedi,A New Hope,The Force Awakens
15: A New Hope
16: The Phantom Menace,Return of the Jedi,A New Hope
17: The Empire Strikes Back,Return of the Jedi,A New Hope
18: A New Hope
19: The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi
20: The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi
21: The Empire Strikes Back,Attack of the Clones,Return of the Jedi
22: The Empire Strikes Back
23: The Empire Strikes Back
24: The Empire Strikes Back,Return of the Jedi
25: The Empire Strikes Back
26: Return of the Jedi,The Force Awakens
27: Return of the Jedi
28: Return of the Jedi
29: Return of the Jedi
30: Return of the Jedi
31: The Phantom Menace
32: Attack of the Clones,The Phantom Menace,Revenge of the Sith
33: The Phantom Menace
34: Attack of the Clones,The Phantom Menace
35: The Phantom Menace
36: The Phantom Menace
37: The Phantom Menace
38: Attack of the Clones,The Phantom Menace
39: The Phantom Menace
40: The Phantom Menace
41: Attack of the Clones,The Phantom Menace
42: The Phantom Menace
43: Return of the Jedi
44: Attack of the Clones,The Phantom Menace,Revenge of the Sith
45: The Phantom Menace
46: The Phantom Menace
47: The Phantom Menace
48: Attack of the Clones,The Phantom Menace,Revenge of the Sith
49: Attack of the Clones,The Phantom Menace,Revenge of the Sith
50: Attack of the Clones,The Phantom Menace,Revenge of the Sith
51: The Phantom Menace,Revenge of the Sith
52: The Phantom Menace,Revenge of the Sith
53: The Phantom Menace,Revenge of the Sith
54: The Phantom Menace
55: Attack of the Clones,The Phantom Menace,Revenge of the Sith
56: Attack of the Clones,The Phantom Menace
57: Attack of the Clones
58: Attack of the Clones
59: Attack of the Clones
60: Attack of the Clones,Revenge of the Sith
61: Attack of the Clones,Revenge of the Sith
62: Attack of the Clones
63: Attack of the Clones
64: Attack of the Clones,Revenge of the Sith
65: Attack of the Clones,Revenge of the Sith
66: Attack of the Clones
67: Attack of the Clones
68: Attack of the Clones
69: Attack of the Clones
70: Attack of the Clones
71: Attack of the Clones
72: The Phantom Menace
73: Attack of the Clones,Revenge of the Sith
74: Attack of the Clones
75: Attack of the Clones
76: Attack of the Clones,Revenge of the Sith
77: Revenge of the Sith
78: Revenge of the Sith
79: Revenge of the Sith,A New Hope
80: Attack of the Clones,Revenge of the Sith
81: Revenge of the Sith
82: The Force Awakens
83: The Force Awakens
84: The Force Awakens
85: The Force Awakens
86: The Force Awakens
87: Attack of the Clones,The Phantom Menace,Revenge of the Sith
films
vehicles
1: Snowspeeder,Imperial Speeder Bike
2:
3:
4:
5: Imperial Speeder Bike
6:
7:
8:
9:
10: Tribubble bongo
11: Zephyr-G swoop bike,XJ-6 airspeeder
12:
13: AT-ST
14:
15:
16:
17: Snowspeeder
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31: Tribubble bongo
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42: Sith speeder
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:
55:
56:
57:
58:
59:
60:
61:
62:
63:
64: Flitknot speeder
65:
66:
67: Koro-2 Exodrive airspeeder
68:
69:
70:
71:
72:
73:
74:
75:
76:
77: Tsmeu-6 personal wheel bike
78:
79:
80:
81:
82:
83:
84:
85:
86:
87:
vehicles
starships
1: X-wing,Imperial shuttle
2:
3:
4: TIE Advanced x1
5:
6:
7:
8:
9: X-wing
10: Jedi starfighter,Trade Federation cruiser,Naboo star skiff,Jedi Interceptor,Belbullab-22 starfighter
11: Trade Federation cruiser,Jedi Interceptor,Naboo fighter
12:
13: Millennium Falcon,Imperial shuttle
14: Millennium Falcon,Imperial shuttle
15:
16:
17: X-wing
18: X-wing
19:
20:
21: Slave 1
22:
23:
24: Millennium Falcon
25:
26:
27:
28: A-wing
29:
30: Millennium Falcon
31:
32:
33:
34:
35:
36:
37: Naboo Royal Starship
38:
39:
40:
41:
42: Scimitar
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:
55: Jedi starfighter
56:
57: Naboo fighter
58:
59:
60:
61:
62:
63:
64:
65:
66:
67:
68:
69:
70:
71:
72:
73:
74:
75:
76:
77: Belbullab-22 starfighter
78:
79:
80:
81:
82:
83:
84: T-70 X-wing fighter
85:
86:
87: H-type Nubian yacht,Naboo star skiff,Naboo fighter
starships
One Cool thing about data.table is that you do not need to reassign the object once you modify it
This creates a new variable
This will just square body_mass_g in place
You will not see these changes unless you ask data.table to show you with an additional set of []
name height mass hair_color skin_color eye_color birth_year sex
1: Luke Skywalker 172 5929 blond fair blue 19 male
2: C-3PO 167 5625 <NA> gold yellow 112 none
3: R2-D2 96 1024 <NA> white, blue red 33 none
gender homeworld species
1: masculine Tatooine Human
2: masculine Tatooine Droid
3: masculine Naboo Droid
films
1: The Empire Strikes Back,Revenge of the Sith,Return of the Jedi,A New Hope,The Force Awakens
2: The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope
3: The Empire Strikes Back,Attack of the Clones,The Phantom Menace,Revenge of the Sith,Return of the Jedi,A New Hope,...
vehicles starships
1: Snowspeeder,Imperial Speeder Bike X-wing,Imperial shuttle
2:
3:
penguins |>
mutate(square_mass = body_mass_g^2,
big_penguin = ifelse(square_mass >= median(square_mass, na.rm = TRUE) , TRUE, FALSE)) |>
select(species, square_mass, big_penguin) |>
head(5)
# A tibble: 5 × 3
species square_mass big_penguin
<fct> <dbl> <lgl>
1 Adelie 14062500 FALSE
2 Adelie 14440000 FALSE
3 Adelie 10562500 FALSE
4 Adelie NA NA
5 Adelie 11902500 FALSE
Good news. We can! But there are some caveats
We can do this by chaining [] together
Or using the magrittr pipe
We cannot do this by doing lots of stuff in our J column
[]
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
1: Adelie Torgersen 39.1 18.7 181 3750
2: Adelie Torgersen 39.5 17.4 186 3800
3: Adelie Torgersen 40.3 18.0 195 3250
4: Adelie Torgersen NA NA NA NA
5: Adelie Torgersen 36.7 19.3 193 3450
sex year body_mass_g_sq square_mass big_peng
1: male 2007 14062500 14062500 FALSE
2: female 2007 14440000 14440000 FALSE
3: female 2007 10562500 10562500 FALSE
4: <NA> 2007 NA NA NA
5: female 2007 11902500 11902500 FALSE
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
1: Adelie Torgersen 39.1 18.7 181 3750
2: Adelie Torgersen 39.5 17.4 186 3800
3: Adelie Torgersen 40.3 18.0 195 3250
4: Adelie Torgersen NA NA NA NA
5: Adelie Torgersen 36.7 19.3 193 3450
6: Adelie Torgersen 39.3 20.6 190 3650
sex year body_mass_g_sq square_mass big_peng
1: male 2007 14062500 14062500 FALSE
2: female 2007 14440000 14440000 FALSE
3: female 2007 10562500 10562500 FALSE
4: <NA> 2007 NA NA NA
5: female 2007 11902500 11902500 FALSE
6: male 2007 13322500 13322500 FALSE
by
species | V1 |
---|---|
Adelie | 38.79139 |
Gentoo | 47.50488 |
Chinstrap | 48.83382 |
species | bill_length_mm_Mean | bill_length_mm_Min | bill_length_mm_Max | bill_depth_mm_Mean | bill_depth_mm_Min | bill_depth_mm_Max | flipper_length_mm_Mean | flipper_length_mm_Min | flipper_length_mm_Max | body_mass_g_Mean | body_mass_g_Min | body_mass_g_Max | year_Mean | year_Min | year_Max |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Adelie | 38.79139 | 32.1 | 46.0 | 18.34636 | 15.5 | 21.5 | 189.9536 | 172 | 210 | 3700.662 | 2850 | 4775 | 2008.013 | 2007 | 2009 |
Chinstrap | 48.83382 | 40.9 | 58.0 | 18.42059 | 16.4 | 20.8 | 195.8235 | 178 | 212 | 3733.088 | 2700 | 4800 | 2007.971 | 2007 | 2009 |
Gentoo | 47.50488 | 40.9 | 59.6 | 14.98211 | 13.1 | 17.3 | 217.1870 | 203 | 231 | 5076.016 | 3950 | 6300 | 2008.081 | 2007 | 2009 |
species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | year |
---|---|---|---|---|---|
Adelie | 38.79139 | 18.34636 | 189.9536 | 3700.662 | 2008.013 |
Adelie | 32.1 | 15.5 | 172 | 2850 | 2007 |
Adelie | 46 | 21.5 | 210 | 4775 | 2009 |
Gentoo | 47.50488 | 14.98211 | 217.187 | 5076.016 | 2008.081 |
Gentoo | 40.9 | 13.1 | 203 | 3950 | 2007 |
Gentoo | 59.6 | 17.3 | 231 | 6300 | 2009 |
Chinstrap | 48.83382 | 18.42059 | 195.8235 | 3733.088 | 2007.971 |
Chinstrap | 40.9 | 16.4 | 178 | 2700 | 2007 |
Chinstrap | 58 | 20.8 | 212 | 4800 | 2009 |