Getting Started in ggplot

Josh Allen

Department of Political Science at Georgia State University

8/29/22

Where We Have Been

plot(penguins$bill_length_mm,
     penguins$body_mass_g,
     xlab = "Bill Length (mm)",
     ylab = "Body Mass (g)")
abline(lm(body_mass_g ~ bill_length_mm, data = penguins))

Expanding What We Know

par(mfrow = c(2, 2))
plot(penguins$bill_length_mm,
     penguins$body_mass_g,
     xlab = "Bill Length (mm)",
     ylab = "Body Mass (g)")
hist(penguins$bill_length_mm,
     xlim = c(30, 60))
# density() stops on missing values; na.rm = TRUE drops them first
plot(density(penguins$bill_length_mm, na.rm = TRUE))
plot(penguins$bill_length_mm,
     penguins$body_mass_g,
     xlab = "Bill Length (mm)",
     ylab = "Body Mass (g)")
abline(lm(body_mass_g ~ bill_length_mm, data = penguins))

Research Data Services

Our Team

Get Ready Badges

How To Get the Badges

The Importance of Graphing

Why visualize your data?

mean(graph_dat$x)
[1] 54.2657
sd(graph_dat$x)
[1] 16.713
mean(graph_dat$y)
[1] 47.8351
sd(graph_dat$y)
[1] 26.84777
cor(graph_dat$x, graph_dat$y)
[1] -0.06601891

The Dino Strikes

The Datasaurus Dozen
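
The same point can be checked with data that ships with R. The sketch below swaps in Anscombe's quartet (the built-in `anscombe` data frame) for the Datasaurus data, because `graph_dat` is not defined in these slides. The four x/y pairs have nearly identical means and correlations, yet their scatter plots look nothing alike.

```r
# Anscombe's quartet: four x/y pairs stored as x1..x4 and y1..y4
x_cols <- paste0("x", 1:4)
y_cols <- paste0("y", 1:4)

# Summary statistics are nearly identical across the four pairs
x_means <- sapply(anscombe[x_cols], mean)
y_means <- sapply(anscombe[y_cols], mean)
cors <- mapply(function(x, y) cor(anscombe[[x]], anscombe[[y]]),
               x_cols, y_cols)

round(x_means, 2)
round(y_means, 2)
round(cors, 2)

# The plots tell four different stories
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[x_cols[i]]], anscombe[[y_cols[i]]],
       xlab = x_cols[i], ylab = y_cols[i])
}
par(old_par)
```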

Why ggplot2?

The transferable skills from ggplot2 are not the idiosyncrasies of plotting syntax, but a powerful way of thinking about visualization, as a way of mapping between variables and the visual properties of geometric objects that you can perceive.

– Hadley Wickham

Why ggplot2?

  • You have probably heard of it, but why use it?

  • Once we understand the “grammar”, making figures becomes a lot easier

  • Tons of organizations use it

  • Flexibility

    • Tons of ways to customize appearance
    • Lots of functions
    • Lots of extensions
  • Reproducibility

    • Doesn’t require you to remember each input from a drop-down menu
    • Defaults to universally usable formats
    • Re-running the code overwrites the saved figure in your directory automatically

The Grammar of Graphics

Grammar

“Good grammar is just the first step of creating a good sentence”

  • How is the data related to the figure on the right?

Here is just a scatter plot with various shapes and sizes. We will fill in the rest as the slides go on.

Building the Plot

Body Weight of Penguins and Bill Length

  • Penguins

  • Species

  • Island

On the right-hand side is the legend that denotes what each color and shape represents. Red represents Biscoe Island, green represents Dream Island, and light blue represents Torgersen Island. The circles represent the Adelie penguins, the triangles represent the Chinstrap penguins, and the squares represent the Gentoo penguins.

Building the Plot

Body Weight of Penguins and Bill Length

  • Penguins

  • Species

  • Island

On the right-hand side is the legend that denotes what each color and shape represents. Red represents Biscoe Island, green represents Dream Island, and light blue represents Torgersen Island. The circles represent the Adelie penguins, the triangles represent the Chinstrap penguins, and the squares represent the Gentoo penguins. The x-axis represents bill length in millimeters and the y-axis represents body mass in grams.

So How Did We Go From

This

This is the template that I started with, with just points and colors.

To This

Making Plots

The Grammar


Component          Function               Explanation
Data               ggplot(data)           The raw data that you want to visualise.
Aesthetics         aes()                  Aesthetic mappings between variables and visual properties.
Geometries         geom_*()               The geometric shapes representing the data.
Statistics         stat_*()               The statistical transformations applied to the data.
Scales             scale_*()              Maps between the data and the aesthetic dimensions.
Coordinate System  coord_*()              Maps data into the plane of the data rectangle.
Facets             facet_*()              The arrangement of the data into a grid of plots.
Visual Themes      theme() and theme_*()  The overall visual defaults of a plot.
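
To make the table concrete, here is a minimal sketch that touches each component once. It assumes the ggplot2 and gapminder packages are installed; the y-axis limits and the choice of theme are illustrative, not taken from the slides.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder) +                       # data
  geom_point(aes(x = gdpPercap,                # aesthetics
                 y = lifeExp,
                 color = continent),
             stat = "identity",                # statistics (the default for points)
             alpha = 0.5) +                    # geometry
  scale_x_log10() +                            # scale
  coord_cartesian(ylim = c(20, 90)) +          # coordinate system
  facet_wrap(vars(continent)) +                # facets
  theme_minimal()                              # visual theme
p
```

Every piece is joined with `+`, so any one of them can be removed or swapped without touching the others.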

Where do they go?

ggplot() +
  geom_point(data = penguins,
             aes(x = bill_length_mm,
                 y = body_mass_g,
                 shape = species,
                 color = island),
             size = 3)

Plotting Data

country      continent  year  lifeExp  pop       gdpPercap
Afghanistan  Asia       1952  28.801   8425333   779.4453
Afghanistan  Asia       1957  30.332   9240934   820.8530
Afghanistan  Asia       1962  31.997   10267083  853.1007
Afghanistan  Asia       1967  34.020   11537966  836.1971
Afghanistan  Asia       1972  36.088   13079460  739.9811
Afghanistan  Asia       1977  38.438   14880372  786.1134
Afghanistan  Asia       1982  39.854   12881816  978.0114
Afghanistan  Asia       1987  40.822   13867957  852.3959
Afghanistan  Asia       1992  41.674   16317921  649.3414
Afghanistan  Asia       1997  41.763   22227415  635.3414

Here is your starter script

## be sure you have done 
## install.packages("gapminder")
## library(gapminder)

ggplot() +
  geom_point(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))

Activity

  • Add color, size, alpha, and shape aesthetics to your graph.

  • Be bold, be brave! Experiment!

  • What happens when you add more than one aesthetic?

05:00

How would you make this plot?

ggplot() + 
  geom_point(data = gapminder,
   aes(x = gdpPercap,
    y = lifeExp,
    color = "blue"))

ggplot() +
  geom_point(data = gapminder,
    aes(x = gdpPercap,
        y = lifeExp),
        color = "blue") 

Same options, different results
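
One way to see why the two calls above differ is to look at the colour values ggplot2 actually computes for each layer. This is a small sketch, assuming ggplot2 and gapminder are installed; the object names `mapped` and `fixed` are made up for illustration.

```r
library(ggplot2)
library(gapminder)

# Inside aes(), "blue" is treated as a one-level variable and
# is assigned a colour from the default palette
mapped <- ggplot() +
  geom_point(data = gapminder,
             aes(x = gdpPercap, y = lifeExp, color = "blue"))

# Outside aes(), "blue" is a fixed setting applied to every point
fixed <- ggplot() +
  geom_point(data = gapminder,
             aes(x = gdpPercap, y = lifeExp),
             color = "blue")

# layer_data() returns the values ggplot2 computed for a layer
unique(layer_data(mapped)$colour)
unique(layer_data(fixed)$colour)
```

The rule of thumb: put a column name inside `aes()` when the data should drive the look, and put a constant outside `aes()` when you just want to set it.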

What Comes With ggplot

 [1] "geom_abline"            "geom_area"              "geom_bar"              
 [4] "geom_bin_2d"            "geom_bin2d"             "geom_blank"            
 [7] "geom_boxplot"           "geom_col"               "geom_contour"          
[10] "geom_contour_filled"    "geom_count"             "geom_crossbar"         
[13] "geom_curve"             "geom_density"           "geom_density_2d"       
[16] "geom_density_2d_filled" "geom_density2d"         "geom_density2d_filled" 
[19] "geom_dotplot"           "geom_errorbar"          "geom_errorbarh"        
[22] "geom_freqpoly"          "geom_function"          "geom_hex"              
[25] "geom_histogram"         "geom_hline"             "geom_jitter"           
[28] "geom_label"             "geom_line"              "geom_linerange"        
[31] "geom_map"               "geom_path"              "geom_point"            
[34] "geom_pointrange"        "geom_polygon"           "geom_qq"               
[37] "geom_qq_line"           "geom_quantile"          "geom_raster"           
[40] "geom_rect"              "geom_ribbon"            "geom_rug"              
[43] "geom_segment"           "geom_sf"                "geom_sf_label"         
[46] "geom_sf_text"           "geom_smooth"            "geom_spoke"            
[49] "geom_step"              "geom_text"              "geom_tile"             
[52] "geom_violin"            "geom_vline"            


Example (sort of)

Your Turn

02:00

Answer

ggplot() +
  geom_boxplot(data = gapminder,
               aes(x = continent,
                   y = lifeExp))

Your Turn Again

Hint: do not supply a y value

02:00

ggplot() +
  geom_histogram(data = gapminder,
                 aes(x = lifeExp))

Your Turn

Make This Density Plot filled by continent

02:00

ggplot() +
  geom_density(data = gapminder,
               aes(x = lifeExp,
                   fill = continent),
               alpha = 0.75)

Complex graph!

Local

ggplot() +
  geom_point(data = gapminder,
   aes(x = gdpPercap,
       y = lifeExp, 
       color = continent)) + 
  geom_smooth(data = gapminder,
   aes(x = gdpPercap, 
       y = lifeExp, 
      color = continent)) 

Global

ggplot(gapminder,
       aes(x = gdpPercap,
          y = lifeExp, 
          color = continent))  + 
  geom_point() +
  geom_smooth() 
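
Global and local aesthetics can also be mixed. In the sketch below (assuming ggplot2 and gapminder are installed), `x` and `y` are set globally and inherited by both layers, while `color` is mapped only inside `geom_point()`, so the smoother draws a single line for all continents. The `method = "lm"` and `color = "black"` choices are illustrative.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder,
            aes(x = gdpPercap, y = lifeExp)) +      # global: x and y
  geom_point(aes(color = continent), alpha = 0.5) + # local: color for points only
  geom_smooth(method = "lm", color = "black") +     # inherits x and y, not color
  scale_x_log10()
p
```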

Building Plots

Starting with Data and aesthetics

ggplot(gapminder,
 aes(x = gdpPercap,
    y = lifeExp))

Add geom_point

ggplot(gapminder,
    aes(x = gdpPercap,
        y = lifeExp,
        color = continent)) +
  geom_point()

Add geom_smooth

ggplot(gapminder,
    aes(x = gdpPercap,
       y = lifeExp,
       color = continent)) +
  geom_point() +
  geom_smooth()

Change Transparency

ggplot(gapminder,
    aes(x = gdpPercap,
       y = lifeExp,
       color = continent)) +
  geom_point(alpha = 0.5) +
  geom_smooth()

Adjust scales with scale_x_log10

ggplot(gapminder,
     aes(x = gdpPercap,
         y = lifeExp,
         color = continent)) +
  geom_point(alpha = 0.5) +
  geom_smooth() +
  scale_x_log10()

Add axis labels and title with labs

ggplot(gapminder,
      aes(x = gdpPercap,
          y = lifeExp,
          color = continent)) +
  geom_point(alpha = 0.5) +
  geom_smooth() +
  scale_x_log10() +
  labs(x = "GDP per cap",
       y = "Life Expectancy",
       title = "The Effect of GDP per cap on Life Expectancy")

Add viridis color scale

ggplot(gapminder,
      aes(x = gdpPercap,
          y = lifeExp,
          color = continent)) +
  geom_point(alpha = 0.5) +
  geom_smooth() +
  scale_x_log10() +
  labs(x = "GDP per cap",
       y = "Life Expectancy",
       title = "The Effect of GDP per cap on Life Expectancy") +
  scale_color_viridis_d()

Add theme

ggplot(gapminder,
      aes(x = gdpPercap,
         y = lifeExp,
         color = continent)) +
  geom_point(alpha = 0.5) +
  geom_smooth() +
  scale_x_log10() +
  labs(x = "GDP per cap",
       y = "Life Expectancy",
       title = "The Effect of GDP per cap on Life Expectancy") +
  scale_color_viridis_d() +
  theme_bw()

Facet by Continent

ggplot(gapminder,
    aes(x = gdpPercap,
        y = lifeExp,
        color = continent)) +
  geom_point(alpha = 0.5) +
  geom_smooth() +
  scale_x_log10() +
  labs(x = "GDP per cap",
      y = "Life Expectancy",
      title = "The Effect of GDP per cap on Life Expectancy") +
  scale_color_viridis_d() +
  theme_bw() +
  facet_wrap(vars(continent))

Change Theme Options

ggplot(gapminder,
  aes(x = gdpPercap,
      y = lifeExp,
      color = continent)) +
  geom_point(alpha = 0.5) +
  geom_smooth() +
  scale_x_log10() +
  labs(x = "GDP per cap",
      y = "Life Expectancy",
      title = "The Effect of GDP per cap on Life Expectancy") +
  scale_color_viridis_d() +
  theme_bw() +
  facet_wrap(vars(continent)) +
  theme(legend.position = "none")

Scales

Example layer                     What it does
scale_x_continuous()              Make the x-axis continuous
scale_x_continuous(breaks = 1:5)  Manually specify axis ticks
scale_x_log10()                   Log the x-axis
scale_color_gradient()            Use a gradient
scale_fill_viridis_d()            Fill with discrete viridis colors

Scales in Action

ggplot(gapminder,
  aes(x = gdpPercap,
      y = lifeExp,
      size = pop,
      color = continent)) +
  geom_point(alpha = 0.5) +
  labs(x = "Income",
     y = "Life Expectancy") +
  scale_x_log10(labels = scales::dollar) +
  theme_bw() 

Scales in Action

ggplot(gapminder,
  aes(x = gdpPercap,
     y = lifeExp,
     size = pop,
    color = continent)) +
  geom_point(alpha = 0.5) +
  labs(x = "Income", y = "Life Expectancy") +
  scale_x_log10(labels = scales::dollar) +
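  # a manual scale needs one value per level, and gapminder has five continents;
  # the fifth color below is an added assumption, not from the original slide
  scale_color_manual(values = c("#04a3bd",
     "#f0be3d",
     "#931e18",
     "#da7901",
     "#247d3f")) +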
  theme_bw() 

Scales in Action

library(MetBrewer) # provides scale_color_met_d()
ggplot(gapminder,
  aes(x = gdpPercap,
     y = lifeExp,
     size = pop,
    color = continent)) +
  geom_point(alpha = 0.5) +
  labs(x = "Income", y = "Life Expectancy") +
  scale_x_log10(labels = scales::dollar) +
  scale_color_met_d(name = "Veronese") +
  theme_bw() 

Scales

The scale_*() components control the properties of all the
aesthetic dimensions mapped to the data.


The extensions (*) can be filled by e.g.:

  • continuous(), discrete(), reverse(), log10(), sqrt(), date() for positions

  • continuous(), discrete(), manual(), gradient(), gradient2(), brewer() for colors

  • continuous(), discrete(), manual(), ordinal(), area(), date() for sizes

  • continuous(), discrete(), manual(), ordinal() for shapes

  • continuous(), discrete(), manual(), ordinal(), date() for transparency

Allison Horst's illustration of the correct use of continuous versus discrete; however, in {ggplot2} these are interpreted in a different way: as quantitative and qualitative.

Illustration by Allison Horst

Continuous vs. Discrete in {ggplot2}

Continuous:
quantitative or numerical data

  • height
  • weight
  • age
  • counts

Discrete:
qualitative or categorical data

  • species
  • sex
  • study sites
  • age group

Continuous vs. Discrete in {ggplot2}

Continuous:
quantitative or numerical data

  • height (continuous)
  • weight (continuous)
  • age (continuous or discrete)
  • counts (discrete)

Discrete:
qualitative or categorical data

  • species (nominal)
  • sex (nominal)
  • study site (nominal or ordinal)
  • age group (ordinal)

Scales in Action

ggplot(gapminder,
      aes(x = gdpPercap,
          y = lifeExp,
          size = pop,
          color = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_continuous(limits = c(0, 30000)) +
  theme_bw()

Coordinate Systems


= interpret the position aesthetics

  • linear coordinate systems: preserve the geometrical shapes
    • coord_cartesian()
    • coord_fixed()
    • coord_flip()
  • non-linear coordinate systems: likely change the geometrical shapes
    • coord_polar()
    • coord_map() and coord_sf()
    • coord_trans()

Change the Limits of the plot

ggplot(gapminder,
    aes(x = gdpPercap,
     y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_continuous(limits = c(0, 30000)) +
  theme_bw()
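
Limits set on a scale and limits set on the coordinate system behave differently. By default, scale limits treat out-of-range values as missing before anything is drawn, while `coord_cartesian()` keeps every row and only zooms the view. A small sketch of the two side by side, assuming ggplot2 and gapminder are installed (the object names are illustrative):

```r
library(ggplot2)
library(gapminder)

base_plot <- ggplot(gapminder,
                    aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  theme_bw()

# Scale limits: rows outside 0-30000 are removed with a warning
scale_version <- base_plot +
  scale_x_continuous(limits = c(0, 30000))

# Coordinate limits: every row is kept, the plot is only zoomed
coord_version <- base_plot +
  coord_cartesian(xlim = c(0, 30000))

# Number of rows that the scale version leaves out
n_outside <- sum(gapminder$gdpPercap > 30000)
n_outside
```

The difference matters most once a statistic such as `geom_smooth()` is added, because the scale version fits the curve on the reduced data.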

Circular Coordinate Systems

ggplot(penguins,
       aes(x = species,
           fill = species)) +
  geom_bar() +
  coord_polar()

ggplot(penguins,
       aes(x = species,
           fill = species)) +
  geom_bar() +
  coord_cartesian()

Your Turn

Change the colors of this density plot

04:00

How I Did It

ggplot(penguins,
 aes(x = bill_length_mm,
    fill = species)) +
  geom_density( alpha = 0.75) +
  theme_bw() +
  scale_fill_viridis_d(option = "magma")

Facets

Example layer                      What it does
facet_wrap(vars(continent))        Plot for each continent
facet_wrap(vars(continent, year))  Plot for each continent/year
facet_wrap(…, ncol = 1)            Put all facets in one column
facet_wrap(…, nrow = 1)            Put all facets in one row

facet_wrap

ggplot(gapminder,
  aes(x = gdpPercap,
     y = lifeExp,
     size = pop)) +
  geom_point(alpha = 0.5) +
  theme_bw() +
  scale_x_log10() +
  facet_wrap(vars(continent)) 

facet_grid

ggplot(data = filter(gapminder,
 year %in% c(1987,1997,2002, 2007)),
    aes(x = gdpPercap,
     y = lifeExp,
     size = pop)) +
  geom_point(alpha = 0.5) +
  theme_bw() +
  scale_x_log10() +
  facet_grid(vars(year))

facet_grid

ggplot(data = filter(gapminder,
 year %in% c(1987,1997,2002, 2007)),
    aes(x = gdpPercap,
     y = lifeExp,
     size = pop)) +
  geom_point(alpha = 0.5) +
  theme_bw() +
  scale_x_log10() +
  facet_grid(vars(year), vars(continent))

Labels

Example layer                What it does
labs(title = "Neat title")   Title
labs(caption = "Something")  Caption
labs(y = "Something")        y-axis
labs(size = "Population")    Title of size legend

Labels with labs

# keep only 2007 so the data match the "Data from 2007" subtitle
ggplot(dplyr::filter(gapminder, year == 2007),
       aes(x = gdpPercap,
           y = lifeExp,
           color = continent,
           size = pop)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  labs(title = "Health and wealth grow together",
       subtitle = "Data from 2007",
       x = "Wealth (GDP per capita)",
       y = "Health (life expectancy)",
       color = "Continent",
       size = "Population",
       caption = "Source: The Gapminder Project")

Changing the Default Theme

theme_minimal

theme_dark

The theme argument

  • Has lots and lots of options (94 to be exact)

  • You can change basically anything you could think of in a plot

    • My ggplot theme is basically just some tweaks to theme arguments
theme_bw() + 
theme(legend.position = "bottom",
      plot.title = element_text(face = "bold"),
      axis.title.y = element_text(face = "italic"))
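
If you reuse the same tweaks across plots, one option is to wrap them in a small function. This is a minimal sketch, not the author's actual theme: `theme_workshop()` is a hypothetical name, and it assumes ggplot2 and gapminder are installed.

```r
library(ggplot2)
library(gapminder)

# theme_workshop() is a hypothetical helper, not part of ggplot2
theme_workshop <- function(base_size = 12) {
  theme_bw(base_size = base_size) +
    theme(legend.position = "bottom",
          plot.title = element_text(face = "bold"),
          axis.title.y = element_text(face = "italic"))
}

# Add it to a plot the same way as a built-in theme
p <- ggplot(gapminder,
            aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  labs(title = "A reusable theme") +
  theme_workshop()
p
```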

Saving your work

your_plot_here = ggplot(data, aes(x = blah, y = blah))
ggsave("name-of-your-file.pdf", your_plot_here)
ggsave("name-of-your-file.png", your_plot_here)
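
It usually helps to set the size explicitly, so the saved figure does not depend on the size of your plotting window. A minimal sketch, assuming ggplot2 and gapminder are installed; the file name and dimensions are made up, and the file goes to a temporary folder so nothing in your project is overwritten.

```r
library(ggplot2)
library(gapminder)

wealth_plot <- ggplot(gapminder,
                      aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()

# tempdir() keeps the example from writing into your project folder
out_file <- file.path(tempdir(), "wealth-plot.pdf")

# ggsave() picks the format from the file extension
ggsave(out_file, plot = wealth_plot,
       width = 7, height = 5, units = "in")

file.exists(out_file)
```

For raster formats such as PNG, the `dpi` argument of `ggsave()` controls the resolution as well.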

Making Maps

New Packages

install.packages(c("sf", "tidygeocoder"))
devtools::install_github("ropenscilabs/rnaturalearth")
library(rnaturalearth)
library(sf)
  • If you are on a Mac and run into problems, please check the r-spatial website

  • The workhorse for this particular section will be sf

Mapping in R

  • R and ggplot can get you pretty far

  • The stuff from these workshops broadly applies

    • including your dplyr verbs
  • Lots of your needs to make static maps can be met

  • Depending on what you are doing you may have to wait a bit

Map Made By Kieran Healy

Shape Files

  • To read in shapefiles, use read_sf(). You should see something that looks like this:
NAME          geometry
Mordor        POINT (1330373 596482.5)
Hobbiton      POINT (515948 1043820)
Edoras        POINT (853993.3 723854.1)
Rivendell     POINT (884331.5 1057787)
Minas Tirith  POINT (1111425 621234.6)
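
If you do not have the Middle Earth files at hand, the same step can be tried on a shapefile that comes with sf. This sketch swaps in the North Carolina counties file (`nc.shp`) bundled with the sf package; it assumes sf and ggplot2 are installed.

```r
library(sf)
library(ggplot2)

# nc.shp is an example shapefile bundled with the sf package
nc_path <- system.file("shape/nc.shp", package = "sf")
nc <- read_sf(nc_path)

# An sf object is a data frame with a geometry column
class(nc)
st_geometry_type(nc, by_geometry = FALSE)

# geom_sf() finds the geometry column on its own
ggplot() +
  geom_sf(data = nc)
```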

Making Maps in ggplot

## devtools::install_github("ropenscilabs/rnaturalearth")
library(rnaturalearth)

world_map_ne = ne_states(returnclass = "sf")

ggplot() +
  geom_sf(data = world_map_ne)

Changing the Projections

ggplot() +
  geom_sf(data = world_map_ne) + 
  coord_sf(crs = "+proj=cea +lon_0=0 +lat_ts=45") +
  theme_void()

Working Without Shape Files (kind of)

  • Sometimes we do not have a shapefile

  • Don’t worry, sf has you covered

  • You just need to feed it the right things

Making a Bespoke Shapefile

  • You can either feed it latitudes and longitudes

  • Or you can feed it addresses

  • Most free geocoding services have rate limits

    • So be mindful of the size of your data

Making a Bespoke Shapefile (cont.)

ga_cities = tribble( 
  ~city, ~lat, ~long,
  "Atlanta", 33.748955, -84.388099,
  "Athens", 33.950794, -83.358884,
  "Savannah", 32.113192, -81.089350
)


ga_cities_geometry = ga_cities |>  
  st_as_sf(coords = c("long", "lat"), crs = st_crs("EPSG:4326"))
ga_cities_geometry
city      geometry
Atlanta   POINT (-84.3881 33.74896)
Athens    POINT (-83.35888 33.95079)
Savannah  POINT (-81.08935 32.11319)
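
Once the coordinates are an sf object, they can be drawn like any other spatial layer. Below is a minimal sketch that rebuilds the three cities with a plain data frame (so it only needs sf and ggplot2) and labels them; the nudge value and theme are illustrative choices.

```r
library(sf)
library(ggplot2)

ga_cities <- data.frame(
  city = c("Atlanta", "Athens", "Savannah"),
  lat  = c(33.748955, 33.950794, 32.113192),
  long = c(-84.388099, -83.358884, -81.089350)
)

# coords is given as x then y, so longitude comes first
ga_cities_geometry <- st_as_sf(ga_cities,
                               coords = c("long", "lat"),
                               crs = 4326)

ggplot(ga_cities_geometry) +
  geom_sf(size = 3) +
  geom_sf_text(aes(label = city), nudge_y = 0.15) +
  theme_minimal()
```

Swapping the order of `coords` is a common slip; if the points land in the wrong place, check that longitude comes first.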

Making a Bespoke Shapefile (cont.)

library(tidygeocoder)
breweries_I_visit = tribble(
  ~name, ~address,
  "Russian River Brewing", "725 4th St, Santa Rosa, CA 95404",
  "Orpheus Brewing", "1440 Dutch Valley Pl NE, Atlanta, GA 30324",
  "Three Taverns", "121 New St, Decatur, GA 30030",
  "HenHouse Brewing Company", "322 Bellevue Ave, Santa Rosa, CA 95407"
)


# geocode() looks up each address and adds lat and long columns
breweries_geocode = breweries_I_visit |> 
  geocode(address, method = "osm")

breweries_geocode |> 
  st_as_sf(coords = c("long", "lat"), crs = st_crs("EPSG:4326")) |> 
  knitr::kable(format = "html")
name                      address                                     geometry
Russian River Brewing     725 4th St, Santa Rosa, CA 95404            POINT (-122.7117 38.4418)
Orpheus Brewing           1440 Dutch Valley Pl NE, Atlanta, GA 30324  POINT (-84.36874 33.79355)
Three Taverns             121 New St, Decatur, GA 30030               POINT (-84.28488 33.77306)
HenHouse Brewing Company  322 Bellevue Ave, Santa Rosa, CA 95407      POINT (-122.7255 38.40122)

Mapping Middle Earth

# read in every shapefile in the folder iteratively
# there is probably a simpler solution but this worked for me
# pattern is a regular expression, so anchor it to names ending in .shp
shapes <- list.files(path = "data/ME-GIS/", pattern = "\\.shp$", full.names = TRUE)

shapes_read = map(shapes, read_sf)

remove <- c("data/ME-GIS//", ".shp", "2", "02", "_18", "")
#https://stackoverflow.com/questions/29036960/remove-multiple-patterns-from-text-vector-r
shapes_names = shapes |> 
  str_remove_all(paste(remove, collapse = "|")) |> 
  str_to_lower()

names(shapes_read) = shapes_names

# copy each layer into the global environment under its cleaned name
list2env(shapes_read, envir = .GlobalEnv)

places = placenames |>
  filter(NAME %in% c("Hobbiton",
                     "Rivendell",
                     "Edoras",
                     "Minas Tirith"))

mordor = placenames[placenames$NAME == "Mordor",]

mountains_to_label = mountains_anno[mountains_anno$name == "Erebor The Lonely Mountain",]


ggplot() +
  geom_sf(data = contours,
          size = 0.15,
          color = "grey90") +
  geom_sf(data = coastline,
          size = 0.25,
          color = "grey50") +
  geom_sf(data = rivers,
          size = 0.2,
          color = "#0776e0",
          alpha = 0.5) +
  geom_sf(data = lakes,
          size = 0.2,
          color = "#0776e0",
          fill = "#0776e0") +
  geom_sf(data = forests,
          size = 0,
          fill = "#035711",
          alpha = 0.5) +
  geom_sf(data = mountains_anno, size = 0.25) +
  geom_sf(data = places) +
  geom_sf_label(data = filter(places, NAME !="Rivendell"),
                aes(label = NAME),
                nudge_y = 80000, size = 6) +
  geom_sf_label(data = filter(places, NAME == "Rivendell"),
                aes(label = NAME),
                nudge_x = 88000, size = 6) +
  geom_sf_label(data = mountains_to_label,
                aes(label = name),
                nudge_y = 80000, size = 6) +
  geom_sf_label(data = mordor,
                aes(label = NAME),
                nudge_y = -10000, size = 6) +
  theme_void() +
  theme(plot.background = element_rect(fill = "#fffce3"))