Bulletin

Packages

library(tidyverse)
library(palmerpenguins)
library(viridis) # we'll use to customize colors

Data

data(penguins)

Today

We’ll begin today by completing ae2.

By the end of today you will…

Filtering revisited

The table of logical operators below will be helpful as you work with filtering.

operator      definition
<             is less than?
<=            is less than or equal to?
>             is greater than?
>=            is greater than or equal to?
==            is exactly equal to?
!=            is not equal to?
is.na(x)      is x NA?
!is.na(x)     is x not NA?
x %in% y      is x in y?
!(x %in% y)   is x not in y?
x & y         is x AND y?
x | y         is x OR y?
!x            is not x?
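
As a minimal sketch of how a few of these operators behave inside filter(), the code below uses %in%, | and !is.na() on the penguins data. The object names (adelie_or_chinstrap, long_or_heavy, sex_known) and the 5000 g cutoff are illustrative choices, not part of the exercise.

```r
library(dplyr)
library(palmerpenguins)

# %in% keeps rows whose species matches any value in the vector
adelie_or_chinstrap <- penguins %>%
  filter(species %in% c("Adelie", "Chinstrap"))

# | keeps rows where at least one of the two conditions is TRUE
# (the 5000 g cutoff is an arbitrary value chosen for illustration)
long_or_heavy <- penguins %>%
  filter(flipper_length_mm > 200 | body_mass_g > 5000)

# !is.na() drops rows where sex was not recorded
sex_known <- penguins %>%
  filter(!is.na(sex))

nrow(adelie_or_chinstrap)
nrow(long_or_heavy)
nrow(sex_known)
```

Note that filter() only keeps rows where the condition evaluates to TRUE, so rows where the condition is NA are dropped as well.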

Examples

How many penguins have flipper length > 200 mm?

penguins %>%
  filter(flipper_length_mm > 200)
## # A tibble: 148 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Dream               35.7          18                 202        3550
##  2 Adelie  Dream               41.1          18.1               205        4300
##  3 Adelie  Dream               40.8          18.9               208        4300
##  4 Adelie  Biscoe              41            20                 203        4725
##  5 Adelie  Torgersen           41.4          18.5               202        3875
##  6 Adelie  Torgersen           44.1          18                 210        4000
##  7 Adelie  Dream               41.5          18.5               201        4000
##  8 Gentoo  Biscoe              46.1          13.2               211        4500
##  9 Gentoo  Biscoe              50            16.3               230        5700
## 10 Gentoo  Biscoe              48.7          14.1               210        4450
## # … with 138 more rows, and 2 more variables: sex <fct>, year <int>
  • We could also pipe into nrow() to quickly grab the number of rows. Try it!
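
One way the nrow() suggestion might look, as a small sketch. The name n_long_flippers is just an illustrative label for the result.

```r
library(dplyr)
library(palmerpenguins)

# Filter as before, then count the remaining rows instead of printing them
n_long_flippers <- penguins %>%
  filter(flipper_length_mm > 200) %>%
  nrow()

n_long_flippers
```

The result should match the row count reported in the tibble header above (148).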

How many female penguins have flipper length > 200 mm?

penguins %>%
  filter(flipper_length_mm > 200 & (sex == "female"))
## # A tibble: 60 × 8
##    species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Dream            35.7          18                 202        3550
##  2 Gentoo  Biscoe           46.1          13.2               211        4500
##  3 Gentoo  Biscoe           48.7          14.1               210        4450
##  4 Gentoo  Biscoe           46.5          13.5               210        4550
##  5 Gentoo  Biscoe           45.4          14.6               211        4800
##  6 Gentoo  Biscoe           43.3          13.4               209        4400
##  7 Gentoo  Biscoe           40.9          13.7               214        4650
##  8 Gentoo  Biscoe           45.5          13.7               214        4650
##  9 Gentoo  Biscoe           45.8          14.6               210        4200
## 10 Gentoo  Biscoe           42            13.5               210        4150
## # … with 50 more rows, and 2 more variables: sex <fct>, year <int>

For how many penguins was flipper length not measured (i.e. reported as NA)?

penguins %>%
  filter(is.na(flipper_length_mm))
## # A tibble: 2 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…             NA            NA               NA          NA <NA> 
## 2 Gentoo  Biscoe             NA            NA               NA          NA <NA> 
## # … with 1 more variable: year <int>

What proportion of penguins are from each island?

penguins %>%
  count(island) %>%
  mutate(proportion = n / sum(n))
## # A tibble: 3 × 3
##   island        n proportion
##   <fct>     <int>      <dbl>
## 1 Biscoe      168      0.488
## 2 Dream       124      0.360
## 3 Torgersen    52      0.151

Exercise:

What proportion of penguins are from each species?

# code here

Plots

The procedure used to construct plots can be summarized using the code below.

ggplot(data = [dataset],
       mapping = aes(x = [x-variable], y = [y-variable])) +
  geom_xxx() +
  geom_xxx() +
  other options

Example: bar plot

ggplot(data = penguins, 
       mapping = aes(x = species)) +
  geom_bar() +
  labs(x = "Species", y = "Count", title = "Palmer penguin species")

Example: stacked bar plot

penguins %>%
  filter(!is.na(sex)) %>%
  ggplot(mapping = aes(x = species, fill = sex)) +
  geom_bar(position = "fill") +
  labs(x = "Species", y = "Proportion", title = "Palmer penguin species")

  • try with and without position = "fill"
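
To make the comparison concrete, here is a sketch that builds both versions side by side. The object names (penguins_sexed, p_stack, p_fill) are illustrative assumptions.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Drop penguins whose sex was not recorded, as in the example above
penguins_sexed <- penguins %>%
  filter(!is.na(sex))

# Default position ("stack"): bar heights are counts
p_stack <- ggplot(penguins_sexed, aes(x = species, fill = sex)) +
  geom_bar() +
  labs(x = "Species", y = "Count")

# position = "fill": every bar is rescaled to height 1,
# so the y-axis shows proportions rather than counts
p_fill <- ggplot(penguins_sexed, aes(x = species, fill = sex)) +
  geom_bar(position = "fill") +
  labs(x = "Species", y = "Proportion")

p_stack
p_fill
```

The stacked version makes it easy to compare group sizes across species, while the filled version makes it easier to compare the sex split within each species.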

Aesthetics

An aesthetic is a visual property in your plot that is derived from the data.

  • shape
  • color
  • size
  • alpha (transparency)

We can map a variable in our dataset to a color, a size, a transparency, and so on. The aesthetics that can be used with each geom_ can be found in the documentation.
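
As a rough sketch of what mapping looks like in code: inside aes(), color and size are tied to variables in the data, while alpha is set to a fixed value outside aes() and so applies to every point. The choice of species and body_mass_g here is only for illustration.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(data = penguins,
            mapping = aes(x = bill_length_mm, y = flipper_length_mm,
                          color = species,        # mapped: varies with the data
                          size = body_mass_g)) +  # mapped: varies with the data
  geom_point(alpha = 0.5) +                       # set: same for all points
  labs(x = "Bill length (mm)", y = "Flipper length (mm)",
       color = "Species", size = "Body mass (g)")

p
```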

Here we are going to use the viridis package, which provides color palettes that are more accessible to color-blind readers. scale_color_viridis() specifies which palette you want to use. You can learn more about the options here.

Other sources that can be helpful in devising accessible color schemes include Color Brewer, the Wes Anderson package, and the cividis package.

This visualization shows a scatterplot of bill length (x variable) and flipper length (y variable). Using scale_color_viridis(), we make points for female penguins purple and male penguins yellow (the palette runs from purple to yellow, and female is the first level of sex). We also add axis labels and a title.

ggplot(data = penguins, 
       mapping = aes(x = bill_length_mm, y = flipper_length_mm,
                     color = sex)) + 
  geom_point() +
  labs(title = "Flipper length vs bill length",
       x = "Bill length (mm)", y = "Flipper length (mm)") +
  scale_color_viridis(discrete = TRUE, option = "D", name = "Sex")
## Warning: Removed 11 rows containing missing values (geom_point).

Exercise:

Can you remove the NAs from the above visualization?

Question: What will the visualization look like below? Write your answer down before running the code.

ggplot(data = penguins, 
       mapping = aes(x = bill_length_mm, y = flipper_length_mm,
                     shape = sex)) + 
  geom_point() +
  labs(title = "Flipper length vs bill length",
       x = "Bill length (mm)", y = "Flipper length (mm)") +
  scale_color_viridis(discrete = TRUE, option = "D", name = "Sex")

Faceting

We can use smaller plots to display different subsets of the data using faceting. This is helpful to examine conditional relationships.

penguins %>%
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm, color = island)) +
  geom_point() +
  facet_wrap(~ species) +
  labs(x = "Bill length (mm)", y = "Flipper length (mm)", color = "Island")
## Warning: Removed 2 rows containing missing values (geom_point).

penguins %>%
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm, color = species)) +
  geom_point() +
  facet_wrap(~ island) +
  labs(x = "Bill length (mm)", y = "Flipper length (mm)", color = "Species") +
  scale_color_viridis(discrete = TRUE)
## Warning: Removed 2 rows containing missing values (geom_point).

ggplot activity

# code here

Additional resources