Getting started

Download this application exercise by pasting the code below into your console:

download.file("https://sta101.github.io/static/appex/ae15.Rmd",
    destfile = "ae15.Rmd")

Next, download the references file by pasting the code below into your console:

download.file("https://sta101.github.io/static/appex/references.bib",
    destfile = "references.bib")

Cleaning up the mess below

knitr::opts_chunk$set(message = TRUE, 
                      warning = TRUE, 
                      echo = TRUE,
                      fig.width = 6, #width of figure
                      fig.asp = .618, #set figure height based on aspect ratio
                      out.width = "75%", #width relative to text
                      fig.align = "center" #alignment
                      )
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(palmerpenguins) #use the penguins data frame
library(knitr)
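The package messages and conflict notes above are the "mess": they appear because the setup chunk sets message = TRUE and warning = TRUE. Below is a minimal sketch of one way to clean them up, assuming you want messages and warnings hidden throughout the document. knitr::opts_chunk$set() changes the defaults only for the chunks that come after it, so the sketch also wraps the package loads in base R's suppressPackageStartupMessages() to quiet them inside the setup chunk itself.

```r
# Hide messages and warnings by default in the chunks that follow;
# the other options already set (echo, fig.width, ...) are kept
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# Load packages without printing their startup messages
suppressPackageStartupMessages({
  library(tidyverse)
  library(palmerpenguins) # use the penguins data frame
  library(knitr)
})
```

When you knit, the attach and conflict messages shown above should no longer appear in the output.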

Today

Introduction

For this analysis, we will use the penguins data set in the palmerpenguins R package (Horst, Hill, and Gorman 2020). This data set contains measurements and other characteristics for over 300 penguins observed near Palmer Station in Antarctica. The data were originally collected by Dr. Kristen Gorman.

Learn more about the data on the palmerpenguins website (https://allisonhorst.github.io/palmerpenguins/).
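A quick way to check the description above (for example, "over 300 penguins") is to inspect the data frame directly. A minimal sketch using dplyr's glimpse() and count():

```r
library(dplyr)
library(palmerpenguins)

# Number of rows (penguins) and columns (variables)
dim(penguins)

# Column names and types, with a preview of the values
glimpse(penguins)

# Number of penguins of each species
count(penguins, species)
```

dim() should report 344 rows and 8 columns.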

Code chunk options

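Code chunk options customize how code and output are displayed in the knitted R Markdown document. There are two ways to set code chunk options:

  • In the header of an individual chunk, e.g. {r chunk-label, echo = FALSE}, which affects only that chunk.
  • Globally with knitr::opts_chunk$set(), as in the setup code above, which sets the defaults for the chunks that follow.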

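A few options that change what is shown or hidden in the knitted document:

  • echo: whether the code is shown
  • eval: whether the code is run
  • message: whether messages are shown
  • warning: whether warnings are shown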

For the project, you will set the option echo = FALSE to hide all code in your final report.

Citations

Your report will include citations, e.g. the data source, previous research, and other sources as needed. At a minimum, you should have a citation for the data source.

All of your bibliography entries will be stored in a .bib file using BibTeX, a format for storing citations that is commonly used with LaTeX. Let's take a look at references.bib.
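If you need a BibTeX entry for an R package, base R can generate one. The sketch below uses citation() and toBibtex() from the utils package. The printed entry may lack a citation key (the text between the opening brace and the first comma). If it does, add one yourself before pasting the entry into references.bib.

```r
# Citation information for the palmerpenguins package
pp_cite <- citation("palmerpenguins")

# Print the same information formatted as a BibTeX entry
toBibtex(pp_cite)
```

The same approach works for other packages. For example, citation("rmarkdown") returns more than one entry, so pick the one that matches what you are citing.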

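In addition to the .bib file:

  • Add bibliography: references.bib to the YAML header at the top of the R Markdown document.
  • Cite an entry by its citation key, where key below stands for the key of an entry in references.bib: @key produces an in-text citation such as Author (Year), and [@key] produces a parenthetical citation such as (Author Year).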

Citation examples

  1. In Gorman and LTER (2014), the authors focus on the analysis of Gentoo penguins.

  2. Studies have examined whether environmental variability in the form of winter sea ice is associated with differences in male and female pre-breeding foraging niche (Gorman and LTER 2014).

Practice

  • Add a citation for R Markdown: The Definitive Guide to this document.

Customizing plots

Let's start with a plot of species by island.

ggplot(data = penguins, aes(x = island, fill = species)) + 
  geom_bar(position = "fill") + 
  labs(x = "Island", 
       y = "Proportion",
       fill = "Species", 
       title = "Distribution of species", 
       subtitle = "by island")
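The segmented bar plot shows the proportion of each species within each island. Here is a sketch of how to check those proportions numerically with dplyr. It counts penguins by island and species, then divides by each island's total:

```r
library(dplyr)
library(palmerpenguins)

species_by_island <- penguins %>%
  count(island, species) %>%        # penguins per island-species pair
  group_by(island) %>%
  mutate(prop = n / sum(n)) %>%     # proportion within each island
  ungroup()

species_by_island
```

Each value of prop corresponds to the height of one segment in the plot, so the proportions for each island sum to 1.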

Standard color palette + theme

You can set a standard color palette and theme at the top of the document to make the plots look coordinated throughout the document. Navigate to the code chunk labeled ggplot2-options and let’s take a look.

Choose 3 colors from the color palette, then use the code below to apply the colors to the segmented bar plot. Remove eval = FALSE from the code chunk header.

# Fill in the blanks and remove eval = FALSE from the code chunk header
ggplot(data = penguins, aes(x = island, fill = species)) + 
  geom_bar(position = "fill") + 
  labs(x = "Island", 
       y = "Proportion",
       fill = "Species", 
       title = "Distribution of species", 
       subtitle = "by island") + 
  scale_fill_manual(values = c(color_palette$____, 
                               color_palette$_____, 
                               color_palette$_____))
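The ggplot2-options chunk itself is not reproduced in these notes, so the sketch below defines a stand-in for it. The list name color_palette comes from the exercise, but the entry names (teal, gold, plum), their hex codes, and the choice of theme_minimal() are hypothetical. Substitute whatever the chunk in ae15.Rmd actually defines.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical stand-in for the ggplot2-options chunk:
# a named list of colors plus a default theme for every plot
color_palette <- list(teal = "#1B9E77",
                      gold = "#E6AB02",
                      plum = "#7570B3")
theme_set(theme_minimal())

# Segmented bar plot filled with three colors chosen from the palette
species_plot <- ggplot(data = penguins, aes(x = island, fill = species)) +
  geom_bar(position = "fill") +
  labs(x = "Island",
       y = "Proportion",
       fill = "Species",
       title = "Distribution of species",
       subtitle = "by island") +
  scale_fill_manual(values = c(color_palette$teal,
                               color_palette$gold,
                               color_palette$plum))

species_plot
```

Keeping the colors in one named list means every plot that pulls from color_palette changes together if the palette is edited.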
# add code here
ggplot(data = penguins, aes(x = flipper_length_mm)) + 
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing non-finite values (stat_bin).
Distribution of penguin bill depth
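The message and warning printed with the histogram have two causes: the bin width is left at its default, and two penguins have missing flipper lengths. The sketch below removes those rows first and sets the bin width explicitly. The value of 5 mm is an arbitrary starting point, not a value taken from the original notes.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

flipper_plot <- penguins %>%
  filter(!is.na(flipper_length_mm)) %>%   # drop rows with a missing flipper length
  ggplot(aes(x = flipper_length_mm)) +
  geom_histogram(binwidth = 5) +          # hypothetical bin width, in mm
  labs(x = "Flipper length (mm)",
       y = "Number of penguins",
       title = "Distribution of penguin flipper length")

flipper_plot
```

Try a few bin widths. Narrow bins show more detail but look noisier; wide bins can hide features of the distribution.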

Neatly display tables and output

# Complete the code and remove eval = FALSE from the code chunk header
penguins %>%
  filter(!is.na(bill_depth_mm)) %>%
## add code
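Here is one possible way to finish the pipeline. It assumes the goal is a table of bill depth summary statistics by species, which is an illustration and not the exercise's answer key. The sketch summarises with dplyr and prints the result with knitr::kable():

```r
library(dplyr)
library(knitr)
library(palmerpenguins)

bill_depth_summary <- penguins %>%
  filter(!is.na(bill_depth_mm)) %>%
  group_by(species) %>%
  summarise(n = n(),
            mean_bill_depth = mean(bill_depth_mm),
            sd_bill_depth = sd(bill_depth_mm))

# Round to 2 decimal places and use readable column names
bill_table <- kable(bill_depth_summary,
                    digits = 2,
                    col.names = c("Species", "n",
                                  "Mean bill depth (mm)", "SD bill depth (mm)"))
bill_table
```

kable() formats the result as a table in the knitted document, which reads more cleanly than the default tibble printout.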

Acknowledgements

These notes were adapted from the following:

References

Gorman, Kristen, and Palmer Station Antarctica LTER. 2014. "Structural Size Measurements and Isotopic Signatures of Foraging Among Adult Male and Female Gentoo Penguins (Pygoscelis papua) Nesting Along the Palmer Archipelago Near Palmer Station, 2007-2009."
Horst, Allison Marie, Alison Presmanes Hill, and Kristen B. Gorman. 2020. palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://allisonhorst.github.io/palmerpenguins/.