By the end of today you will be able to use glimpse(), count(), group_by() and summarize().
Download this application exercise by pasting the code below into your console (bottom left of the screen):
download.file("https://sta101.github.io/static/appex/ae1.Rmd",
              destfile = "ae1.rmd")
Use R as a calculator by typing the following into the console:
5 * 5 + 10
x = 3
x + x^2
x = 1:10
x * 7
In the last two examples we saved a value as the object “x”.
We can “print” x to the screen by typing the name of the object (“x”) in the console or in a code chunk.
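As a small sketch of the same ideas, the lines below store values in objects and print them. The object names y and total are made up for illustration and are not part of the exercise.

```r
# Store a single number in an object, then print it by typing its name
y = 4
y

# Arithmetic on a vector is applied to every element
y = 1:5
y * 2

# Save a result under a new name and print it explicitly
total = sum(y)
print(total)
```

Typing the name of an object and calling print() on it show the same thing at the console.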
R functions
library(tidyverse)
roster = read_csv("https://sta101.github.io/static/appex/data/sample-roster.csv")
survey = read_csv("https://sta101.github.io/static/appex/data/sample-survey.csv")
Question: What objects store the data in the code chunk above? Can you print them to the screen?
Create a new code chunk with CMD+OPTION+I (Mac) or CTRL+ALT+I (Windows/Linux).
So far we’ve already seen two functions: library and read_csv. In R, a function name is followed by parentheses. A function takes one or more inputs, called arguments, and often (but not always) returns an output. To learn more about a function, you can check its documentation with ?, e.g. ?library.
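To make the idea of arguments and outputs concrete, here is a minimal sketch using a few base R functions. These particular functions (round, seq, message) are chosen only for illustration and do not appear in the exercise.

```r
# round() takes two arguments here: the number and how many digits to keep
rounded = round(3.14159, digits = 2)
rounded

# Arguments can be matched by position (1, 10) or by name (by = 3)
steps = seq(1, 10, by = 3)
steps

# Some functions are called mainly for a side effect rather than a return value;
# message() writes text to the console and returns NULL invisibly
message("hello")

# Open the help page for a function (uncomment to run interactively)
# ?round
```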
Let’s glimpse the data frame.
glimpse(survey)
## Rows: 12
## Columns: 5
## $ name                 <chr> "A", "Appa", "Bumi", "Soka", "Katara", "Suki", "Z…
## $ email                <chr> "the-last-Rbender@duke.edu", "yip-yip-appa@duke.e…
## $ bender               <chr> "Airbender", "Airbender", "Earthbender", "None", …
## $ previous_programming <chr> "No", "No", "No", "Somewhat", "Yes", "Yes", "Yes"…
## $ cat_dog              <chr> "dog", "cat", "cat", "dog", "dog", "cat", "cat", …
To look at the whole data frame, we can use view():
view(survey)
View the roster data in the console
Terminology: the “columns” of a data frame are called variables, whereas the “rows” are called observations.
Question: How many variables are in the data frame survey? How many observations? What about the data frame roster?
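One way to check counts like these without reading them off the glimpse() output is sketched below. The data frame demo is invented for illustration; the same base R functions should work on survey and roster.

```r
# A tiny made-up data frame: 3 observations (rows), 2 variables (columns)
demo = data.frame(
  id    = c("a", "b", "c"),
  score = c(10, 20, 30)
)

nrow(demo)   # number of observations
ncol(demo)   # number of variables
dim(demo)    # both at once: rows, then columns
names(demo)  # the variable names
```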
Why must I input net-id email?
roster %>%
  left_join(survey, by = "email")
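The join above matches rows in the two tables by their email column, so a survey response can only be linked to a roster entry when the emails agree. The sketch below illustrates this with invented data, using base R merge() with all.x = TRUE as a stand-in for left_join(). The table names roster_demo and survey_demo and the section column are assumptions, not part of the course data. Note that merge() sorts the result by the key, whereas left_join() keeps the row order of the first table.

```r
# Made-up stand-ins for the roster and survey tables
roster_demo = data.frame(
  email   = c("a@example.edu", "b@example.edu", "c@example.edu"),
  section = c(1, 2, 1)
)
survey_demo = data.frame(
  email   = c("a@example.edu", "c@example.edu"),
  cat_dog = c("dog", "cat")
)

# all.x = TRUE keeps every roster row, like a left join;
# roster rows with no matching survey email get NA in the survey columns
joined = merge(roster_demo, survey_demo, by = "email", all.x = TRUE)
joined
```

In this toy example the student with no survey response still appears in the result, with a missing value for cat_dog.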
Count the benders in the data
count(survey, bender)
## # A tibble: 5 × 2
##   bender          n
##   <chr>       <int>
## 1 Airbender       3
## 2 Earthbender     3
## 3 Firebender      4
## 4 None            1
## 5 Waterbender     1
survey %>%
  mutate(pet = ifelse(cat_dog == "dog", 1, 0)) %>%
  group_by(bender) %>%
  summarize(proportion_dog = mean(pet))
## # A tibble: 5 × 2
##   bender      proportion_dog
##   <chr>                <dbl>
## 1 Airbender            0.667
## 2 Earthbender          0.333
## 3 Firebender           0
## 4 None                 1
## 5 Waterbender          1
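The pipeline above works because the mean of a 0/1 variable equals the proportion of 1s. A minimal base R sketch of that step, using invented answers rather than the survey data:

```r
# Made-up answers for five students
cat_dog = c("dog", "cat", "dog", "dog", "cat")

# Recode: 1 if the answer is "dog", 0 otherwise
pet = ifelse(cat_dog == "dog", 1, 0)

sum(pet) / length(pet)  # number of dogs divided by number of answers
mean(pet)               # the same number
```

In the exercise, group_by(bender) makes summarize() compute this mean separately within each bender group.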