Below are some practice exam questions based on the week 1 material, which covered exploratory data analysis. Please note that the exam will also cover material introduced in week 2.
Paste the code below into R to download a template file in which to answer the exercises.
download.file("https://sta101.github.io/static/practice/eda_practice_template.Rmd",
              destfile = "eda-practice.rmd")
Libraries
library(tidyverse)
library(viridis)
Data
The dataset for these practice questions comes from the ggplot2 package (loaded with the tidyverse) and contains fuel economy data on 38 popular models of cars from 1999 to 2008. Be sure to check out ?mpg for more info, especially to understand the column names.
data(mpg)
How many observations are in the mpg data set? How many variables?
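One way to check the size of a data frame in R is with nrow() and ncol(), or dim() for both at once. This is a minimal sketch, assuming ggplot2 and dplyr (both loaded by library(tidyverse)) are installed; n_obs and n_vars are illustrative names, not part of the template.

```r
library(ggplot2)  # provides the mpg data frame
library(dplyr)    # provides glimpse()

n_obs  <- nrow(mpg)  # number of observations (rows)
n_vars <- ncol(mpg)  # number of variables (columns)
dim(mpg)             # both values at once: rows, then columns

# glimpse() also prints each variable's name and type
glimpse(mpg)
```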
Create a scatterplot with engine displacement on the x-axis and city miles per gallon (mpg) on the y-axis, and color the points by the number of cylinders the vehicle has. Be sure to label the axes appropriately and give your graph a title. Discuss any trends you notice.
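A scatterplot like this maps displ to x, cty to y, and cyl to color inside aes(), with labs() setting the axis, legend, and title text. The sketch below is one possible approach, assuming ggplot2 is installed; the label wording and the name p_scatter are placeholders of my own, not required answers.

```r
library(ggplot2)

# Map displacement to x, city mpg (cty) to y, and cylinder count to color
p_scatter <- ggplot(mpg, aes(x = displ, y = cty, color = cyl)) +
  geom_point() +
  labs(
    x = "Engine displacement (litres)",  # placeholder label text
    y = "City miles per gallon",
    color = "Cylinders",
    title = "City fuel economy vs. engine displacement"
  )

print(p_scatter)
```

Because cyl is stored as a number, ggplot2 uses a continuous color gradient here; writing color = as.factor(cyl) would instead give one distinct color per cylinder count.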
What proportion of vehicles in the dataset have a 4-cylinder engine? What proportion have an 8-cylinder engine?
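Proportions like these can be computed by counting vehicles per cylinder value and dividing by the total, or by taking the mean of a logical vector (TRUE counts as 1). This is a sketch assuming ggplot2 and dplyr are installed; cyl_props, prop_4, and prop_8 are illustrative names.

```r
library(ggplot2)
library(dplyr)

# Count vehicles per cylinder value, then divide by the total count
cyl_props <- mpg %>%
  count(cyl) %>%
  mutate(prop = n / sum(n))
print(cyl_props)

# Equivalent shortcut: the mean of a logical vector is the share of TRUEs
prop_4 <- mean(mpg$cyl == 4)
prop_8 <- mean(mpg$cyl == 8)
```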
Create a new column called avg_mpg that reports the average of a vehicle's city and highway mpg. Save your new data frame as mpg2.
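Adding a derived column is what dplyr's mutate() is for. The sketch below assumes "average between city and highway" means the simple unweighted mean of cty and hwy, which is my reading of the exercise rather than something the page states.

```r
library(ggplot2)
library(dplyr)

# mutate() adds a column: here, the unweighted mean of city (cty) and highway (hwy) mpg
mpg2 <- mpg %>%
  mutate(avg_mpg = (cty + hwy) / 2)

head(select(mpg2, manufacturer, model, cty, hwy, avg_mpg))
```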
Recreate the plot below. Make sure the axis and title labels match exactly, including spelling, capitalization, etc.

Use the mpg2 dataset. Also, if you try, e.g., aes(x = cyl), you will not see the correct plot because R treats the number of cylinders as continuous. Instead, try aes(x = as.factor(cyl)) to make it a discrete category. If you are unsure what geometry to use, check out a list of common geometries here.

Create a new column called mpg_class that labels a vehicle as “great mpg” if its average mpg is greater than or equal to 25 mpg, and “not great mpg” if its average mpg is less than 25 mpg. Next, recreate the plot below. Make sure the axis and title labels match exactly, including spelling, capitalization, etc. Is the figure informative? Why or why not?

Use the mpg2 dataset again. You will need to use filter() paired with the appropriate logic to plot only a subset of the data. See ae3 for more information on filter(). You will need to use the viridis package to obtain the correct color scheme. Check ?scale_fill_viridis for more information.

Calculate the mean avg_mpg per manufacturer and then sort the resulting data frame so that the most fuel-efficient auto manufacturers appear first (i.e., the highest-MPG manufacturers at the top). Print the five most fuel-efficient manufacturers to the screen.
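The last four exercises lean on a handful of dplyr and ggplot2 idioms: treating cyl as a factor, labelling rows from a condition, subsetting with filter(), and grouped summaries. The sketch below illustrates those idioms only; it is not an official solution and does not attempt to reproduce the target figures. It assumes ggplot2, dplyr, and viridis are installed, uses geom_boxplot() and the condition year == 2008 purely as placeholders, and introduces the illustrative names p_box, p_subset, and top_manufacturers.

```r
library(ggplot2)
library(dplyr)
library(viridis)  # provides scale_fill_viridis()

# Recreate mpg2 so this block runs on its own (unweighted mean of cty and hwy)
mpg2 <- mpg %>%
  mutate(
    avg_mpg = (cty + hwy) / 2,
    # if_else() picks a label element-wise from a logical condition
    mpg_class = if_else(avg_mpg >= 25, "great mpg", "not great mpg")
  )

# as.factor(cyl) gives a discrete x-axis; with aes(x = cyl), ggplot2 would
# treat cyl as continuous. geom_boxplot() is only an example geometry.
p_box <- ggplot(mpg2, aes(x = as.factor(cyl), y = avg_mpg)) +
  geom_boxplot()

# filter() keeps only rows meeting a condition; year == 2008 is an arbitrary
# example, not the subset any particular figure uses
p_subset <- mpg2 %>%
  filter(year == 2008) %>%
  ggplot(aes(x = as.factor(cyl), fill = mpg_class)) +
  geom_bar() +
  scale_fill_viridis(discrete = TRUE)  # discrete viridis palette for fill

# Mean avg_mpg per manufacturer, sorted from most to least fuel efficient
top_manufacturers <- mpg2 %>%
  group_by(manufacturer) %>%
  summarise(mean_avg_mpg = mean(avg_mpg)) %>%
  arrange(desc(mean_avg_mpg))

# Print the first five rows, i.e. the five highest means
print(slice_head(top_manufacturers, n = 5))
```

dplyr's if_else() is used rather than base ifelse() because it checks that both labels have the same type; case_when() would work equally well if more than two classes were needed.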