By the end of today you will…
Download this application exercise by pasting the code below into your console:
download.file("https://sta101.github.io/static/appex/ae13.Rmd",
destfile = "ae13.rmd")
library(tidyverse)
library(tidymodels)
library(openintro)
The `stent30` data set comes from the `openintro` package and is from a study conducted in 2011 on the effects of arterial stents as a therapy for stroke patients. See the original publication:
Chimowitz MI, Lynn MJ, Derdeyn CP, et al. 2011. Stenting versus Aggressive Medical Therapy for Intracranial Arterial Stenosis. New England Journal of Medicine 365:993-1003. doi: 10.1056/NEJMoa1105335.
or check `?stent30` for more information.
data(stent30)
glimpse(stent30)
## Rows: 451
## Columns: 2
## $ group <fct> treatment, treatment, treatment, treatment, treatment, treatme…
## $ outcome <fct> stroke, stroke, stroke, stroke, stroke, stroke, stroke, stroke…
Do stents affect stroke outcome in patients?
Write the null and alternative hypothesis. Report the observed statistic.
Simulate under the null and visualize the null distribution.
Compute and report the p-value, compare it to \(\alpha = 0.05\), and make a conclusion with appropriate context.
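One way to carry out these three steps is with the `infer` functions that `tidymodels` loads. The sketch below is a minimal example rather than the only valid approach. It assumes the two levels of `group` are `"treatment"` and `"control"` (check with `levels(stent30$group)`), and both the seed and `reps = 1000` are arbitrary choices made here for illustration.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

data(stent30)

# Observed statistic: proportion of strokes in treatment minus control
# (the level name "control" is an assumption; verify with levels(stent30$group))
obs_stat = stent30 %>%
  specify(outcome ~ group, success = "stroke") %>%
  calculate(stat = "diff in props", order = c("treatment", "control"))

# Null distribution: shuffle the group labels so outcome is independent of group
set.seed(1)  # arbitrary seed, only for reproducibility
null_dist = stent30 %>%
  specify(outcome ~ group, success = "stroke") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("treatment", "control"))

# Visualize the null distribution with the observed statistic marked
visualize(null_dist) +
  shade_p_value(obs_stat, direction = "two-sided")

# Two-sided p-value, to be compared with alpha = 0.05
p_value = get_p_value(null_dist, obs_stat, direction = "two-sided")
p_value
```

A two-sided direction matches the question as worded ("do stents affect stroke outcome"), which does not commit to a direction in advance.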
Here we revisit the data from the first three seasons of NC Courage games (2017-2019).
courage = read_csv("https://sta101.github.io/static/labs/data/courage.csv")
glimpse(courage)
## Rows: 78
## Columns: 10
## $ game_id <chr> "washington-spirit-vs-north-carolina-courage-2017-04-15", …
## $ game_date <chr> "4/15/2017", "4/22/2017", "4/29/2017", "5/7/2017", "5/14/2…
## $ game_number <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ home_team <chr> "WAS", "NC", "NC", "BOS", "ORL", "NC", "NC", "CHI", "NC", …
## $ away_team <chr> "NC", "POR", "ORL", "NC", "NC", "CHI", "NJ", "NC", "KC", "…
## $ opponent <chr> "WAS", "POR", "ORL", "BOS", "ORL", "CHI", "NJ", "CHI", "KC…
## $ home_pts <dbl> 0, 1, 3, 0, 3, 1, 2, 3, 2, 3, 0, 0, 2, 1, 1, 0, 1, 2, 2, 2…
## $ away_pts <dbl> 1, 0, 1, 1, 1, 3, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 2, 0, 3, 1…
## $ result <chr> "win", "win", "win", "win", "loss", "loss", "win", "loss",…
## $ season <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
Do National Women’s Soccer League (NWSL) teams have a home-field advantage? We’ll answer this question in a few separate ways.
Hypothesis testing framework: does NC Courage score a significantly different number of points (on average) away than at home?
Create two new columns:
- `location` that tells you whether the Courage are “home” or “away”
- `pts` that always reports the Courage points scored in a game.

Save the result as `courage2`.
# code here
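A minimal sketch of this step uses `mutate()` with `if_else()`. To keep the example self-contained, it rebuilds only the first five games shown in the `glimpse()` output above in a small data frame called `courage_demo` (a name introduced here for illustration). The same `mutate()` call should apply unchanged to the full `courage` data.

```r
library(tidyverse)

# First five games, copied from the glimpse() output above
courage_demo = tibble(
  home_team = c("WAS", "NC", "NC", "BOS", "ORL"),
  away_team = c("NC", "POR", "ORL", "NC", "NC"),
  home_pts  = c(0, 1, 3, 0, 3),
  away_pts  = c(1, 0, 1, 1, 1)
)

courage2_demo = courage_demo %>%
  mutate(
    # the Courage are at home whenever they are listed as the home team
    location = if_else(home_team == "NC", "home", "away"),
    # take the Courage's own score from the matching column
    pts = if_else(location == "home", home_pts, away_pts)
  )

courage2_demo
```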
To answer the question “does NC Courage score a significantly different number of points (on average) away than at home?”:
Write the null and alternative hypothesis. Report the observed statistic.
Simulate under the null and visualize the null distribution.
Compute and report the p-value, compare it to \(\alpha = 0.05\), and make a conclusion with appropriate context.
# code here
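The steps above could be sketched with a permutation test on the difference in means, testing \(H_0: \mu_{home} - \mu_{away} = 0\) against \(H_A: \mu_{home} - \mu_{away} \neq 0\). This is one possible approach, not a definitive solution. The `location` and `pts` columns are built as described earlier, and the seed and number of repetitions are assumptions borrowed from later in this exercise.

```r
library(tidyverse)
library(tidymodels)

courage = read_csv("https://sta101.github.io/static/labs/data/courage.csv")

# Build the home/away indicator and the Courage's own points
courage2 = courage %>%
  mutate(
    location = if_else(home_team == "NC", "home", "away"),
    pts = if_else(location == "home", home_pts, away_pts)
  )

# Observed statistic: mean points at home minus mean points away
obs_stat = courage2 %>%
  specify(pts ~ location) %>%
  calculate(stat = "diff in means", order = c("home", "away"))

# Null distribution: permute location labels across games
set.seed(3)
null_dist = courage2 %>%
  specify(pts ~ location) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("home", "away"))

visualize(null_dist) +
  shade_p_value(obs_stat, direction = "two-sided")

p_value = get_p_value(null_dist, obs_stat, direction = "two-sided")
p_value
```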
Construct a confidence interval using `set.seed(3)` and `reps = 5000`. Interpret your interval in context.
# code here
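A bootstrap percentile interval for the difference in mean points (home minus away) might look like the sketch below. The seed and `reps` come from the instructions above. The 95% confidence level and the percentile method are assumptions made here, since the text does not specify them.

```r
library(tidyverse)
library(tidymodels)

courage = read_csv("https://sta101.github.io/static/labs/data/courage.csv")

courage2 = courage %>%
  mutate(
    location = if_else(home_team == "NC", "home", "away"),
    pts = if_else(location == "home", home_pts, away_pts)
  )

# Bootstrap distribution of the difference in means (home minus away)
set.seed(3)
boot_dist = courage2 %>%
  specify(pts ~ location) %>%
  generate(reps = 5000, type = "bootstrap") %>%
  calculate(stat = "diff in means", order = c("home", "away"))

# Percentile interval; level = 0.95 is an assumed confidence level
ci = get_ci(boot_dist, level = 0.95, type = "percentile")
ci
```

If the interval contains 0, the data are consistent with no difference between home and away scoring at the chosen confidence level.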
Is there a better way we could investigate whether or not the Courage have a home-field advantage? Why?
Truth | Reject the null | Fail to reject the null |
---|---|---|
\(H_0\) is true | Type 1 error | ✔️ |
\(H_A\) is true | ✔️ | Type 2 error |
The significance level, \(\alpha\), is the probability of a type 1 error. In some contexts, a type 1 error may be referred to as a “false positive” and a type 2 error as a “false negative”.
Intuitively, by considering extremes, one can see a trade-off exists
between type 1 and type 2 error.
If \(\alpha = 0\), then the p-value
stands no chance of being smaller than \(\alpha\) and we always fail to reject the
null. This makes type 1 errors impossible.
Similarly, if \(\alpha = 1\), then all
p-values will be smaller than \(\alpha\) and type 2 errors will become
impossible, because we will always reject the null.
\(\beta\) is used to denote the probability of a type 2 error.
The power of a test is \(1 - \beta\), which is the probability that your test rejects the null hypothesis when the null hypothesis is false.
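The trade-off between the two error types can be seen in a small simulation. The sketch below uses base R only and an entirely hypothetical setup: two-sample t-tests with 30 observations per group, a true difference of 0.5 standard deviations under \(H_A\), and 2000 simulated data sets. None of these numbers come from the exercise.

```r
set.seed(101)  # arbitrary seed
n_sims = 2000  # number of simulated studies
n = 30         # observations per group

# p-values when H0 is true: both groups share the same mean
p_null = replicate(n_sims, t.test(rnorm(n), rnorm(n))$p.value)

# p-values when HA is true: hypothetical true difference of 0.5
p_alt = replicate(n_sims, t.test(rnorm(n), rnorm(n, mean = 0.5))$p.value)

alphas = c(0, 0.01, 0.05, 0.10, 1)
error_rates = data.frame(
  alpha = alphas,
  # type 1 error: rejecting (p < alpha) although H0 is true
  type1 = sapply(alphas, function(a) mean(p_null < a)),
  # type 2 error: failing to reject (p >= alpha) although HA is true
  type2 = sapply(alphas, function(a) mean(p_alt >= a))
)
# power is the complement of the type 2 error rate
error_rates$power = 1 - error_rates$type2
error_rates
```

Reading down the table, the estimated type 1 error rate rises with \(\alpha\) while the type 2 error rate falls, with the two extremes \(\alpha = 0\) and \(\alpha = 1\) matching the argument above.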
The data for this example comes from “Confounding and Simpson’s paradox” [1] by Julious and Mullee.
The data examines 901 individuals with diabetes and includes the following variables:
- `insulin_dep`: whether the patient has insulin dependent or non-insulin dependent diabetes
- `age`: whether or not the individual is less than 40 years old
- `survival`: whether or not the individual survived the length of the study

diabetes = read_csv("https://sta101.github.io/static/appex/data/diabetes.csv")
Flex Aisher thinks people with insulin dependent diabetes actually survive longer than those without insulin dependence. Flex wants to formally test his hypothesis.
Let \(p_{d}\) be the probability that an individual with insulin dependent diabetes survives the study, and \(p_{i}\) be the probability that an individual with non-insulin dependent diabetes survives.
\[ H_0: p_{d} - p_{i} = 0\\ H_A: p_{d} - p_{i} > 0 \]
At first glance the data seem to back up his claim…
Compute the probability of survival and death for diabetic individuals with and without insulin dependence.
# code here
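One way to get these proportions is to count each combination of `insulin_dep` and `survival`, then divide by the group totals. This is a minimal sketch that relies only on the column names listed above and makes no assumption about how the values are coded.

```r
library(tidyverse)

diabetes = read_csv("https://sta101.github.io/static/appex/data/diabetes.csv")

# Proportion surviving and dying within each insulin dependence group
survival_by_insulin = diabetes %>%
  count(insulin_dep, survival) %>%
  group_by(insulin_dep) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

survival_by_insulin
```

Within each `insulin_dep` group the `prop` column sums to 1, so the rows can be read as conditional probabilities of each outcome given insulin dependence status.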
Is Flex’s claim significant at the \(\alpha = 0.05\) level? Perform a hypothesis test and report your results.
# code here
Is the aggregate data misleading? Use the code chunk below to investigate further.
# code here
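Since the source article is about confounding, a natural follow-up is to repeat the same calculation separately within each `age` group. The sketch below is one way to do that; it again relies only on the column names listed above.

```r
library(tidyverse)

diabetes = read_csv("https://sta101.github.io/static/appex/data/diabetes.csv")

# Survival proportions by insulin dependence, computed within each age group
survival_by_age = diabetes %>%
  count(age, insulin_dep, survival) %>%
  group_by(age, insulin_dep) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

survival_by_age

# How the insulin dependence groups are distributed across age groups
age_mix = diabetes %>%
  count(insulin_dep, age) %>%
  group_by(insulin_dep) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

age_mix
```

Comparing the within-age proportions to the aggregate ones, together with the age mix of each group, shows whether the overall pattern holds once age is taken into account.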
[1] Julious, S A, and M A Mullee. “Confounding and Simpson’s paradox.” BMJ (Clinical research ed.) vol. 309,6967 (1994): 1480-1. doi:10.1136/bmj.309.6967.1480