STA 101 Final Project

This final project has three deliverable items: a proposal, a written report, and a presentation.

In addition to these, part of your grade will come from peer-reviewing project report drafts during lab in the final week of class and from commenting on the presentations by August 5.


About the project

Find a data set, develop a question you can answer with the data, and answer it.

Proposal 5pts

  1. Find 2 or 3 data sets of interest. Each data set must have a mix of categorical and numeric variables and contain at least 500 observations and 10 variables, unless you have prior approval from Prof. Fisher.

  2. Identify the source of the data, when and how it was originally collected (by the curator, not necessarily how you found the data) and a brief description of the observations.

  3. Identify a research question you can answer with each data set, and which variables will help you answer it.

  4. At the end of your document, provide a glimpse() of each data set.
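Before submitting, you might check each candidate data set against these requirements in R. The sketch below assumes the dplyr and ggplot2 packages are installed. `meets_requirements()` is a hypothetical helper (not part of any package) that counts factor and character columns as categorical, and ggplot2's `diamonds` data set is only a stand-in for your own data, not a suggestion for your project.

```r
library(dplyr)    # provides glimpse()
library(ggplot2)  # provides the stand-in `diamonds` data set

# Hypothetical helper: TRUE if `df` has at least 500 rows, at least 10 columns,
# and at least one numeric and one categorical (factor or character) column
meets_requirements <- function(df) {
  n_numeric <- sum(vapply(df, is.numeric, logical(1)))
  n_categorical <- sum(vapply(df, function(x) is.factor(x) || is.character(x),
                              logical(1)))
  nrow(df) >= 500 && ncol(df) >= 10 && n_numeric > 0 && n_categorical > 0
}

meets_requirements(diamonds)  # replace `diamonds` with your own data
glimpse(diamonds)             # the glimpse() to paste at the end of the proposal
```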

Your proposal should be no longer than 1 page (not including the glimpses). After you submit your proposal, I will offer feedback and help you decide which data set to choose for the final project. For this reason, please rank your proposal data sets with your favorite first.

Where to find data?

Some example resources you might use to find a data set are below. You may not use a data set used in this class or another class.

Written report 50pts

Your report must be written using R Markdown. Your written report should not exceed ten pages inclusive of all tables and figures. Use the code below to download a template file for the project.

# Download the project template into your working directory as finalProject.rmd
download.file("https://sta101.github.io/static/projects/final_project_template.Rmd",
              destfile = "finalProject.rmd")

To begin, add a YAML header to the top of the file specifying a project title, a team name (optional), and the names of each group member. You can use the YAML below as a template.

---
title: "Final project"
author: "The Last Rbenders: Aang, Katara, Sokka, Momo"
---

All team members must contribute to the report. Before you finalize your report, make sure code is hidden in the knitted output by including the following setup chunk at the top of your Rmd file:

```{r setup, include=FALSE}
# echo = FALSE hides the code of every chunk in the knitted report (output still shows)
knitr::opts_chunk$set(echo = FALSE)
```

Next, load any relevant libraries and the data.

The written report is worth 50 points, broken down as follows:

Introduction 7pts

The introduction provides motivation and context for your research.

To begin, introduce the data set in a few short sentences. Next, create a code book (aka a “data dictionary”) of the variables in the data set.
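One way to start a code book is to generate the variable names and types in R and then write the descriptions by hand. This sketch assumes the ggplot2 and knitr packages; `make_codebook()` is a hypothetical helper, and `diamonds` again stands in for your own data.

```r
library(ggplot2)  # stand-in data only
library(knitr)    # kable() renders a data frame as a table in the knitted report

# Hypothetical helper: one row per variable with its name and R class;
# the description column starts empty and is filled in by hand
make_codebook <- function(df) {
  data.frame(
    variable = names(df),
    type = unname(vapply(df, function(x) class(x)[1], character(1))),
    description = "",
    stringsAsFactors = FALSE
  )
}

codebook <- make_codebook(diamonds)
codebook$description[codebook$variable == "price"] <- "Price in US dollars"
kable(codebook)
```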

Complete the introduction by providing a concise, clear statement of your research question and hypotheses. Be sure to explain why the research question is interesting or useful.

Example research question and hypotheses:

Can we predict body mass with bill depth? We hypothesize that penguins with deeper bills will also have more mass.

Methodology 15pts

Here you should introduce any statistical methods you use and describe why you chose them to answer your question. You might also include preliminary summary statistics or figures you used to explore the data.
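For the example question from the introduction, a methodology section might start from summary statistics and a simple linear regression. This sketch assumes the palmerpenguins package (which supplies the `penguins` data with `body_mass_g` and `bill_depth_mm` columns) and dplyr; it is one illustration, not a prescribed method for your project.

```r
library(palmerpenguins)  # assumed source of the `penguins` example data
library(dplyr)

# Preliminary summary statistics for the two variables in the example question
penguin_summary <- penguins %>%
  summarize(
    n = sum(!is.na(body_mass_g) & !is.na(bill_depth_mm)),
    mean_mass_g = mean(body_mass_g, na.rm = TRUE),
    mean_depth_mm = mean(bill_depth_mm, na.rm = TRUE),
    r = cor(body_mass_g, bill_depth_mm, use = "complete.obs")
  )
penguin_summary

# Simple linear regression of body mass on bill depth (rows with NAs are dropped)
mass_fit <- lm(body_mass_g ~ bill_depth_mm, data = penguins)
summary(mass_fit)
```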

Results 15pts

Place figure(s) here to illustrate the main results from your analysis. One beautiful figure is worth more than several poorly formatted figures. You must have at least 1 figure.
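A labeled figure for the same example question could look like the ggplot2 sketch below, again assuming the palmerpenguins package. The title and axis labels with units are what let a reader understand the figure without seeing any code.

```r
library(palmerpenguins)  # assumed source of the `penguins` example data
library(ggplot2)

mass_plot <- ggplot(penguins, aes(x = bill_depth_mm, y = body_mass_g)) +
  geom_point(alpha = 0.6, na.rm = TRUE) +                          # one point per penguin
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE) +      # fitted line with CI band
  labs(
    title = "Body mass vs. bill depth of Palmer penguins",
    x = "Bill depth (mm)",
    y = "Body mass (g)"
  ) +
  theme_minimal()
mass_plot
```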

Provide only the main results from your analysis. The goal is not to do an exhaustive data analysis (calculate every possible statistic and create every possible model for all variables). Rather, you should demonstrate that you are proficient at asking meaningful questions and answering them using data, that you are skilled in writing about and interpreting results, and that you can accomplish these tasks using R. More is not better.

Discussion 8pts

This section is a conclusion and discussion. You should:

  1. Summarize your main finding in a sentence or two.

  2. Discuss your finding and why it is useful (put in the context of your motivation from the introduction).

  3. Critique your own analyses and include a brief paragraph on what you would do differently if you were able to start the project over.

  4. List a brief (1 or 2 sentence) summary of the relative contributions of each team member. E.g. “Aang built the models, Katara implemented them in R, and Sokka wrote the introduction and discussion.”

Formatting 5pts

Your written report should be professionally formatted. This means complete sentences, labeled graphs and figures, hidden code, and typical style guidelines. The only sections your report may contain are Introduction, Methodology, Results and Discussion. You should include a citation of your data set, formatted in any style of your choosing (e.g., MLA or APA). If you include multiple citations, format them consistently.
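If your data set comes from an R package, base R's `citation()` function returns the citation the package authors recommend, which you can then adapt to your chosen style. This sketch assumes the palmerpenguins package is installed; for data obtained elsewhere, follow the source's own citation guidance.

```r
# Recommended citation for the package that supplies the example data
penguin_citation <- citation("palmerpenguins")
print(penguin_citation, style = "text")
```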

Peer review 2.5pts

During lab in the final week, you will peer-review draft reports. Details will be announced in lab.

Presentation 40pts

For your presentation, you and your team must also create presentation slides that summarize and showcase your project. Introduce your research question and data set, showcase visualizations, and provide some conclusions. These slides should serve as a brief visual accompaniment to your write-up and will be graded for content and quality.

The slide deck should have no more than 6 content slides + 1 title slide. Here is a suggested outline as you think through the slides; you do not have to use this exact format for the slide deck.

For the presentation, you can speak over your slide deck, similar to the lecture content videos. I recommend using Zoom to record your presentation; however, you can use whatever platform works best for your group. Below are a few resources to help you record video presentations:

You can post the link directly in the Sakai discussion forum or, alternatively, post the presentation video in Warpwire, which is accessible from the course Sakai site (bottom of the left-hand toolbar).

To upload your video to Warpwire:

To post the video to the discussion forum:

Presentation comments 2.5pts

Each student will be assigned 2 presentations to watch.

Watch the group’s video, then click “Reply” to post a question for the group. You may not post a question that has already been asked in the discussion thread. Additionally, the question should (i) be substantive (i.e., it shouldn’t be “Why did you use a bar plot instead of a pie chart?”), (ii) demonstrate your understanding of the course content, and (iii) be relevant to that group’s specific presentation, i.e., show that you’ve watched it.

Questions must be posted by Friday, August 5.

This portion of the project will be assessed individually.


Submitting to Gradescope