## Elementary Statistics

Part 1: Gathering and Exploring Data

#### Sections

• 1. Statistics: The Art and Science of Learning from Data (pdf)
• 2. Exploring Data with Graphical and Numerical Summaries (pdf)
• 3. Association: Contingency, Correlation, and Regression (pdf)
• 4. Gathering Data (pdf)

• Review of Part 1. Gathering and Exploring Data (pdf)

#### Exploratory Data Analysis

• Minard's representation of Napoleon's invasion of Russia, 1812-1813 (pdf)
• Florence Nightingale's rose diagrams, Crimean War, 1854-1856 (pdf)
• hurricanes (pdf r)
• using R as a calculator (pdf r)
• uniform distribution on [0, 1] (pdf r pdf)
• standard normal distribution (pdf r pdf)
• plotting quantitative data (pdf r)
• graphics (r)
• numerical summaries (r)
• flipping a fair coin (pdf r)
• questionnaire ... student data (pdf rmd html)
• questionnaire ... why? (rmd html)

#### Simulation

• exit polls (pdf), 1.22 exit polls (r), 1.23 exit polls (r)
• runs in basketball (pdf)
• collecting action figures (pdf)
• scaling grades (pdf)

#### Guidelines for Submitting Homework

• Use a word processor to prepare your homework.
• Include all required analyses and graphics.
• Include ALL R code that you used to generate any statistics or graphics in your report, including any R code that was supplied to help you do your homework. This is called reproducible research, and it is becoming a very important element of modern science. The idea is that other investigators can readily reproduce your results.
• Give explicit credit to any outside resources that you used to prepare your work. This is called intellectual honesty.
• Many students find that working in small groups is a big help in boosting their energy levels, and you are encouraged to do so, but at the final moment when your fingers hit the keyboard it becomes an individual effort. All submitted homework must strictly be your own personal work, apart from properly acknowledged additional sources.
• Homework is ALWAYS due on Friday, and just what is due on a particular Friday can be calculated from the class schedule by looking for the material covered in the three previous boxes.
• You are ALWAYS free to take advantage of an automatic extension to the immediately following Monday, to accommodate official team trips to other cities, unexpected illness, or the occasional sleeping for two days instead of one.
• Homework submitted after the relevant automatic extension date will not contribute to your grade.
• Method of submission: You can always submit printed homework, just like the old-timers used to do, but let's see if we can't save a few trees by borrowing a few electrons instead. Make a pdf file of your completed homework, and save it with a name like Stat204 Jane Doe hw chap 2.pdf. Now email that file to me at cparrish@sewanee.edu with a subject line like Stat204 Jane Doe hw chap 2. My email client will catch it, notice the symbol Stat204, and sort it into the homework folder for this class.
• There are two main tracks for making progress towards our goal of becoming statistically sophisticated: reading the text and doing the homework exercises. Much of the learning happens as you sort through your newly formed statistical ideas, seeking to construct an analysis appropriate to the exercise at hand. The value is not confined to obtaining a correct final result. Here is an interesting statistical hypothesis that we might be able to examine as we move further into this course: The quality of homework submissions is an accurate predictor of exam grades. Hmmm. Maybe we can test that hypothesis.
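
The file name and the subject line described under "Method of submission" share one pattern, and the automatic extension is the Monday three days after the Friday due date. Here is a minimal sketch in R of both ideas. The helper names `hw_label` and `extension_date` are hypothetical (they are not part of any course materials), and the due date used below is an arbitrary Friday chosen only for illustration.

```r
# Hypothetical helpers: the names and the example date are illustrative only.

# Build the label used for both the pdf file name and the email subject line.
hw_label <- function(course, student, chapter) {
  paste(course, student, "hw chap", chapter)
}

# Given a Friday due date, return the following Monday (the automatic extension).
extension_date <- function(due) {
  due <- as.Date(due)
  # format(x, "%u") gives the weekday number: 1 = Monday, ..., 5 = Friday
  stopifnot(format(due, "%u") == "5")
  due + 3
}

label <- hw_label("Stat204", "Jane Doe", 2)
file_name <- paste0(label, ".pdf")

print(label)       # "Stat204 Jane Doe hw chap 2"
print(file_name)   # same label with ".pdf" appended

# An arbitrary Friday, not an actual course date.
print(extension_date("2024-09-06"))
```

Using the same label for the file and the subject line keeps the course symbol Stat204 at the front of both, which is what the email sorting relies on.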

#### RStudio Demo

Place the following files in a folder named "RStudio demo (EU)" located on your desktop. Launch RStudio, and set R's working directory to the folder "RStudio demo (EU)". Now take a tour of RStudio using these materials from a recent exercise, 2.61 EU.
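
The working directory can also be set from the R console instead of the RStudio menus. Here is a minimal sketch, assuming the folder sits at `~/Desktop/RStudio demo (EU)`. That path is an assumption, and the real location of your desktop depends on your operating system and account.

```r
# Assumed location of the demo folder; adjust to match your own desktop path.
demo_dir <- file.path("~", "Desktop", "RStudio demo (EU)")

if (dir.exists(demo_dir)) {
  setwd(demo_dir)       # make the demo folder R's working directory
  print(getwd())        # confirm the change
  print(list.files())   # the demo files should be listed here
} else {
  message("Folder not found: ", demo_dir)
}
```

Checking `dir.exists()` first avoids the error that `setwd()` raises when the folder is missing or misnamed.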

#### Data (for selected hw exercises)

• AFS data sets (zip)
• Chapter 1 (r)
• Chapter 2 (r)
• Chapter 3 (r)
• Chapter 4 (r)
• Part 1 (r)

#### Homework

• Chapter 1 -- 1.1 (aspirin), 1.12 (age), 1.19 (Ann Landers), 1.22 (exit poll), 1.23 (unusual?), 1.26 (presidential popularity), 1.27 (Brown vs. Whitman)
• Chapter 2 -- 2.4 (categorical or quantitative?), 2.6 (discrete or continuous?), 2.14 (sugar), 2.17 (fertility), 2.27 (Central Park temperatures), 2.28 (whooping cough), 2.40 (net worth), 2.60 (skew), 2.61 (EU), 2.68 (taxes), 2.76 (energy), 2.78 (air pollution)
• Chapter 3 -- 3.3 (happy), 3.5 (alcohol), 3.9 (gender), 3.17 (r), 3.21 (bikes), 3.22 (sandwiches), 3.39 (study time), 3.41 (bikes), 3.42 (bikes), 3.57 (education), 3.60 (confounder?)
• Chapter 4 -- 4.2 (blood pressure), 4.6 (Nurses' Health Study and Women's Health Initiative), 4.20 (polls), 4.31 (bias), 4.37 (smoking), 4.42 (vitamin C), 4.43 (blood pressure), 4.53 (caffeine), 4.54 (allergy), 4.55 (smoking)

#### Review Exercises

Consolidate your understanding of the concepts in this part by working through a number of these exercises.

Review exercises and problems for part 1
• Chapter 2 -- 2.105 (golf), 2.109 (s), 2.110 (heights), 2.114 (tax), 2.115 (cereal), 2.118 (teacher's salaries), 2.119 (health insurance), 2.121 (graduation), 2.122 (SAT), 2.123 (blood pressure), 2.124 (sodium)
• Chapter 3 -- 3.69 (OECD), 3.70 (dust), 3.72 (crime), 3.73 (height), 3.77 (cars), 3.78 (Internet), 3.81 (women), 3.89 (graduation)
• Chapter 4 -- 4.62 (NCAA), 4.68 (poem, Rudyard Kipling's If), 4.70 (Physicians' Health Study), 4.71 (aspirin), 4.72 (exercise), 4.73 (smoking), 4.75 (bupropion)
• Review Part 1 -- R1.2 (housework), R1.7 (newspapers), R1.8 (gender), R1.9 (workforce), R1.12 (holiday time), R1.15 (water), R1.16 (energy), R1.20 (iris), R1.22 (education), R1.23 (poverty), R1.24 (TV), R1.26 (ginger), R1.29 (taxes, bias)

cparrish@sewanee.edu