one mean

Students found that the average height of a group of 120 randomly chosen female Sewanee students between the ages of 20 and 29 was \(\bar{x} = 63.8\) inches, with a sample standard deviation of \(s = 2.06\) inches. Does that differ significantly from the mean height of all American women in the same age group?

references:
- Anthropometric Reference Data for Children and Adults: United States, 2011–2014, CDC

library(tidyverse)

observed sample statistic

x.bar.observed <- 63.8
s.observed <- 2.06
alpha <- 0.05

simulation

The CDC reports that the average height of American women between the ages of 20 and 29 is 64.1 inches, with a standard deviation of 2.25 inches.

Design an experiment:

generate 120 representative heights of American women in this age range … report the average height of the sample

n <- 120               # sample size
mu <- 64.1             # mean height (in.)
sigma <- 2.25          # standard deviation (in.)
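# simulate one sample of n heights and return its mean
height120 <- function(){
  samp <- rnorm(n, mean = mu, sd = sigma)   # n draws from a normal(mu, sigma) population
  x.bar <- mean(samp)                       # sample mean of this one sample
  return(x.bar)
}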

repeat 10 times

replicate(10, height120())
##  [1] 63.78992 64.46217 64.37326 63.98532 64.30370 63.81543 64.09789
##  [8] 63.99719 63.87410 63.90026

simulated sampling distribution of \(\bar{x}\)

Assume that the population standard deviation of the heights of these women is known to be \(\sigma\).

Verify requirements:

We assume that the heights of American women in this age range are (at least approximately) normally distributed, so the sampling distribution of \(\bar{x}\) is also normal. Even without that assumption, the sample size \(n = 120\) is greater than 30, so the Central Limit Theorem makes the sampling distribution of \(\bar{x}\) approximately normal.

Expected result:

Therefore, we would expect the sampling distribution of \(\bar{x}\) to be (1) approximately normal, with (2) mean \[\mu_{\bar{x}} = \mu = 64.1\] and (3) standard error \[{SE}_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.25}{\sqrt{120}} \approx 0.2054\]
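As a quick check of the expected result, the standard error and a rough two-standard-error band around \(\mu\) can be computed directly. This is only a sketch: the names `se` and `band` are introduced here for illustration and are not part of the original analysis.

```r
n <- 120               # sample size
mu <- 64.1             # population mean height (in.)
sigma <- 2.25          # population standard deviation (in.)

se <- sigma / sqrt(n)          # standard error of the sample mean
band <- mu + c(-2, 2) * se     # roughly 95% of sample means should land in this interval

round(se, 4)     # 0.2054
round(band, 2)   # 63.69 64.51
```

All ten sample means printed earlier fall inside this band, which is consistent with the expected spread.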

n.trials <- 1e4
df1 <- data.frame(x.bar = replicate(n.trials, height120()))
str(df1)
## 'data.frame':    10000 obs. of  1 variable:
##  $ x.bar: num  64.4 63.9 64.1 64.1 64.2 ...
sampling.distribution1(df1)
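The helper `sampling.distribution1()` is not defined in this section. As a hedged stand-in, and not a reconstruction of that helper, the sketch below compares the simulated means with the expected normal curve using ggplot2. The seed, the bin count, and the names `sim.mean`, `sim.se`, and `p` are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)            # assumption: any seed works; fixed only for reproducibility
n <- 120               # sample size
mu <- 64.1             # population mean height (in.)
sigma <- 2.25          # population standard deviation (in.)
n.trials <- 1e4

# simulate n.trials sample means, each from a sample of n heights
df1 <- data.frame(x.bar = replicate(n.trials, mean(rnorm(n, mean = mu, sd = sigma))))

# numerical check: simulated center and spread vs. the expected values
sim.mean <- mean(df1$x.bar)
sim.se <- sd(df1$x.bar)
c(sim.mean = sim.mean, expected.mean = mu,
  sim.se = sim.se, expected.se = sigma / sqrt(n))

# visual check: density histogram with the expected normal curve overlaid
p <- ggplot(df1, aes(x = x.bar)) +
  geom_histogram(aes(y = after_stat(density)), bins = 40) +
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma / sqrt(n)))
print(p)
```

If the reasoning above holds, the simulated mean should sit close to 64.1, the simulated standard deviation close to 0.2054, and the histogram should track the overlaid curve.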