# organ donor

reference:
- Tintle, et al., ISI, example P.1, p.2
- stacked bar plots
- reshape

## data

Describe the data. What are the observational units in this study? What are the variables? Which variables are categorical? Which variables are quantitative?

SOLUTION:

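```r
# stringsAsFactors = TRUE reproduces the factor columns shown below;
# since R 4.0.0, read.delim() returns character columns by default
df <- read.delim("OrganDonor.txt", stringsAsFactors = TRUE)
str(df)
```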
```
## 'data.frame':    161 obs. of  2 variables:
##  $ Default: Factor w/ 3 levels "neutral","opt-in",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Choice : Factor w/ 2 levels "donor","not": 1 1 1 1 1 1 1 1 1 1 ...
```

```r
table(df)
```

```
##          Choice
## Default   donor not
##   neutral    44  12
##   opt-in     23  32
##   opt-out    41   9
```
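The counts become easier to compare across groups once they are converted to row proportions. The following is a minimal base R sketch, assuming the raw data can be rebuilt from the counts in the table above instead of being read from `OrganDonor.txt`; the names `organ` and `tab` are illustrative and do not appear in the original analysis.

```r
# Rebuild one row per participant from the counts in the table above
# (a stand-in for reading OrganDonor.txt; `organ` and `tab` are illustrative names).
organ <- data.frame(
  Default = rep(c("neutral", "opt-in", "opt-out"), times = c(56, 55, 50)),
  Choice  = c(rep(c("donor", "not"), times = c(44, 12)),
              rep(c("donor", "not"), times = c(23, 32)),
              rep(c("donor", "not"), times = c(41, 9)))
)

tab <- table(organ)          # two-way table of counts, Default by Choice
prop.table(tab, margin = 1)  # proportions of Choice within each Default (rows sum to 1)
```

With `margin = 1` the proportions are conditional on the default option, which is the comparison the study is about.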

### data in wide format

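```r
library(dplyr)  # provides %>%, group_by(), summarize(), n()

# one row per Default level: group size, then count and proportion of each Choice
df.wide <- df %>%
  group_by(Default) %>%
  summarize(n = n(),
            n.donor = sum(Choice == "donor"),
            p.donor = n.donor / n,
            n.not = sum(Choice == "not"),
            p.not = n.not / n)
df.wide
```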
```
## # A tibble: 3 x 6
##   Default     n n.donor   p.donor n.not     p.not
##    <fctr> <int>   <int>     <dbl> <int>     <dbl>
## 1 neutral    56      44 0.7857143    12 0.2142857
## 2  opt-in    55      23 0.4181818    32 0.5818182
## 3 opt-out    50      41 0.8200000     9 0.1800000
```

### data in long format

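```r
default <- rep(c("neutral", "opt-in", "opt-out"), 2)
donor <- c(rep("donor", 3), rep("not", 3))
# list "not" first so it is the first factor level (and the first fill colour)
donor <- factor(donor, levels = c("not", "donor"))
p <- c(df.wide$p.donor, df.wide$p.not)
# stringsAsFactors = TRUE keeps `default` a factor on R >= 4.0, matching str() below
df.long <- data.frame(default, donor, p, stringsAsFactors = TRUE)
df.long
```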
```
##   default donor         p
## 1 neutral donor 0.7857143
## 2  opt-in donor 0.4181818
## 3 opt-out donor 0.8200000
## 4 neutral   not 0.2142857
## 5  opt-in   not 0.5818182
## 6 opt-out   not 0.1800000
```

```r
str(df.long)
```

```
## 'data.frame':    6 obs. of  3 variables:
##  $ default: Factor w/ 3 levels "neutral","opt-in",..: 1 2 3 1 2 3
##  $ donor  : Factor w/ 2 levels "not","donor": 2 2 2 1 1 1
##  $ p      : num  0.786 0.418 0.82 0.214 0.582 ...
```
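The long table can also be derived directly from a two-way table instead of being typed in by hand. This is a sketch in base R, assuming the counts from the table earlier in this document; `counts`, `props` and `long` are illustrative names, and the resulting columns keep the table's dimension names (`Default`, `Choice`) and not the lowercase names used above.

```r
# Counts copied from the two-way table earlier in this document.
counts <- matrix(c(44, 12, 23, 32, 41, 9), nrow = 3, byrow = TRUE,
                 dimnames = list(Default = c("neutral", "opt-in", "opt-out"),
                                 Choice  = c("donor", "not")))

# Row proportions, still stored as a two-way table.
props <- prop.table(as.table(counts), margin = 1)

# as.data.frame() on a table returns one row per cell, i.e. the long format;
# responseName sets the name of the value column.
long <- as.data.frame(props, responseName = "p")
long
```

The rows come out in the same order as `df.long`: all `donor` cells first, then all `not` cells.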

### convert from long to wide format

Reshaping functions can convert a table between the long and wide layouts.

Use `dcast` (from the reshape2 package) to spread the levels of `donor` into columns, with one row per `default`.

```r
library(reshape2)  # provides dcast()
dcast(df.long, default ~ donor, value.var = "p")
```
```
##   default       not     donor
## 1 neutral 0.2142857 0.7857143
## 2  opt-in 0.5818182 0.4181818
## 3 opt-out 0.1800000 0.8200000
```
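The same long-to-wide conversion can be sketched without an extra package using base R's `reshape()`. This is an alternative to the approach above and not the method the original uses; the data frame `long.tbl` is an illustrative copy of the long table, with proportions written as fractions of the original counts.

```r
# Illustrative copy of the long-format table (proportions as count fractions).
long.tbl <- data.frame(
  default = rep(c("neutral", "opt-in", "opt-out"), 2),
  donor   = rep(c("donor", "not"), each = 3),
  p       = c(44/56, 23/55, 41/50, 12/56, 32/55, 9/50)
)

# idvar identifies the rows of the wide table; timevar supplies the new columns.
wide <- reshape(long.tbl, idvar = "default", timevar = "donor",
                direction = "wide")
wide
```

`reshape()` names the new columns by joining the value column and the level, giving `p.donor` and `p.not`.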

## barplots

Which type of barplot is more appropriate for this data: stacked barplot or comparative barplots? Why?

SOLUTION:

### stacked barplot

```r
library(ggplot2)

palette1 <- c("wheat", "saddlebrown")
# bar heights are the precomputed proportions in p, hence stat = "identity"
ggplot(df.long, aes(x = default, y = p, fill = donor)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = palette1) +
  labs(title = "Organ Donor")
```
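
### comparative barplots

For comparison with the stacked version, the bars can be placed side by side by changing only the position adjustment. This is a sketch, assuming the same proportions as the long-format table; `plot.data` and `plt` are illustrative names, and the y-axis label is an addition.

```r
library(ggplot2)

# Proportions from the long-format table, written as fractions of the counts.
plot.data <- data.frame(
  default = rep(c("neutral", "opt-in", "opt-out"), 2),
  donor   = factor(rep(c("donor", "not"), each = 3), levels = c("not", "donor")),
  p       = c(44/56, 23/55, 41/50, 12/56, 32/55, 9/50)
)

# geom_col() uses the y values as bar heights; position = "dodge" puts the
# two donor levels next to each other instead of stacking them.
plt <- ggplot(plot.data, aes(x = default, y = p, fill = donor)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("wheat", "saddlebrown")) +
  labs(title = "Organ Donor", y = "proportion")
plt
```

Because the two proportions in each group sum to 1, the stacked and the side-by-side plots carry the same information; they differ only in which comparison is easiest to read.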