---
title: "clothing"
author: "Chris Parrish"
date: "January 9, 2016"
output: pdf_document
---

clothing

references:

- Cannon, et al., Stat2, chapter 03, example 3.17

Import the data.

```{r}
data <- read.csv("Clothing.csv", header=TRUE)
head(data)
dim(data)
```

Scatterplot matrix of the response and the candidate predictors.

```{r}
pairs(~ Amount + Recency + Freq12 + Dollar12 + Freq24 + Dollar24, data=data, col="darkred")
```

Clean the data by removing four records (rows 1, 2, 3, and 60), then redraw the scatterplot matrix.

```{r}
data <- data[-c(1:3, 60), ]  # remove 4 records
pairs(~ Amount + Recency + Freq12 + Dollar12 + Freq24 + Dollar24, data=data, col="darkred")
dim(data)
```

Matrix of correlation coefficients.

```{r}
with(data, round(cor(cbind(Amount, Recency, Freq12, Dollar12, Freq24, Dollar24)), 3))
```

Simple linear regression.

```{r}
clothing.lm1 <- lm(Amount ~ Dollar12, data=data)
options(show.signif.stars=FALSE)  # omit significance stars in model summaries
summary(clothing.lm1)
plot(Amount ~ Dollar12, data=data, pch=20, col="darkred")
abline(clothing.lm1, col="orange")
```

Multiple linear regression.

```{r}
clothing.lm2 <- lm(Amount ~ Dollar12 + Dollar24 + Recency, data=data)
summary(clothing.lm2)
```

Full model, using all six predictors.

```{r}
clothing.lm3 <- lm(Amount ~ Freq12 + Dollar12 + Freq24 + Dollar24 + Recency + Card, data=data)
summary(clothing.lm3)
```

Another model, balancing simplicity and explanatory power.

```{r}
clothing.lm4 <- lm(Amount ~ Dollar12 + Freq12, data=data)
summary(clothing.lm4)
```

Create a new regressor, `AvgSpent12 = Dollar12 / Freq12`, the average dollar amount per purchase over the previous 12 months. Rows with `Freq12` equal to 0 are removed first to avoid dividing by zero.

```{r}
data <- data[data$Freq12 != 0, ]  # drop customers with no purchases in 12 months
dim(data)
data$AvgSpent12 <- with(data, Dollar12 / Freq12)
head(data)
clothing.lm5 <- lm(Amount ~ AvgSpent12, data=data)
summary(clothing.lm5)
plot(Amount ~ AvgSpent12, data=data, pch=20, col="darkred")
abline(clothing.lm5, col="orange")
```

Residuals versus fitted values for the model with the new regressor.

```{r}
plot(predict(clothing.lm5), resid(clothing.lm5), pch=20, col="darkred")
abline(h=0, col="orange", lty="dashed")
```

Add a quadratic term.
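In an R model formula, `^` is interaction syntax rather than arithmetic. A bare `x^2` therefore collapses to `x`, and wrapping it in `I()` makes R compute the square. The minimal sketch below checks this on simulated data. The data frame `sim` and the models `fit_no_I`, `fit_I`, and `fit_manual` are hypothetical names and are not part of the clothing analysis. Without `I()` the model has only two coefficients. `I(x^2)` gives the same estimates as a squared column built by hand.

```{r}
# Hypothetical simulated data (not Clothing.csv): y curves upward in x.
set.seed(1)
x <- runif(100, min = 0, max = 400)
y <- 14 + 0.5 * x + 0.002 * x^2 + rnorm(100, sd = 20)
sim <- data.frame(x = x, y = y)

# In a formula, x^2 means "x crossed with itself", which reduces to x,
# so the squared term silently disappears.
fit_no_I <- lm(y ~ x + x^2, data = sim)
# I() forces arithmetic evaluation, giving a genuine quadratic column.
fit_I <- lm(y ~ x + I(x^2), data = sim)

length(coef(fit_no_I))  # 2: intercept and x
length(coef(fit_I))     # 3: intercept, x, and I(x^2)

# Building the squared column by hand gives the same estimates.
sim$x2 <- sim$x^2
fit_manual <- lm(y ~ x + x2, data = sim)
all.equal(unname(coef(fit_I)), unname(coef(fit_manual)))
```

The chunk that follows applies the same `I()` construction to the clothing data with `I(AvgSpent12^2)`.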
```{r}
clothing.lm6 <- lm(Amount ~ AvgSpent12 + I(AvgSpent12^2), data=data)
summary(clothing.lm6)
plot(predict(clothing.lm6), resid(clothing.lm6), pch=20, col="darkred")
abline(h=0, col="orange", lty="dashed")
qqnorm(resid(clothing.lm6), col="turquoise")
qqline(resid(clothing.lm6), col="orange")
```

Final model.

$\widehat{Amount} =$ `r round(coef(clothing.lm6)[1], 3)` + `r round(coef(clothing.lm6)[2], 3)` $AvgSpent12$ + `r round(coef(clothing.lm6)[3], 3)` $AvgSpent12^2$

```{r}
plot(Amount ~ AvgSpent12, data=data, pch=20, col="darkred")
# fitted quadratic, taking the coefficients directly from clothing.lm6
# so the curve always matches the model above
amount <- function(avgSpent12){
  cf <- unname(coef(clothing.lm6))
  a <- cf[1]  # intercept
  b <- cf[2]  # linear coefficient
  c <- cf[3]  # quadratic coefficient
  amt <- a + b * avgSpent12 + c * avgSpent12^2
  return(amt)
}
curve(amount, from=0, to=400, col="orange", add=TRUE)
```

Effect plot.

```{r message=FALSE}
library(alr4)  # also attaches car and effects; effect() comes from effects
plot(effect("AvgSpent12", clothing.lm6))
```
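Instead of writing the fitted quadratic out as an R function, you can have `predict()` evaluate a fitted `lm` on a grid of new values. With `interval = "confidence"`, it also returns a pointwise 95% confidence band. For a model whose only predictor is `AvgSpent12`, this is roughly what the effect plot above displays. This is a sketch on simulated data standing in for Clothing.csv. The names `sim2`, `fit`, `newgrid`, `band`, and `by_hand` are hypothetical.

```{r}
# Hypothetical simulated data standing in for the cleaned clothing data.
set.seed(2)
avg <- runif(80, min = 0, max = 400)
amt <- 14 + 0.5 * avg + 0.002 * avg^2 + rnorm(80, sd = 20)
sim2 <- data.frame(AvgSpent12 = avg, Amount = amt)
fit <- lm(Amount ~ AvgSpent12 + I(AvgSpent12^2), data = sim2)

# predict() builds I(AvgSpent12^2) from the grid column automatically.
newgrid <- data.frame(AvgSpent12 = seq(0, 400, length.out = 101))
band <- predict(fit, newdata = newgrid, interval = "confidence", level = 0.95)
head(band)  # matrix with columns fit, lwr, upr

# The "fit" column agrees with the quadratic written out from coef().
b <- unname(coef(fit))
by_hand <- b[1] + b[2] * newgrid$AvgSpent12 + b[3] * newgrid$AvgSpent12^2
all.equal(unname(band[, "fit"]), by_hand)

# Data, fitted curve, and pointwise 95% confidence band.
plot(Amount ~ AvgSpent12, data = sim2, pch = 20, col = "darkred")
lines(newgrid$AvgSpent12, band[, "fit"], col = "orange")
lines(newgrid$AvgSpent12, band[, "lwr"], col = "orange", lty = "dashed")
lines(newgrid$AvgSpent12, band[, "upr"], col = "orange", lty = "dashed")
```

Because the curve comes from the model object rather than retyped numbers, it stays in sync if the model is refit.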