---
title: "diamonds"
author: "Chris Parrish"
date: "January 8, 2016"
output: pdf_document
---

diamonds

references:

- Cannon et al., *Stat2*, Chapter 3, Examples 3.11 and 3.15

Import the data.

```{r}
# Read the Stat2 diamonds data (Diamonds.csv must be in the working directory)
data <- read.csv("Diamonds.csv", header=TRUE)
head(data)
dim(data)
```

Scatterplot matrix of TotalPrice, Carat, and $Carat^2$.

```{r}
pairs(~ TotalPrice + Carat + I(Carat^2), data=data, col="darkred")
```

TotalPrice vs. Carat.

```{r}
plot(TotalPrice ~ Carat, data=data, pch=20, col="darkred")
```

Quadratic model: linear in its coefficients, quadratic in Carat.

```{r}
# I() makes ^ mean arithmetic squaring rather than formula interaction
diamonds.lm <- lm(TotalPrice ~ Carat + I(Carat^2), data=data)
```

$\widehat{TotalPrice} =$ `r round(coef(diamonds.lm)[1], 3)` $+$ `r round(coef(diamonds.lm)[2], 3)` $\cdot Carat +$ `r round(coef(diamonds.lm)[3], 3)` $\cdot Carat^2$

```{r}
options(show.signif.stars=FALSE)
summary(diamonds.lm)
```

Illustration: the fitted quadratic curve drawn over the data.

```{r}
plot(TotalPrice ~ Carat, data=data, pch=20, col="darkred")
diamondPrice <- function(carat){
  # Coefficients of diamonds.lm (about -522.7, 2386.0, 4498.2)
  cf <- unname(coef(diamonds.lm))
  a <- cf[1]
  b <- cf[2]
  c <- cf[3]
  price <- a + b * carat + c * carat^2
  return(price)
}
curve(diamondPrice, from=0.3, to=3.3, col="orange", add=TRUE)
```

Residuals.

```{r}
# Histogram and normal Q-Q plot of the residuals (normality check)
hist(resid(diamonds.lm), col="wheat")
qqnorm(resid(diamonds.lm), col="orchid")
qqline(resid(diamonds.lm), col="orange")
# Residuals vs. fitted values (checks linearity and constant variance)
plot(predict(diamonds.lm), resid(diamonds.lm), pch=20, col="darkred",
     xlab="Fitted values", ylab="Residuals")
abline(h=0, col="orange")
```

VIF = variance inflation factor.

$VIF_i = 1/(1 - R_i^2)$, where $R_i^2$ is the $R^2$ from regressing the $i$th predictor on all the other predictors. Hence $VIF_i > 5$ implies $R_i^2 > 0.80$, so the $i$th predictor is largely explained by the other predictors.

```{r}
library(car)   # provides vif()
# Add Depth as a third predictor and check for multicollinearity
diamonds.lm2 <- lm(TotalPrice ~ Carat + I(Carat^2) + Depth, data=data)
summary(diamonds.lm2)
vif(diamonds.lm2)
```
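As a cross-check on `vif()`, the definition $VIF_i = 1/(1 - R_i^2)$ can be computed directly in base R by running one auxiliary regression per predictor. The helper `vif_by_hand` is a hypothetical name introduced here for illustration; it is not part of `car`. This is a sketch that assumes every term in the model is a single numeric column (no factors), which is the case where the classical VIF and `car`'s generalized VIF should coincide.

```{r}
# Hypothetical helper (not from car): classical VIF from its definition,
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on all the other predictors. Assumes single-column (non-factor) terms.
vif_by_hand <- function(model) {
  X <- model.matrix(model)
  # Drop the intercept; each auxiliary lm() below fits its own intercept.
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  if (ncol(X) < 2) stop("need at least two predictors")
  sapply(colnames(X), function(j) {
    others <- X[, colnames(X) != j, drop = FALSE]
    r2 <- summary(lm(X[, j] ~ others))$r.squared
    1 / (1 - r2)
  })
}
```

Applied to the model above, `vif_by_hand(diamonds.lm2)` should reproduce the output of `vif(diamonds.lm2)`, with one value each for `Carat`, `I(Carat^2)`, and `Depth`.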