Rcode: Two-way ANOVA


Analyze the bread data

  • sales: number of cases of bread sold.
  • height: height of the shelf display in 3 levels: bottom, middle, top.
  • width: width of the shelf display in levels: regular, wide.
sales height width
 47 1 1
 43 1 1
 46 1 2
 40 1 2
 62 2 1
 68 2 1
 67 2 2
 71 2 2
 41 3 1
 39 3 1
 42 3 2
 46 3 2

Questions

  • Does the height of the display affect sales? If yes, compare top with middle, top with bottom, and middle with bottom.
  • Does the width of the display affect sales?
  • Does the effect of height on sales depend on the width? Does the effect of width on sales depend on the height? “Yes” means that there is an interaction. If yes to the last two, that is an interaction.
bread = read.table("bread.dat", header=TRUE)
bread
bread$height = as.factor(bread$height)
bread$width = as.factor(bread$width)
height.labels = c("bottom", "middle", "top")
width.labels = c("regular", "wide")
attach(bread)
table(height, width)

Interaction plots: either use the basic “interaction.plot” command or ggplot

par(mfrow=c(1,2))
interaction.plot(height, width, sales)
interaction.plot(width, height, sales)

library(dplyr)
library(ggplot2)
bread %>% group_by(height, width) %>% summarise(sales=mean(sales))
tmpdata = bread %>% group_by(height, width) %>% summarise(sales=mean(sales))

tmpdata$height=factor(tmpdata$height, labels=height.labels)
tmpdata$width = factor(tmpdata$width,labels=width.labels)

ggplot(tmpdata, 
     aes(x=height, y=sales, colour=width, group=width)) +
     geom_line(aes(linetype=width), size=1) + 
     geom_point(aes(shape=width), size=2)

ggplot(tmpdata, 
     aes(x=width, y=sales, colour=height, group=height)) +
     geom_line(aes(linetype=height), size=1) + 
     geom_point(aes(shape=height), size=2)

Both the F-tests and the interaction plots indicate that only “height” is significant.

anova(lm(sales ~ height*width, bread))

Refit the data using a one-way ANOVA model. We use the “-1” option to remove the intercept, so the three LS coefficients correspond to the group means of the three groups. For regression model without an intercept, R computes the TSS in a different way, so DO NOT read the overall F-test and R-square from “myfit”.

myfit = lm(sales ~ height-1, bread)
summary(myfit)

Suppose we want to test the following two orthogonal contrasts:

  • difference between height =1 and 3
  • difference between height = 2 and the average of height = 1, 3
library(multcomp)   ## adjust for multiple comparison
K = rbind(c(1, 0, -1), c(1/2, -1, 1/2))
mytest = glht(myfit, linfct=K)
summary(mytest)

confint(mytest)

## Half CI width with Bonferroni correction
mysd = sqrt(diag(vcov(mytest)))
qt(1-0.025/2, 9)*mysd 
 
## Half CI width with Sheffe's correction
sqrt(2*qf(0.95, 2, 9))*mysd  

## Tukey intervals 
mytest = glht(myfit, linfct=mcp(height="Tukey"))
confint(mytest)


## Check the half-width for Tukey intervals
qtukey(0.95,3,9)*summary(myfit)$sigma*sqrt(1/4)
myint = confint(mytest)$confint
myint[,1]-myint[,2]

## You can specify the contrasts directly. 
K = rbind(c(1,-1,0), c(1, 0, -1), c(0, 1, -1))
mytest = glht(myfit, linfct=K)
summary(mytest)

confint(mytest)

Simulated data, where row and column effects are confounded.

  • Example 1: the true model involves just the row effect. Both main effects are significant individually, and the additive model is not significant.
    n0=10; m=1
    
    X1 = c(rep('A', n0), rep('B', n0)) # drug A and B
    X2 = c(rep('M', n0-m), rep('F', m), rep('M', m), rep('F', n0-m))
    
    a=c(1,0);
    y = rep(a, each=n0) + rnorm(2*n0)
    
    mydata = data.frame(y=y, 
                        drug=as.factor(X1), 
                        gender=as.factor(X2))
    
    anova(lm(y ~ X1*X2, mydata))
    anova(lm(y ~ X2*X1, mydata))
    anova(lm(y~1, mydata), lm(y~X1+X2, mydata))
    
    ## Repeat this experiment with m=5, which is a balanced case
    
  • Example 2: the true model is an additive, but the additive model is not significant although both main effects are significant individually.
    m=1; n=20;
    
    X1 = c(rep('A', m+m), rep('B', n-m-m)) # drug A and B
    X2 = c(rep('M', m), rep('F', m), rep('M', m), rep('F', n-3*m))
    
    a = c(1,0); b=c(0,1); 
    y = c(rep(a[1]+b[1],m), 
         rep(a[1]+b[2], m), 
         rep(a[2]+b[1], m), 
        rep(a[2]+b[2], n-3*m)) + rnorm(n)
    
    mydata = data.frame(y=y, 
                        drug=as.factor(X1), 
                        gender=as.factor(X2))
    
    anova(lm(y ~ X1*X2, mydata))
    anova(lm(y ~ X2*X1, mydata))
    anova(lm(y~1, mydata), lm(y~X1+X2, mydata))