---
title: "Issues in the interpretation of 2-way ANOVA model outputs"
output: slidy_presentation
---

## The coke data:

```r
library("ISwR")
data(coking)
attach(coking)
boxplot(time ~ temp * width, data = coking, col = (c("gold", "darkgreen")), 
    main = "Time to process coke", xlab = "Temperature . Oven Size", ylab = "Time")
```

![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1.png) 


## Experimental design

```r
table(coking[, c("width", "temp")])
```

```
##      temp
## width 1600 1900
##    4     3    3
##    8     3    3
##    12    3    3
```


Notation:

- $n_{ij}$ number of observation for level $i$ of Temperature and level $i$ of Temperature

- $n_{i.}$ number of observation for level $i$ of Temperature

- $n_{.j}$ number of observation for  level $j$ of Width

- $n$ total nulber of observations

The experimental design is full, balanced with replications. 

## Orthogonality

We have also:

$$ n_{ij} = \frac{n_{i.}n_{.j}}{n} $$

This property is known as orthogonality of the design.

## Testing effects of factors in an orthogonal experimental design

We compare below the output of various model fitting

##  time~temp

```r
summary(lm(time ~ temp))
```

```
## 
## Call:
## lm(formula = time ~ temp)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.311 -2.731  0.117  2.194  3.989 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.011      0.958    7.32  1.7e-06 ***
## temp1900      -1.956      1.355   -1.44     0.17    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 2.87 on 16 degrees of freedom
## Multiple R-squared: 0.115,	Adjusted R-squared: 0.0599 
## F-statistic: 2.08 on 1 and 16 DF,  p-value: 0.168
```

```r
anova(lm(time ~ temp))
```

```
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value Pr(>F)
## temp       1   17.2   17.21    2.08   0.17
## Residuals 16  132.2    8.26
```


## Boxplot of time~temp ignoring the effect of width

```r
boxplot(time ~ temp, data = coking, col = "red", main = "Time to process coke", 
    xlab = "Temperature", ylab = "Time")
```

![plot of chunk unnamed-chunk-4](figure/unnamed-chunk-4.png) 

  Disregardnig the potential effect of width makes us blind to the effect of `temp`.      
        

##  time~width

```r
summary(lm(time ~ width))
```

```
## 
## Call:
## lm(formula = time ~ width)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.967 -0.983  0.167  0.800  1.933 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.683      0.540    4.97  0.00017 ***
## width8         3.667      0.764    4.80  0.00023 ***
## width12        6.383      0.764    8.36    5e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.32 on 15 degrees of freedom
## Multiple R-squared: 0.824,	Adjusted R-squared: 0.801 
## F-statistic: 35.2 on 2 and 15 DF,  p-value: 2.16e-06
```

```r
anova(lm(time ~ width))
```

```
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## width      2  123.1    61.6    35.2 2.2e-06 ***
## Residuals 15   26.2     1.7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```


## Boxplot of time~width ignoring the effect of temp

```r
boxplot(time ~ width, data = coking, col = "red", main = "Time to process coke", 
    xlab = "Width", ylab = "Time")
```

![plot of chunk unnamed-chunk-6](figure/unnamed-chunk-6.png) 


## Take-home message \# 1:

It can be dangerous to draw conclusions from separate analyses of the various factors.

##  time~temp+width

```r
summary(lm(time ~ temp + width))
```

```
## 
## Call:
## lm(formula = time ~ temp + width)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.989 -0.618 -0.167  0.585  1.428 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.661      0.379    9.67  1.4e-07 ***
## temp1900      -1.956      0.379   -5.17  0.00014 ***
## width8         3.667      0.464    7.91  1.6e-06 ***
## width12        6.383      0.464   13.77  1.6e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.803 on 14 degrees of freedom
## Multiple R-squared: 0.94,	Adjusted R-squared: 0.927 
## F-statistic: 72.6 on 3 and 14 DF,  p-value: 9e-09
```


##  time~temp+width

```r
anova(lm(time ~ temp + width))
```

```
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## temp       1   17.2    17.2    26.7 0.00014 ***
## width      2  123.1    61.6    95.5 6.9e-09 ***
## Residuals 14    9.0     0.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The estimated effect and p-values relative to temperature are not the same under the models `time~temp` and 
`time~temp+width`

##

```r
anova(lm(time ~ temp))
```

```
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value Pr(>F)
## temp       1   17.2   17.21    2.08   0.17
## Residuals 16  132.2    8.26
```


## Order of factors in the formula

Perhaps not suprisingly `lm(time~temp+width)` and  `lm(time~width+temp)` return striclty the same numerical output:

```r
anova(lm(time ~ temp + width))
```

```
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## temp       1   17.2    17.2    26.7 0.00014 ***
## width      2  123.1    61.6    95.5 6.9e-09 ***
## Residuals 14    9.0     0.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r

anova(lm(time ~ width + temp))
```

```
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## width      2  123.1    61.6    95.5 6.9e-09 ***
## temp       1   17.2    17.2    26.7 0.00014 ***
## Residuals 14    9.0     0.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```



## Analysis in un-balanced design:
Let us create an artifially un-balnced deisgn by dropping the first observation:

```r
coking.unb = coking[-1, ]
attach(coking.unb)
```

```
## The following object(s) are masked from 'coking':
## 
##     temp, time, width
```

```r
table(coking.unb[, c("width", "temp")])
```

```
##      temp
## width 1600 1900
##    4     2    3
##    8     3    3
##    12    3    3
```

Doing so, we have lost the orthogonality property. 

## Order of factors in the formula for unbalanced design

Perhaps not suprisingly `lm(time~temp+width)` and  `lm(time~width+temp)` return striclty the same numerical output:

```r
anova(lm(time ~ temp + width, data = coking.unb))
```

```
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## temp       1   24.3    24.3    35.1 5.0e-05 ***
## width      2  109.3    54.7    79.0 5.3e-08 ***
## Residuals 13    9.0     0.7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r

anova(lm(time ~ width + temp, data = coking.unb))
```

```
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## width      2  117.1    58.6    84.7 3.5e-08 ***
## temp       1   16.4    16.4    23.8   3e-04 ***
## Residuals 13    9.0     0.7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```


## Take-home message \# 2: 
The lack of orthogonlality complicates the interpretation of analyses of variance.
