The coke data:

library("ISwR")
data(coking)
attach(coking)
boxplot(time ~ temp * width, data = coking, col = (c("gold", "darkgreen")), 
    main = "Time to process coke", xlab = "Temperature . Oven Size", ylab = "Time")

plot of chunk unnamed-chunk-1

Experimental design

table(coking[, c("width", "temp")])
##      temp
## width 1600 1900
##    4     3    3
##    8     3    3
##    12    3    3

Notation:

The experimental design is full, balanced with replications.

Orthogonality

We have also:

\[ n_{ij} = \frac{n_{i.}n_{.j}}{n} \]

This property is known as orthogonality of the design.

Testing effects of factors in an orthogonal experimental design

We compare below the output of various model fitting

time~temp

summary(lm(time ~ temp))
## 
## Call:
## lm(formula = time ~ temp)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.311 -2.731  0.117  2.194  3.989 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.011      0.958    7.32  1.7e-06 ***
## temp1900      -1.956      1.355   -1.44     0.17    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 2.87 on 16 degrees of freedom
## Multiple R-squared: 0.115,   Adjusted R-squared: 0.0599 
## F-statistic: 2.08 on 1 and 16 DF,  p-value: 0.168
anova(lm(time ~ temp))
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value Pr(>F)
## temp       1   17.2   17.21    2.08   0.17
## Residuals 16  132.2    8.26

Boxplot of time~temp ignoring the effect of width

boxplot(time ~ temp, data = coking, col = "red", main = "Time to process coke", 
    xlab = "Temperature", ylab = "Time")

plot of chunk unnamed-chunk-4

Disregardnig the potential effect of width makes us blind to the effect of temp.

time~width

summary(lm(time ~ width))
## 
## Call:
## lm(formula = time ~ width)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.967 -0.983  0.167  0.800  1.933 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.683      0.540    4.97  0.00017 ***
## width8         3.667      0.764    4.80  0.00023 ***
## width12        6.383      0.764    8.36    5e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.32 on 15 degrees of freedom
## Multiple R-squared: 0.824,   Adjusted R-squared: 0.801 
## F-statistic: 35.2 on 2 and 15 DF,  p-value: 2.16e-06
anova(lm(time ~ width))
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## width      2  123.1    61.6    35.2 2.2e-06 ***
## Residuals 15   26.2     1.7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Boxplot of time~width ignoring the effect of temp

boxplot(time ~ width, data = coking, col = "red", main = "Time to process coke", 
    xlab = "Width", ylab = "Time")

plot of chunk unnamed-chunk-6

Take-home message # 1:

It can be dangerous to draw conclusions from separate analyses of the various factors.

time~temp+width

summary(lm(time ~ temp + width))
## 
## Call:
## lm(formula = time ~ temp + width)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.989 -0.618 -0.167  0.585  1.428 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.661      0.379    9.67  1.4e-07 ***
## temp1900      -1.956      0.379   -5.17  0.00014 ***
## width8         3.667      0.464    7.91  1.6e-06 ***
## width12        6.383      0.464   13.77  1.6e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.803 on 14 degrees of freedom
## Multiple R-squared: 0.94,    Adjusted R-squared: 0.927 
## F-statistic: 72.6 on 3 and 14 DF,  p-value: 9e-09

time~temp+width

anova(lm(time ~ temp + width))
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## temp       1   17.2    17.2    26.7 0.00014 ***
## width      2  123.1    61.6    95.5 6.9e-09 ***
## Residuals 14    9.0     0.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimated effect and p-values relative to temperature are not the same under the models time~temp and time~temp+width

##

anova(lm(time ~ temp))
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value Pr(>F)
## temp       1   17.2   17.21    2.08   0.17
## Residuals 16  132.2    8.26

Order of factors in the formula

Perhaps not suprisingly lm(time~temp+width) and lm(time~width+temp) return striclty the same numerical output:

anova(lm(time ~ temp + width))
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## temp       1   17.2    17.2    26.7 0.00014 ***
## width      2  123.1    61.6    95.5 6.9e-09 ***
## Residuals 14    9.0     0.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lm(time ~ width + temp))
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## width      2  123.1    61.6    95.5 6.9e-09 ***
## temp       1   17.2    17.2    26.7 0.00014 ***
## Residuals 14    9.0     0.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis in un-balanced design:

Let us create an artifially un-balnced deisgn by dropping the first observation:

coking.unb = coking[-1, ]
attach(coking.unb)
## The following object(s) are masked from 'coking':
## 
##     temp, time, width
table(coking.unb[, c("width", "temp")])
##      temp
## width 1600 1900
##    4     2    3
##    8     3    3
##    12    3    3

Doing so, we have lost the orthogonality property.

Order of factors in the formula for unbalanced design

Perhaps not suprisingly lm(time~temp+width) and lm(time~width+temp) return striclty the same numerical output:

anova(lm(time ~ temp + width, data = coking.unb))
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## temp       1   24.3    24.3    35.1 5.0e-05 ***
## width      2  109.3    54.7    79.0 5.3e-08 ***
## Residuals 13    9.0     0.7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lm(time ~ width + temp, data = coking.unb))
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## width      2  117.1    58.6    84.7 3.5e-08 ***
## temp       1   16.4    16.4    23.8   3e-04 ***
## Residuals 13    9.0     0.7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Take-home message # 2:

The lack of orthogonlality complicates the interpretation of analyses of variance.