library("ISwR")
data(coking)
attach(coking)
boxplot(time ~ temp * width, data = coking, col = (c("gold", "darkgreen")),
main = "Time to process coke", xlab = "Temperature . Oven Size", ylab = "Time")
table(coking[, c("width", "temp")])
## temp
## width 1600 1900
## 4 3 3
## 8 3 3
## 12 3 3
Notation:
\( n_{ij} \) number of observation for level \( i \) of Temperature and level \( i \) of Temperature
\( n_{i.} \) number of observation for level \( i \) of Temperature
\( n_{.j} \) number of observation for level \( j \) of Width
\( n \) total nulber of observations
The experimental design is full, balanced with replications.
We have also:
\[ n_{ij} = \frac{n_{i.}n_{.j}}{n} \]
This property is known as orthogonality of the design.
We compare below the output of various model fitting
summary(lm(time ~ temp))
##
## Call:
## lm(formula = time ~ temp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.311 -2.731 0.117 2.194 3.989
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.011 0.958 7.32 1.7e-06 ***
## temp1900 -1.956 1.355 -1.44 0.17
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.87 on 16 degrees of freedom
## Multiple R-squared: 0.115, Adjusted R-squared: 0.0599
## F-statistic: 2.08 on 1 and 16 DF, p-value: 0.168
anova(lm(time ~ temp))
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## temp 1 17.2 17.21 2.08 0.17
## Residuals 16 132.2 8.26
boxplot(time ~ temp, data = coking, col = "red", main = "Time to process coke",
xlab = "Temperature", ylab = "Time")
Disregardnig the potential effect of width makes us blind to the effect of temp.
summary(lm(time ~ width))
##
## Call:
## lm(formula = time ~ width)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.967 -0.983 0.167 0.800 1.933
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.683 0.540 4.97 0.00017 ***
## width8 3.667 0.764 4.80 0.00023 ***
## width12 6.383 0.764 8.36 5e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.32 on 15 degrees of freedom
## Multiple R-squared: 0.824, Adjusted R-squared: 0.801
## F-statistic: 35.2 on 2 and 15 DF, p-value: 2.16e-06
anova(lm(time ~ width))
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## width 2 123.1 61.6 35.2 2.2e-06 ***
## Residuals 15 26.2 1.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
boxplot(time ~ width, data = coking, col = "red", main = "Time to process coke",
xlab = "Width", ylab = "Time")
It can be dangerous to draw conclusions from separate analyses of the various factors.
summary(lm(time ~ temp + width))
##
## Call:
## lm(formula = time ~ temp + width)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.989 -0.618 -0.167 0.585 1.428
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.661 0.379 9.67 1.4e-07 ***
## temp1900 -1.956 0.379 -5.17 0.00014 ***
## width8 3.667 0.464 7.91 1.6e-06 ***
## width12 6.383 0.464 13.77 1.6e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.803 on 14 degrees of freedom
## Multiple R-squared: 0.94, Adjusted R-squared: 0.927
## F-statistic: 72.6 on 3 and 14 DF, p-value: 9e-09
anova(lm(time ~ temp + width))
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## temp 1 17.2 17.2 26.7 0.00014 ***
## width 2 123.1 61.6 95.5 6.9e-09 ***
## Residuals 14 9.0 0.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated effect and p-values relative to temperature are not the same under the models time~temp and
time~temp+width
##
anova(lm(time ~ temp))
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## temp 1 17.2 17.21 2.08 0.17
## Residuals 16 132.2 8.26
Perhaps not suprisingly lm(time~temp+width) and lm(time~width+temp) return striclty the same numerical output:
anova(lm(time ~ temp + width))
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## temp 1 17.2 17.2 26.7 0.00014 ***
## width 2 123.1 61.6 95.5 6.9e-09 ***
## Residuals 14 9.0 0.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(time ~ width + temp))
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## width 2 123.1 61.6 95.5 6.9e-09 ***
## temp 1 17.2 17.2 26.7 0.00014 ***
## Residuals 14 9.0 0.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let us create an artifially un-balnced deisgn by dropping the first observation:
coking.unb = coking[-1, ]
attach(coking.unb)
## The following object(s) are masked from 'coking':
##
## temp, time, width
table(coking.unb[, c("width", "temp")])
## temp
## width 1600 1900
## 4 2 3
## 8 3 3
## 12 3 3
Doing so, we have lost the orthogonality property.
Perhaps not suprisingly lm(time~temp+width) and lm(time~width+temp) return striclty the same numerical output:
anova(lm(time ~ temp + width, data = coking.unb))
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## temp 1 24.3 24.3 35.1 5.0e-05 ***
## width 2 109.3 54.7 79.0 5.3e-08 ***
## Residuals 13 9.0 0.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(time ~ width + temp, data = coking.unb))
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## width 2 117.1 58.6 84.7 3.5e-08 ***
## temp 1 16.4 16.4 23.8 3e-04 ***
## Residuals 13 9.0 0.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The lack of orthogonlality complicates the interpretation of analyses of variance.