ANOVA Assumptions - R

Shapiro-Wilk Test (on responses)

  • Assumption: Normality

  • Context of Use: t-test, ANOVA, LM, LMM

  • Reporting: “To test the assumption of conditional normality, a Shapiro-Wilk test was run on the response Y for each combination of levels of factors X1 and X2. The test was statistically non-significant for every combination except condition (b,b), which showed a statistically significant deviation from normality (W = .794, p < .01).”

# Example data
# df has two factors (X1,X2) each w/two levels (a,b) and continuous response Y
df <- read.csv("data/2F2LBs_normal.csv")
head(df, 20)
A data.frame: 20 × 4
        S    X1    X2         Y
    <int> <chr> <chr>     <dbl>
 1      1     a     a 11.867340
 2      2     a     b 15.453745
 3      3     b     a 15.782415
 4      4     b     b 13.626554
 5      5     a     a 11.608195
 6      6     a     b 19.540674
 7      7     b     a 15.878829
 8      8     b     b 16.654611
 9      9     a     a 17.370245
10     10     a     b 17.547025
11     11     b     a 19.423109
12     12     b     b 16.369147
13     13     a     a 22.896627
14     14     a     b 14.037504
15     15     b     a 15.564130
16     16     b     b 16.109281
17     17     a     a  7.028036
18     18     a     b 21.508059
19     19     b     a 17.644257
20     20     b     b 16.054999
## Shapiro-Wilk conditional normality test
# (on the response within each condition)
shapiro.test(df[df$X1 == "a" & df$X2 == "a",]$Y) # condition a,a
shapiro.test(df[df$X1 == "a" & df$X2 == "b",]$Y) # condition a,b
shapiro.test(df[df$X1 == "b" & df$X2 == "a",]$Y) # condition b,a
shapiro.test(df[df$X1 == "b" & df$X2 == "b",]$Y) # condition b,b
	Shapiro-Wilk normality test

data:  df[df$X1 == "a" & df$X2 == "a", ]$Y
W = 0.97554, p-value = 0.93
	Shapiro-Wilk normality test

data:  df[df$X1 == "a" & df$X2 == "b", ]$Y
W = 0.93417, p-value = 0.3147
	Shapiro-Wilk normality test

data:  df[df$X1 == "b" & df$X2 == "a", ]$Y
W = 0.98454, p-value = 0.9914
	Shapiro-Wilk normality test

data:  df[df$X1 == "b" & df$X2 == "b", ]$Y
W = 0.7938, p-value = 0.003064
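
The four per-condition calls above can also be written as a loop over the cells of the design. The following is a minimal sketch rather than part of the original analysis: it runs on a simulated stand-in data frame (`sim`, an assumption, since the CSV is not bundled here), so its numbers will not match the output above. The names `cells` and `sw` are likewise illustrative.

```r
# Simulated stand-in for df (an assumption; the real data come from the CSV)
set.seed(1)
sim <- expand.grid(rep = 1:15, X1 = c("a", "b"), X2 = c("a", "b"),
                   stringsAsFactors = FALSE)
sim$Y <- rnorm(nrow(sim), mean = 15, sd = 4)

# Split the response by every X1-by-X2 combination
cells <- split(sim$Y, list(sim$X1, sim$X2), sep = ",")

# Run Shapiro-Wilk within each cell and collect W and p in one table
sw <- do.call(rbind, lapply(names(cells), function(cell) {
  res <- shapiro.test(cells[[cell]])
  data.frame(condition = cell, W = unname(res$statistic), p = res$p.value)
}))
print(sw)
```

Collecting the results in one data frame makes it easier to scan for any cell with p < .05 when a design has many conditions.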

Shapiro-Wilk Test (on residuals)

  • Assumption: Normality

  • Context of Use: t-test, ANOVA, LM, LMM

  • Reporting: “To test the normality assumption, a Shapiro-Wilk test was run on the residuals of a between-subjects full-factorial ANOVA model. The test was statistically non-significant (W = .988, p = .798), indicating no detectable departure from normality. A Q-Q plot of residuals visually supports this conclusion (Figure 1).”

# Example data
# df has two factors (X1,X2) each w/two levels (a,b) and continuous response Y
df <- read.csv("data/2F2LBs_normal.csv")
head(df, 20)
m = aov(Y ~ X1*X2, data=df) # make anova model
shapiro.test(residuals(m))
	Shapiro-Wilk normality test

data:  residuals(m)
W = 0.98751, p-value = 0.7979
par(mfrow=c(1,1))
qqnorm(residuals(m)); qqline(residuals(m)) # Q-Q plot
[Figure: normal Q-Q plot of the ANOVA model residuals]
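
For a between-subjects full-factorial model, each fitted value is the mean of its X1-by-X2 cell, so the residuals are each response minus its cell mean. The sketch below illustrates this on simulated data (`sim` and the other names are assumptions for illustration, not the CSV used above), which is why a single test on the residuals pools the deviations from all conditions.

```r
# Simulated stand-in for df (an assumption; the real data come from the CSV)
set.seed(2)
sim <- expand.grid(rep = 1:15, X1 = c("a", "b"), X2 = c("a", "b"),
                   stringsAsFactors = FALSE)
sim$Y <- rnorm(nrow(sim), mean = 15, sd = 4)

m_sim <- aov(Y ~ X1 * X2, data = sim)     # full-factorial ANOVA model

cell_means <- ave(sim$Y, sim$X1, sim$X2)  # mean of Y within each X1-by-X2 cell
manual_resid <- sim$Y - cell_means        # deviations from the cell means

# Check that the model residuals match the hand-computed deviations
agree <- isTRUE(all.equal(unname(residuals(m_sim)), manual_resid))
print(agree)

shapiro.test(manual_resid)                # same test as on residuals(m_sim)
```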

Anderson-Darling Test (on responses)

  • Assumption: Normality

  • Context of Use: t-test, ANOVA, LM, LMM

  • Reporting: “To test the assumption of conditional normality, an Anderson-Darling test was run on the response Y for each combination of levels of factors X1 and X2. The test was statistically non-significant for every combination except condition (b,b), which showed a statistically significant deviation from normality (A = 1.417, p < .001).”

# Example data
# df has two factors (X1,X2) each w/two levels (a,b) and continuous response Y
df <- read.csv("data/2F2LBs_normal.csv")
head(df, 20)
library(nortest)
ad.test(df[df$X1 == "a" & df$X2 == "a",]$Y) # condition a,a
ad.test(df[df$X1 == "a" & df$X2 == "b",]$Y) # condition a,b
ad.test(df[df$X1 == "b" & df$X2 == "a",]$Y) # condition b,a
ad.test(df[df$X1 == "b" & df$X2 == "b",]$Y) # condition b,b
	Anderson-Darling normality test

data:  df[df$X1 == "a" & df$X2 == "a", ]$Y
A = 0.23266, p-value = 0.7557
	Anderson-Darling normality test

data:  df[df$X1 == "a" & df$X2 == "b", ]$Y
A = 0.35901, p-value = 0.4023
	Anderson-Darling normality test

data:  df[df$X1 == "b" & df$X2 == "a", ]$Y
A = 0.16003, p-value = 0.934
	Anderson-Darling normality test

data:  df[df$X1 == "b" & df$X2 == "b", ]$Y
A = 1.4167, p-value = 0.0007191

Anderson-Darling Test (on residuals)

  • Assumption: Normality

  • Context of Use: t-test, ANOVA, LM, LMM

  • Reporting: “To test the normality assumption, an Anderson-Darling test was run on the residuals of a between-subjects full-factorial ANOVA model. The test was statistically non-significant (A = 0.329, p = .510), indicating no detectable departure from normality. A Q-Q plot of residuals visually supports this conclusion (Figure 1).”

# Example data
# df has two factors (X1,X2) each w/two levels (a,b) and continuous response Y
df <- read.csv("data/2F2LBs_normal.csv")
head(df, 20)
library(nortest)
m = aov(Y ~ X1*X2, data=df) # make anova model
ad.test(residuals(m))
	Anderson-Darling normality test

data:  residuals(m)
A = 0.32859, p-value = 0.5102
par(mfrow=c(1,1))
qqnorm(residuals(m)); qqline(residuals(m)) # Q-Q plot
[Figure: normal Q-Q plot of the ANOVA model residuals]

Levene’s Test

  • Assumption: Homoscedasticity (Homogeneity of Variance)

  • Context of Use: t-test, ANOVA, LM, LMM

  • Reporting: “To test the homoscedasticity assumption, Levene’s test was run on a between-subjects full-factorial ANOVA model. The test was statistically significant (F(3, 56) = 3.97, p < .05), indicating a departure from homoscedasticity.”

# Example data
# df has two factors (X1,X2) each w/two levels (a,b) and continuous response Y
df <- read.csv("data/2F2LBs_normal.csv")
head(df, 20)
library(car) # for leveneTest and Anova
leveneTest(Y ~ X1*X2, data=df, center=mean)
# if a violation occurs and only a t-test is needed, use a Welch t-test
t.test(Y ~ X1, data=df, var.equal=FALSE) # Welch t-test
# if a violation occurs and an ANOVA is needed, use a White-adjusted ANOVA
# (car's documentation advises sum-to-zero contrasts, e.g., contr.sum, for type-3 tests)
m = aov(Y ~ X1*X2, data=df)
Anova(m, type=3, white.adjust=TRUE)
Loading required package: carData
A anova: 2 × 3
          Df  F value     Pr(>F)
       <int>    <dbl>      <dbl>
group      3 3.965124 0.01238959
          56       NA         NA
	Welch Two Sample t-test

data:  Y by X1
t = 0.80702, df = 43.125, p-value = 0.4241
alternative hypothesis: true difference in means between group a and group b is not equal to 0
95 percent confidence interval:
 -1.188936  2.775540
sample estimates:
mean in group a mean in group b 
       15.56348        14.77018 
Coefficient covariances computed by hccm()
A anova: 5 × 3
               Df           F       Pr(>F)
            <dbl>       <dbl>        <dbl>
(Intercept)     1 103.0349856 2.652463e-14
X1              1   0.4129862 5.230802e-01
X2              1   8.0591851 6.296776e-03
X1:X2           1   3.6441353 6.139637e-02
Residuals      56          NA           NA
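
Levene's test amounts to a one-way ANOVA on the absolute deviations of each response from the center of its cell; using the median as the center instead of the mean gives the Brown-Forsythe variant. The sketch below works through that computation in base R on simulated data. The helper `levene_by_hand` and the data frame `sim` are hypothetical names introduced for illustration, and the numbers will not match the output above.

```r
# Simulated stand-in for df with one noisier cell (an assumption for illustration)
set.seed(3)
sim <- expand.grid(rep = 1:15, X1 = c("a", "b"), X2 = c("a", "b"),
                   stringsAsFactors = FALSE)
sim$Y <- rnorm(nrow(sim), mean = 15,
               sd = ifelse(sim$X1 == "b" & sim$X2 == "b", 8, 3))

# Hypothetical helper: one-way ANOVA on absolute deviations from each cell's center
levene_by_hand <- function(y, group, center = mean) {
  group <- factor(group)
  dev <- abs(y - ave(y, group, FUN = center))  # |y - center of its own cell|
  fit <- anova(lm(dev ~ group))                # F test across cells
  c(df1 = fit$Df[1], df2 = fit$Df[2],
    F = fit$`F value`[1], p = fit$`Pr(>F)`[1])
}

cell <- interaction(sim$X1, sim$X2)            # the four X1-by-X2 conditions
levene_by_hand(sim$Y, cell, center = mean)     # mean-centered (Levene)
levene_by_hand(sim$Y, cell, center = median)   # median-centered (Brown-Forsythe)
```

With four cells and 60 observations, the degrees of freedom come out as 3 and 56, the same as in the F(3, 56) reported above.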

Brown-Forsythe Test

  • Assumption: Homoscedasticity (Homogeneity of Variance)

  • Context of Use: t-test, ANOVA, LM, LMM

  • Reporting: “To test the homoscedasticity assumption, the Brown-Forsythe test was run on a between-subjects full-factorial ANOVA model. The test was statistically significant (F(3, 56) = 3.75, p < .05), indicating a departure from homoscedasticity.”

# Example data
# df has two factors (X1,X2) each w/two levels (a,b) and continuous response Y
df <- read.csv("data/2F2LBs_normal.csv")
head(df, 20)
library(car) # for leveneTest, Anova
leveneTest(Y ~ X1*X2, data=df, center=median)
# if a violation occurs and only a t-test is needed, use a Welch t-test
t.test(Y ~ X1, data=df, var.equal=FALSE) # Welch t-test
# if a violation occurs and an ANOVA is needed, use a White-adjusted ANOVA
# (car's documentation advises sum-to-zero contrasts, e.g., contr.sum, for type-3 tests)
m = aov(Y ~ X1*X2, data=df)
Anova(m, type=3, white.adjust=TRUE)
A anova: 2 × 3
          Df  F value     Pr(>F)
       <int>    <dbl>      <dbl>
group      3 3.750252 0.01587369
          56       NA         NA
	Welch Two Sample t-test

data:  Y by X1
t = 0.80702, df = 43.125, p-value = 0.4241
alternative hypothesis: true difference in means between group a and group b is not equal to 0
95 percent confidence interval:
 -1.188936  2.775540
sample estimates:
mean in group a mean in group b 
       15.56348        14.77018 
Coefficient covariances computed by hccm()
A anova: 5 × 3
               Df           F       Pr(>F)
            <dbl>       <dbl>        <dbl>
(Intercept)     1 103.0349856 2.652463e-14
X1              1   0.4129862 5.230802e-01
X2              1   8.0591851 6.296776e-03
X1:X2           1   3.6441353 6.139637e-02
Residuals      56          NA           NA

Mauchly’s Test of Sphericity

  • Assumption: Sphericity

  • Context of Use: repeated measures ANOVA

  • Reporting: “To test the sphericity assumption for repeated measures ANOVA, Mauchly’s test of sphericity was run on a mixed factorial ANOVA model with a between-subjects factor X1 and a within-subjects factor X2. The test was statistically significant for both X2 (W = .637, p < .01) and X1×X2 (W = .637, p < .01), indicating sphericity violations. Accordingly, the Greenhouse-Geisser correction was used when reporting these ANOVA results.”

# Example data
# df has subjects (S), one between-Ss factor (X1), and one within-Ss factor (X2)
df <- read.csv("data/2F23LMs_mauchly.csv")
head(df, 20)
A data.frame: 20 × 4
        S    X1    X2       Y
    <int> <chr> <chr>   <dbl>
 1      1     a     a 20.2145
 2      1     a     b 23.7485
 3      1     a     c 20.7960
 4      2     a     a 20.8805
 5      2     a     b 23.2595
 6      2     a     c 19.1305
 7      3     a     a 21.2635
 8      3     a     b 23.4945
 9      3     a     c 20.8545
10      4     a     a 20.7080
11      4     a     b 23.9220
12      4     a     c 18.2575
13      5     a     a 21.0075
14      5     a     b 23.4700
15      5     a     c 17.7105
16      6     a     a 19.9115
17      6     a     b 24.2975
18      6     a     c 19.8550
19      7     a     a 22.1050
20      7     a     b 25.3800
library(ez) # for ezANOVA
df$S = factor(df$S) # Subject id is nominal
df$X1 = factor(df$X1) # convert explicitly so ezANOVA does not warn about converting
df$X2 = factor(df$X2)
m = ezANOVA(dv=Y, between=c(X1), within=c(X2), wid=S, type=3, data=df) # use c() for >1 factors
m$Mauchly # p<.05 indicates a sphericity violation for within-Ss effects
A data.frame: 2 × 4
  Effect         W           p p<.05
   <chr>     <dbl>       <dbl> <chr>
3     X2 0.6370236 0.008782794     *
4  X1:X2 0.6370236 0.008782794     *
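
Base R also provides `mauchly.test()`, which operates on a multivariate linear model fitted to wide-format data (one row per subject, one column per level of the within-subjects factor). The sketch below is an illustration on simulated data, not a reproduction of the ezANOVA result above: the data frames `long` and `wide`, the matrix `resp`, and the simulated effects are all assumptions made for the example.

```r
# Simulated mixed design (an assumption): 20 subjects, X1 between, X2 within (a, b, c)
set.seed(4)
n_subj <- 20
long <- expand.grid(X2 = c("a", "b", "c"), S = factor(1:n_subj))
long$X1 <- ifelse(as.integer(long$S) <= n_subj / 2, "a", "b")
subj_effect <- rnorm(n_subj, sd = 2)            # per-subject random offset
long$Y <- 20 + subj_effect[as.integer(long$S)] + rnorm(nrow(long))

# Reshape to one row per subject with columns Y.a, Y.b, Y.c
wide <- reshape(long, idvar = c("S", "X1"), timevar = "X2", direction = "wide")
resp <- as.matrix(wide[, c("Y.a", "Y.b", "Y.c")])

mlm <- lm(resp ~ X1, data = wide)               # multivariate linear model
mt <- mauchly.test(mlm, X = ~1)                 # sphericity of within-Ss contrasts
print(mt)
```

As with `m$Mauchly`, a p-value below .05 from this test would suggest a sphericity violation for the within-subjects effects.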