Two-Sample Tests of Proportion - R

Fisher’s Exact Test

  • Samples: 2

  • Response Categories: ≥2

  • Exact?: Yes, use with N≤200

  • Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. Fisher’s exact test indicated a statistically significant association between X and Y (p < .0001).”

# Example data
# df is a long-format data frame with subject (S), categorical factor (X), and outcome (Y)
df <- read.csv("data/1F2LBs_multinomial.csv")
head(df, 20)
A data.frame: 20 × 3
       S     X     Y
   <int> <chr> <chr>
 1     1     a     y
 2     2     b     x
 3     3     a     x
 4     4     b     y
 5     5     a     y
 6     6     b     x
 7     7     a     y
 8     8     b     x
 9     9     a     y
10    10     b     z
11    11     a     y
12    12     b     z
13    13     a     y
14    14     b     y
15    15     a     y
16    16     b     y
17    17     a     y
18    18     b     x
19    19     a     y
20    20     b     z
df$S = factor(df$S) # Subject id is nominal (unused)
df$X = factor(df$X) # X is a factor of m ≥ 2 levels
df$Y = factor(df$Y) # Y is an outcome of n ≥ 2 categories
xt = xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt
   Y
X    x  y  z
  a  3 26  1
  b 14  9  7
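If only the cell counts are available rather than a long-format data file, the same 2 × 3 crosstab can be typed in directly. This is a minimal sketch: the counts are copied by hand from the `xtabs()` output above, and the matrix layout (rows = levels of X, columns = categories of Y) is an assumption chosen to match that output.

```r
# Build the X-by-Y contingency table directly from the counts,
# instead of cross-tabulating a long-format data frame with xtabs()
xt <- as.table(matrix(
  c(3, 26, 1,    # row a: counts of x, y, z
    14, 9, 7),   # row b: counts of x, y, z
  nrow = 2, byrow = TRUE,
  dimnames = list(X = c("a", "b"), Y = c("x", "y", "z"))
))
xt         # same counts as xtabs(~ X + Y, data = df)
sum(xt)    # total sample size N: 60
```

The resulting object can be passed to `fisher.test()`, `chisq.test()` or `G.test()` exactly like the `xtabs()` result.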
fisher.test(xt) # Run test
	Fisher's Exact Test for Count Data

data:  xt
p-value = 2.432e-05
alternative hypothesis: two.sided
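`fisher.test()` returns an `htest` object, so the p-value for the reporting sentence can be extracted programmatically instead of copied from the printout. A sketch, with the counts entered by hand from the crosstab above; the names `ft` and `p_text` and the formatting rule are illustrative choices, not part of the original analysis.

```r
# Counts copied from the X-by-Y crosstab above
xt <- as.table(matrix(c(3, 26, 1, 14, 9, 7), nrow = 2, byrow = TRUE,
                      dimnames = list(X = c("a", "b"), Y = c("x", "y", "z"))))
ft <- fisher.test(xt)   # exact test of association between X and Y
ft$p.value              # numeric p-value (printed above as 2.432e-05)
# One possible reporting format: use a threshold for very small p-values
p_text <- if (ft$p.value < 1e-4) "p < .0001" else sprintf("p = %.4f", ft$p.value)
p_text
```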

G-Test

  • Samples: 2

  • Response Categories: ≥2

  • Exact?: No, use with N>200

  • Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A G-test indicated a statistically significant association between X and Y (G(2) = 21.40, p < .0001).”

# Example data
# df is a long-format data frame with subject (S), categorical factor (X), and outcome (Y)
df <- read.csv("data/1F2LBs_multinomial.csv")
head(df, 20)
A data.frame: 20 × 3
       S     X     Y
   <int> <chr> <chr>
 1     1     a     y
 2     2     b     x
 3     3     a     x
 4     4     b     y
 5     5     a     y
 6     6     b     x
 7     7     a     y
 8     8     b     x
 9     9     a     y
10    10     b     z
11    11     a     y
12    12     b     z
13    13     a     y
14    14     b     y
15    15     a     y
16    16     b     y
17    17     a     y
18    18     b     x
19    19     a     y
20    20     b     z
# G.test() is provided by the RVAideMemoire package (it is not part of base R)
# install.packages("RVAideMemoire")
# on Ubuntu you may have trouble installing, see: TODO
library(RVAideMemoire)

df$S = factor(df$S) # Subject id is nominal (unused)
df$X = factor(df$X) # X is a factor of m ≥ 2 levels
df$Y = factor(df$Y) # Y is an outcome of n ≥ 2 categories
xt = xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt
*** Package RVAideMemoire v 0.9-81-2 ***
   Y
X    x  y  z
  a  3 26  1
  b 14  9  7
G.test(xt)
	G-test

data:  xt
G = 21.402, df = 2, p-value = 2.252e-05
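The G statistic is the likelihood-ratio statistic G = 2 Σ O ln(O / E), where the expected counts E are row total × column total / N, compared against a χ² distribution with (rows − 1)(columns − 1) degrees of freedom. The base-R sketch below, which does not need RVAideMemoire, assumes no Williams or continuity correction is applied; for these counts the uncorrected statistic reproduces the G = 21.402 shown above. Variable names are illustrative.

```r
# Likelihood-ratio (G) test of independence computed by hand in base R
xt <- as.table(matrix(c(3, 26, 1, 14, 9, 7), nrow = 2, byrow = TRUE,
                      dimnames = list(X = c("a", "b"), Y = c("x", "y", "z"))))
N <- sum(xt)
# Expected counts under independence: row total * column total / N
E <- outer(rowSums(xt), colSums(xt)) / N
# Both vectors are flattened in the same (column-major) order
O  <- as.vector(xt)
Ev <- as.vector(E)
# Cells with O = 0 contribute 0 to the sum (limit of O * log(O / E) as O -> 0)
G   <- 2 * sum(ifelse(O > 0, O * log(O / Ev), 0))
dof <- (nrow(xt) - 1) * (ncol(xt) - 1)
p   <- pchisq(G, df = dof, lower.tail = FALSE)
c(G = G, df = dof, p = p)
```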

Two-Sample Pearson Chi-Squared Test

  • Samples: 2

  • Response Categories: ≥2

  • Exact?: No, use with N>200

  • Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A two-sample Pearson chi-squared test indicated a statistically significant association between X and Y (χ²(2, N = 60) = 19.87, p < .0001).”
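The numbers in the reporting sentence can be filled in from the `chisq.test()` result rather than retyped: `statistic`, `parameter` (the degrees of freedom) and `p.value` are components of the returned `htest` object, and N is the table total. Formatting from the unrounded statistic also avoids double rounding: the printed 19.875 is itself rounded from about 19.8748, which gives 19.87 at two decimals. A sketch, with counts entered by hand and a sentence template that is only one possible phrasing:

```r
# Counts copied from the X-by-Y crosstab
xt <- as.table(matrix(c(3, 26, 1, 14, 9, 7), nrow = 2, byrow = TRUE,
                      dimnames = list(X = c("a", "b"), Y = c("x", "y", "z"))))
# The small-expected-count warning is silenced here only to keep output short
ct <- suppressWarnings(chisq.test(xt))
p_text <- if (ct$p.value < 1e-4) "p < .0001" else sprintf("p = %.3f", ct$p.value)
# "\u03c7\u00b2" is the chi-squared symbol written with Unicode escapes
report <- sprintf("\u03c7\u00b2(%d, N = %d) = %.2f, %s",
                  as.integer(ct$parameter), as.integer(sum(xt)),
                  unname(ct$statistic), p_text)
report
```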

# Example data
# df is a long-format data frame with subject (S), categorical factor (X), and outcome (Y)
df <- read.csv("data/1F2LBs_multinomial.csv")
head(df, 20)
A data.frame: 20 × 3
       S     X     Y
   <int> <chr> <chr>
 1     1     a     y
 2     2     b     x
 3     3     a     x
 4     4     b     y
 5     5     a     y
 6     6     b     x
 7     7     a     y
 8     8     b     x
 9     9     a     y
10    10     b     z
11    11     a     y
12    12     b     z
13    13     a     y
14    14     b     y
15    15     a     y
16    16     b     y
17    17     a     y
18    18     b     x
19    19     a     y
20    20     b     z
df$S = factor(df$S) # Subject id is nominal (unused)
df$X = factor(df$X) # X is a factor of m ≥ 2 levels
df$Y = factor(df$Y) # Y is an outcome of n ≥ 2 categories
xt = xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt
   Y
X    x  y  z
  a  3 26  1
  b 14  9  7
chisq.test(xt)
Warning message in chisq.test(xt):
“Chi-squared approximation may be incorrect”
	Pearson's Chi-squared test

data:  xt
X-squared = 19.875, df = 2, p-value = 4.833e-05
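The “Chi-squared approximation may be incorrect” warning is raised by `chisq.test()` when some expected cell counts fall below 5; the expected counts are stored in the result’s `$expected` component. The sketch below inspects them and then, as one option, requests a Monte Carlo p-value via the documented `simulate.p.value` and `B` arguments. The seed is arbitrary and only makes the simulated p-value reproducible; its exact value will vary with the seed and `B`.

```r
# Counts copied from the X-by-Y crosstab above
xt <- as.table(matrix(c(3, 26, 1, 14, 9, 7), nrow = 2, byrow = TRUE,
                      dimnames = list(X = c("a", "b"), Y = c("x", "y", "z"))))
ct <- suppressWarnings(chisq.test(xt))
ct$expected            # expected counts under independence; column z is 4 in both rows
any(ct$expected < 5)   # TRUE: this is the condition that triggers the warning
# Monte Carlo p-value that does not rely on the chi-squared approximation
set.seed(1)
chisq.test(xt, simulate.p.value = TRUE, B = 10000)
```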