Two Sample Tests of Proportion - R

Fisher’s Exact Test

Samples: 2
Response Categories: ≥2
Exact?: Yes, use with N≤200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. Fisher’s exact test indicated a statistically significant association between X and Y (p < .0001).”

# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df <- read.csv("data/1F2LBs_multinomial.csv")
head(df, 20)

A data.frame: 20 × 3
	S	X	Y
	<int>	<chr>	<chr>
1	1	a	y
2	2	b	x
3	3	a	x
4	4	b	y
5	5	a	y
6	6	b	x
7	7	a	y
8	8	b	x
9	9	a	y
10	10	b	z
11	11	a	y
12	12	b	z
13	13	a	y
14	14	b	y
15	15	a	y
16	16	b	y
17	17	a	y
18	18	b	x
19	19	a	y
20	20	b	z

df$S <- factor(df$S) # Subject id is nominal (unused)
df$X <- factor(df$X) # X is a factor of m ≥ 2 levels
df$Y <- factor(df$Y) # Y is an outcome of n ≥ 2 categories
xt <- xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt

   Y
X    x  y  z
  a  3 26  1
  b 14  9  7

fisher.test(xt) # Run test

	Fisher's Exact Test for Count Data

data:  xt
p-value = 2.432e-05
alternative hypothesis: two.sided
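
The test only needs the table of counts, so it can also be run without the CSV file. The following is a minimal sketch in base R that rebuilds the crosstab from the counts printed above; the names `counts` and `res` are illustrative and not part of the original example.

```r
# Rebuild the 2 x 3 crosstab from the counts shown above (no CSV needed)
counts <- matrix(c( 3, 26, 1,
                   14,  9, 7),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(X = c("a", "b"), Y = c("x", "y", "z")))

# Row proportions: the share of each outcome within 'a' and within 'b'
round(prop.table(counts, margin = 1), 3)

# Fisher's exact test on the r x c table
res <- fisher.test(counts)
res$p.value  # should agree with the p-value in the output above
```

Looking at the row proportions alongside the p-value helps when writing up which outcomes differ between the two samples.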

G-Test

Samples: 2
Response Categories: ≥2
Exact?: No, use with N>200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A G-test indicated a statistically significant association between X and Y (G(2) = 21.40, p < .0001).”

# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df <- read.csv("data/1F2LBs_multinomial.csv")
head(df, 20)

A data.frame: 20 × 3
	S	X	Y
	<int>	<chr>	<chr>
1	1	a	y
2	2	b	x
3	3	a	x
4	4	b	y
5	5	a	y
6	6	b	x
7	7	a	y
8	8	b	x
9	9	a	y
10	10	b	z
11	11	a	y
12	12	b	z
13	13	a	y
14	14	b	y
15	15	a	y
16	16	b	y
17	17	a	y
18	18	b	x
19	19	a	y
20	20	b	z

# G.test() is provided by the RVAideMemoire package
# install.packages("RVAideMemoire")
# on Ubuntu you may have trouble installing, see: TODO
library(RVAideMemoire)

df$S <- factor(df$S) # Subject id is nominal (unused)
df$X <- factor(df$X) # X is a factor of m ≥ 2 levels
df$Y <- factor(df$Y) # Y is an outcome of n ≥ 2 categories
xt <- xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt

*** Package RVAideMemoire v 0.9-81-2 ***

   Y
X    x  y  z
  a  3 26  1
  b 14  9  7

G.test(xt)

	G-test

data:  xt
G = 21.402, df = 2, p-value = 2.252e-05
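
To see where the G statistic comes from, it can be computed by hand as G = 2 Σ O · ln(O / E), where O are the observed counts and E the counts expected under independence. The following is a rough sketch in base R under that definition, without any continuity correction; the names `obs`, `expd`, `dof`, and `pval` are illustrative and not part of the original example.

```r
# Observed counts from the crosstab above
obs <- matrix(c( 3, 26, 1,
                14,  9, 7),
              nrow = 2, byrow = TRUE,
              dimnames = list(X = c("a", "b"), Y = c("x", "y", "z")))

# Expected counts under independence: (row total * column total) / N
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Likelihood-ratio statistic; this direct formula assumes no cell is zero
G <- 2 * sum(obs * log(obs / expd))

# Degrees of freedom and upper-tail chi-squared p-value
dof  <- (nrow(obs) - 1) * (ncol(obs) - 1)
pval <- pchisq(G, df = dof, lower.tail = FALSE)

round(G, 3)  # 21.402, matching the G.test() output above
dof          # 2
pval
```

Because the p-value comes from the chi-squared distribution, the G-test is an approximation, which is why the section recommends it for larger samples.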

Two-Sample Pearson Chi-Squared Test

Samples: 2
Response Categories: ≥2
Exact?: No, use with N>200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A two-sample Pearson Chi-Squared test indicated a statistically significant association between X and Y (χ²(2, N=60) = 19.88, p < .0001).”

# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df <- read.csv("data/1F2LBs_multinomial.csv")
head(df, 20)

A data.frame: 20 × 3
	S	X	Y
	<int>	<chr>	<chr>
1	1	a	y
2	2	b	x
3	3	a	x
4	4	b	y
5	5	a	y
6	6	b	x
7	7	a	y
8	8	b	x
9	9	a	y
10	10	b	z
11	11	a	y
12	12	b	z
13	13	a	y
14	14	b	y
15	15	a	y
16	16	b	y
17	17	a	y
18	18	b	x
19	19	a	y
20	20	b	z

df$S <- factor(df$S) # Subject id is nominal (unused)
df$X <- factor(df$X) # X is a factor of m ≥ 2 levels
df$Y <- factor(df$Y) # Y is an outcome of n ≥ 2 categories
xt <- xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt

   Y
X    x  y  z
  a  3 26  1
  b 14  9  7

chisq.test(xt)

Warning message in chisq.test(xt):
“Chi-squared approximation may be incorrect”

	Pearson's Chi-squared test

data:  xt
X-squared = 19.875, df = 2, p-value = 4.833e-05
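
The warning appears because `chisq.test()` flags tables in which any expected cell count is below 5, where the chi-squared approximation is less reliable. The following is a short sketch in base R that inspects the expected counts for this table and, as one possible alternative, requests a Monte Carlo p-value; the names `obs`, `fit`, and `sim`, the seed, and the choice of `B = 10000` are illustrative assumptions.

```r
# Observed counts from the crosstab above
obs <- matrix(c( 3, 26, 1,
                14,  9, 7),
              nrow = 2, byrow = TRUE,
              dimnames = list(X = c("a", "b"), Y = c("x", "y", "z")))

# Same test as above; the warning is silenced only so we can inspect the fit
fit <- suppressWarnings(chisq.test(obs))

fit$expected           # expected counts under independence
any(fit$expected < 5)  # TRUE: the 'z' column has expected counts of 4

# One alternative: a Monte Carlo p-value instead of the asymptotic one
set.seed(1)            # arbitrary seed, only for reproducibility
sim <- chisq.test(obs, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

With N = 60, Fisher's exact test from the first section is the option this page recommends for samples of this size.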
