Two Sample Tests of Proportion - R
Fisher’s Exact Test
Samples: 2
Response Categories: ≥2
Exact?: Yes, use with N≤200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. Fisher’s exact test indicated a statistically significant association between X and Y (p < .0001).”
# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df <- read.csv("data/1F2LBs_multinomial.csv")
head(df, 20)
| | S | X | Y |
|---|---|---|---|
| | <int> | <chr> | <chr> |
| 1 | 1 | a | y |
| 2 | 2 | b | x |
| 3 | 3 | a | x |
| 4 | 4 | b | y |
| 5 | 5 | a | y |
| 6 | 6 | b | x |
| 7 | 7 | a | y |
| 8 | 8 | b | x |
| 9 | 9 | a | y |
| 10 | 10 | b | z |
| 11 | 11 | a | y |
| 12 | 12 | b | z |
| 13 | 13 | a | y |
| 14 | 14 | b | y |
| 15 | 15 | a | y |
| 16 | 16 | b | y |
| 17 | 17 | a | y |
| 18 | 18 | b | x |
| 19 | 19 | a | y |
| 20 | 20 | b | z |
df$S = factor(df$S) # Subject id is nominal (unused)
df$X = factor(df$X) # X is a factor of m ≥ 2 levels
df$Y = factor(df$Y) # Y is an outcome of n ≥ 2 categories
xt = xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt
Y
X x y z
a 3 26 1
b 14 9 7
fisher.test(xt) # Run test
Fisher's Exact Test for Count Data
data: xt
p-value = 2.432e-05
alternative hypothesis: two.sided
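To connect the test output to the reporting sentence, here is a minimal sketch that rebuilds the crosstab from the counts printed above (an assumption, since the CSV file itself is not included here) and pulls the p-value out of the result object. The names `res` and `p_report` are illustrative only.

```r
# Rebuild the 2 x 3 crosstab from the printed counts
# (assumption: these stand in for data/1F2LBs_multinomial.csv).
xt <- matrix(c( 3, 26, 1,
               14,  9, 7),
             nrow = 2, byrow = TRUE,
             dimnames = list(X = c("a", "b"), Y = c("x", "y", "z")))

res <- fisher.test(xt)  # exact test on the m x n table
res$p.value             # the p-value as a plain number

# Format the p-value in the style used by the reporting sentence
p_report <- if (res$p.value < .0001) {
  "p < .0001"
} else {
  sprintf("p = %.4f", res$p.value)
}
p_report
```

Very small p-values are conventionally reported as a bound rather than an exact figure, which is why the sketch switches format below .0001.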
G-Test
Samples: 2
Response Categories: ≥2
Exact?: No, use with N>200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A G-test indicated a statistically significant association between X and Y (G(2) = 21.40, p < .0001).”
# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df <- read.csv("data/1F2LBs_multinomial.csv")
head(df, 20)
| | S | X | Y |
|---|---|---|---|
| | <int> | <chr> | <chr> |
| 1 | 1 | a | y |
| 2 | 2 | b | x |
| 3 | 3 | a | x |
| 4 | 4 | b | y |
| 5 | 5 | a | y |
| 6 | 6 | b | x |
| 7 | 7 | a | y |
| 8 | 8 | b | x |
| 9 | 9 | a | y |
| 10 | 10 | b | z |
| 11 | 11 | a | y |
| 12 | 12 | b | z |
| 13 | 13 | a | y |
| 14 | 14 | b | y |
| 15 | 15 | a | y |
| 16 | 16 | b | y |
| 17 | 17 | a | y |
| 18 | 18 | b | x |
| 19 | 19 | a | y |
| 20 | 20 | b | z |
# G.test() is provided by the RVAideMemoire package
# install.packages("RVAideMemoire")
# on Ubuntu you may have trouble installing, see: TODO
library(RVAideMemoire)
df$S = factor(df$S) # Subject id is nominal (unused)
df$X = factor(df$X) # X is a factor of m ≥ 2 levels
df$Y = factor(df$Y) # Y is an outcome of n ≥ 2 categories
xt = xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt
*** Package RVAideMemoire v 0.9-81-2 ***
Y
X x y z
a 3 26 1
b 14 9 7
G.test(xt)
G-test
data: xt
G = 21.402, df = 2, p-value = 2.252e-05
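For readers who cannot install RVAideMemoire, the statistic can be sketched in base R from its usual definition, G = 2 Σ O · ln(O / E), with (m − 1)(n − 1) degrees of freedom. This is a hand computation under that assumption, not the package's own code, and the table is rebuilt from the counts printed above rather than read from the CSV.

```r
# Rebuild the 2 x 3 crosstab from the printed counts
# (assumption: these stand in for data/1F2LBs_multinomial.csv).
xt <- matrix(c( 3, 26, 1,
               14,  9, 7),
             nrow = 2, byrow = TRUE,
             dimnames = list(X = c("a", "b"), Y = c("x", "y", "z")))

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(xt), colSums(xt)) / sum(xt)

# G statistic; every observed count here is > 0, so log() is safe.
# A table with zero cells would need those terms dropped (0 * log(0) is NaN in R).
G <- 2 * sum(xt * log(xt / expected))

dof <- (nrow(xt) - 1) * (ncol(xt) - 1)        # degrees of freedom
p   <- pchisq(G, df = dof, lower.tail = FALSE) # upper-tail chi-squared p-value

round(G, 3)
dof
signif(p, 4)
```

The result should agree with the `G.test(xt)` output shown above to the printed precision.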
Two-Sample Pearson Chi-Squared Test
Samples: 2
Response Categories: ≥2
Exact?: No, use with N>200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A two-sample Pearson Chi-Squared test indicated a statistically significant association between X and Y (χ²(2, N=60) = 19.88, p < .0001).”
# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df <- read.csv("data/1F2LBs_multinomial.csv")
head(df, 20)
| | S | X | Y |
|---|---|---|---|
| | <int> | <chr> | <chr> |
| 1 | 1 | a | y |
| 2 | 2 | b | x |
| 3 | 3 | a | x |
| 4 | 4 | b | y |
| 5 | 5 | a | y |
| 6 | 6 | b | x |
| 7 | 7 | a | y |
| 8 | 8 | b | x |
| 9 | 9 | a | y |
| 10 | 10 | b | z |
| 11 | 11 | a | y |
| 12 | 12 | b | z |
| 13 | 13 | a | y |
| 14 | 14 | b | y |
| 15 | 15 | a | y |
| 16 | 16 | b | y |
| 17 | 17 | a | y |
| 18 | 18 | b | x |
| 19 | 19 | a | y |
| 20 | 20 | b | z |
df$S = factor(df$S) # Subject id is nominal (unused)
df$X = factor(df$X) # X is a factor of m ≥ 2 levels
df$Y = factor(df$Y) # Y is an outcome of n ≥ 2 categories
xt = xtabs( ~ X + Y, data=df) # make m × n crosstabs
xt
Y
X x y z
a 3 26 1
b 14 9 7
chisq.test(xt)
Warning message in chisq.test(xt):
“Chi-squared approximation may be incorrect”
Pearson's Chi-squared test
data: xt
X-squared = 19.875, df = 2, p-value = 4.833e-05
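The warning above is raised by `chisq.test()` when some expected cell counts are small. A short sketch, again rebuilding the table from the printed counts (an assumption in place of the CSV), shows how to inspect the expected counts and see which cells are responsible. The Monte Carlo call at the end is one possible alternative offered by `chisq.test()` itself, shown here as an option rather than as the source's recommendation.

```r
# Rebuild the 2 x 3 crosstab from the printed counts
# (assumption: these stand in for data/1F2LBs_multinomial.csv).
xt <- matrix(c( 3, 26, 1,
               14,  9, 7),
             nrow = 2, byrow = TRUE,
             dimnames = list(X = c("a", "b"), Y = c("x", "y", "z")))

# Run the test quietly so the expected counts can be inspected
res <- suppressWarnings(chisq.test(xt))
res$expected      # expected counts under independence
res$expected < 5  # TRUE marks the cells behind the warning

# Optional: a simulated (Monte Carlo) p-value avoids the large-sample approximation.
# The seed and B are arbitrary choices for illustration.
set.seed(1)
chisq.test(xt, simulate.p.value = TRUE, B = 10000)
```

Here both 'z' cells have an expected count below 5, which is consistent with the guidance above to reserve this test for larger N.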