Two Sample Tests of Proportion - Python
Fisher’s Exact Test
Samples: 2
Response Categories: 2
Exact?: Yes, use with N ≤ 200
Reporting: “Table 1 shows the counts of the ‘x’ and ‘y’ outcomes for each of ‘a’ and ‘b’. Fisher’s exact test indicated a statistically significant association between X and Y (p < .001).”
Warning
There isn’t a readily available, non-GPL-licensed implementation of Fisher’s Exact Test in Python that works on more than 2 response categories, as the one in R does. When working in Python, it is recommended to try a G-Test or Two-Sample Pearson Chi-Squared Test instead.
Related GitHub Issue: https://github.com/scipy/scipy/issues/7099
The following example covers only 2 samples and 2 response categories.
import pandas as pd
# Example data
# df is a long-format data table with subject (S), categorical factor (X), and outcome (Y)
df = pd.read_csv("data/1F2LBs_multinomial.csv")
# Keep only the 'x' and 'y' outcomes so that the crosstab below is 2x2
df = df.loc[df["Y"].isin(["x", "y"])]
df.head(20)
 | S | X | Y |
---|---|---|---|
0 | 1 | a | y |
1 | 2 | b | x |
2 | 3 | a | x |
3 | 4 | b | y |
4 | 5 | a | y |
5 | 6 | b | x |
6 | 7 | a | y |
7 | 8 | b | x |
8 | 9 | a | y |
10 | 11 | a | y |
12 | 13 | a | y |
13 | 14 | b | y |
14 | 15 | a | y |
15 | 16 | b | y |
16 | 17 | a | y |
17 | 18 | b | x |
18 | 19 | a | y |
20 | 21 | a | y |
21 | 22 | b | y |
23 | 24 | b | x |
xt = pd.crosstab(df["X"], df["Y"])
xt
X \ Y | x | y |
---|---|---|
a | 3 | 26 |
b | 14 | 9 |
from scipy.stats import fisher_exact
_, p = fisher_exact(xt)
p
0.00021895317390182282
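For readers who do not have the CSV file at hand, the same test can be reproduced from the four counts in the crosstab above. The following is a minimal sketch under that assumption: the counts are hard-coded, and the names `counts` and `odds_ratio` are illustrative rather than part of the original example. It also keeps the first return value, which SciPy reports as the sample odds ratio of the table.

```python
import numpy as np
from scipy.stats import fisher_exact

# 2x2 counts copied from the crosstab above:
# rows are X (a, b), columns are Y (x, y)
counts = np.array([[3, 26],
                   [14, 9]])

# fisher_exact returns the sample odds ratio and, by default, the two-sided p-value
odds_ratio, p = fisher_exact(counts, alternative="two-sided")

print(f"odds ratio = {odds_ratio:.3f}, p = {p:.6f}")
```

An odds ratio well below 1 here reflects that outcome ‘x’ is much less common in group ‘a’ than in group ‘b’, which can be reported alongside the p-value.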
G-Test
Samples: 2
Response Categories: ≥ 2
Exact?: No, use with N > 200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A G-test indicated a statistically significant association between X and Y (G(2) = 21.40, p < .0001).”
import pandas as pd
# Example data
# df is a long-format data table with subject (S), categorical factor (X), and outcome (Y)
df = pd.read_csv("data/1F2LBs_multinomial.csv")
df.head(20)
 | S | X | Y |
---|---|---|---|
0 | 1 | a | y |
1 | 2 | b | x |
2 | 3 | a | x |
3 | 4 | b | y |
4 | 5 | a | y |
5 | 6 | b | x |
6 | 7 | a | y |
7 | 8 | b | x |
8 | 9 | a | y |
9 | 10 | b | z |
10 | 11 | a | y |
11 | 12 | b | z |
12 | 13 | a | y |
13 | 14 | b | y |
14 | 15 | a | y |
15 | 16 | b | y |
16 | 17 | a | y |
17 | 18 | b | x |
18 | 19 | a | y |
19 | 20 | b | z |
xt = pd.crosstab(df["X"], df["Y"])
xt
X \ Y | x | y | z |
---|---|---|---|
a | 3 | 26 | 1 |
b | 14 | 9 | 7 |
from scipy.stats import chi2_contingency
g_stat, p, dof, exp_freq = chi2_contingency(xt, lambda_="log-likelihood")
g_stat, p, dof
(21.402062415325055, 2.252170138338781e-05, 2)
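To make clear what the `lambda_="log-likelihood"` option computes, the G statistic can also be worked out by hand as G = 2 Σ O ln(O / E), where O are the observed counts and E the counts expected under independence. The sketch below hard-codes the counts from the crosstab above and compares the result with SciPy; the variable names are illustrative, and the direct `np.log` call assumes no observed count is zero.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Counts copied from the crosstab above:
# rows are X (a, b), columns are Y (x, y, z)
observed = np.array([[3, 26, 1],
                     [14, 9, 7]])

# Expected counts under independence: (row total * column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# G = 2 * sum(O * ln(O / E)); valid as written because every observed count is nonzero
g_manual = 2.0 * np.sum(observed * np.log(observed / expected))

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-squared distribution
p_manual = chi2.sf(g_manual, dof_manual)

# Cross-check against SciPy's implementation
g_scipy, p_scipy, dof_scipy, _ = chi2_contingency(observed, lambda_="log-likelihood")

print(g_manual, p_manual, dof_manual)
print(g_scipy, p_scipy, dof_scipy)
```

Both lines should print the same statistic, p-value, and degrees of freedom up to floating-point error.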
Two-Sample Pearson Chi-Squared Test
Samples: 2
Response Categories: ≥ 2
Exact?: No, use with N > 200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A two-sample Pearson Chi-Squared test indicated a statistically significant association between X and Y (χ²(2, N=60) = 19.87, p < .0001).”
import pandas as pd
# Example data
# df is a long-format data table with subject (S), categorical factor (X), and outcome (Y)
df = pd.read_csv("data/1F2LBs_multinomial.csv")
df.head(20)
 | S | X | Y |
---|---|---|---|
0 | 1 | a | y |
1 | 2 | b | x |
2 | 3 | a | x |
3 | 4 | b | y |
4 | 5 | a | y |
5 | 6 | b | x |
6 | 7 | a | y |
7 | 8 | b | x |
8 | 9 | a | y |
9 | 10 | b | z |
10 | 11 | a | y |
11 | 12 | b | z |
12 | 13 | a | y |
13 | 14 | b | y |
14 | 15 | a | y |
15 | 16 | b | y |
16 | 17 | a | y |
17 | 18 | b | x |
18 | 19 | a | y |
19 | 20 | b | z |
xt = pd.crosstab(df["X"], df["Y"])
xt
X \ Y | x | y | z |
---|---|---|---|
a | 3 | 26 | 1 |
b | 14 | 9 | 7 |
from scipy.stats import chi2_contingency
# With the default lambda_, chi2_contingency computes Pearson's chi-squared statistic
chi2_stat, p, dof, exp_freq = chi2_contingency(xt)
chi2_stat, p, dof
(19.87478991596639, 4.8333050401877814e-05, 2)
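As a cross-check, the Pearson statistic can be computed by hand as χ² = Σ (O − E)² / E. The sketch below again hard-codes the counts from the crosstab above (the variable names are illustrative) and uses `scipy.stats.contingency.expected_freq` for the expected counts. It also counts cells whose expected frequency falls below 5, a common rule of thumb (not something the original example checks) for when the chi-squared approximation may be unreliable.

```python
import numpy as np
from scipy.stats import chi2
from scipy.stats.contingency import expected_freq

# Counts copied from the crosstab above:
# rows are X (a, b), columns are Y (x, y, z)
observed = np.array([[3, 26, 1],
                     [14, 9, 7]])

# Expected counts under independence of X and Y
expected = expected_freq(observed)

# Pearson's statistic: sum over all cells of (O - E)^2 / E
chi2_manual = np.sum((observed - expected) ** 2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof_manual)

# Rule-of-thumb check (an assumption of this sketch): how many expected counts are below 5?
small_cells = int((expected < 5).sum())

print(chi2_manual, p_manual, dof_manual)
print("cells with expected count < 5:", small_cells)
```

If several cells have small expected counts, that is one signal to prefer an exact test where one is available.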