Two-Sample Tests of Proportion - Python

Fisher’s Exact Test

  • Samples: 2

  • Response Categories: 2

  • Exact?: Yes, use with N≤200

  • Reporting: “Table 1 shows the counts of the ‘x’ and ‘y’ outcomes for each of ‘a’ and ‘b’. Fisher’s exact test indicated a statistically significant association between X and Y (p < .001).”

Warning

There isn’t a readily available, non-GPL-licensed Fisher’s Exact Test for Python that works on more than 2 response categories, as the one in R does. If working in Python, it is recommended to try a G-Test or Two-Sample Pearson Chi-Squared Test instead.

Related GitHub Issue: https://github.com/scipy/scipy/issues/7099

The following example covers only 2 samples and 2 response categories.

import pandas as pd

# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df = pd.read_csv("data/1F2LBs_multinomial.csv")

# Keep only the "x" and "y" outcomes so the example has exactly 2 response categories
df = df.loc[df["Y"].isin(["x", "y"])]
df.head(20)
S X Y
0 1 a y
1 2 b x
2 3 a x
3 4 b y
4 5 a y
5 6 b x
6 7 a y
7 8 b x
8 9 a y
10 11 a y
12 13 a y
13 14 b y
14 15 a y
15 16 b y
16 17 a y
17 18 b x
18 19 a y
20 21 a y
21 22 b y
23 24 b x
xt = pd.crosstab(df["X"], df["Y"])
xt
Y   x   y
X
a   3  26
b  14   9
from scipy.stats import fisher_exact

_, p = fisher_exact(xt)
p
0.00021895317390182282
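
If the CSV file is not at hand, the same test can be reproduced from the counts in the crosstab above. The following is a minimal sketch, not the page's original code: the `counts` dictionary and the `demo` frame are hypothetical stand-ins that rebuild long-format rows from the four cell counts. It also keeps the odds ratio that the example above discards and checks it by hand.

```python
import pandas as pd
from scipy.stats import fisher_exact

# Hypothetical stand-in for the CSV: rebuild long-format rows from the
# 2x2 counts shown in the crosstab (a: 3 x, 26 y; b: 14 x, 9 y).
counts = {("a", "x"): 3, ("a", "y"): 26, ("b", "x"): 14, ("b", "y"): 9}
rows = [(x, y) for (x, y), n in counts.items() for _ in range(n)]
demo = pd.DataFrame(rows, columns=["X", "Y"])

# Same contingency table as in the example, rows a/b and columns x/y
demo_xt = pd.crosstab(demo["X"], demo["Y"])

# fisher_exact returns the sample odds ratio and the two-sided p-value
odds_ratio, p_value = fisher_exact(demo_xt.to_numpy(), alternative="two-sided")

# Cross-check the odds ratio by hand: (a,x * b,y) / (a,y * b,x)
manual_or = (3 * 9) / (26 * 14)

print(demo_xt)
print(odds_ratio, manual_or, p_value)
```

An odds ratio well below 1 here says that the ‘x’ outcome is much less likely relative to ‘y’ for ‘a’ than for ‘b’, which gives the p-value a direction when reporting.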

G-Test

  • Samples: 2

  • Response Categories: ≥2

  • Exact?: No, use with N>200

  • Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A G-test indicated a statistically significant association between X and Y (G(2) = 21.40, p < .0001).”

import pandas as pd

# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df = pd.read_csv("data/1F2LBs_multinomial.csv")
df.head(20)
S X Y
0 1 a y
1 2 b x
2 3 a x
3 4 b y
4 5 a y
5 6 b x
6 7 a y
7 8 b x
8 9 a y
9 10 b z
10 11 a y
11 12 b z
12 13 a y
13 14 b y
14 15 a y
15 16 b y
16 17 a y
17 18 b x
18 19 a y
19 20 b z
xt = pd.crosstab(df["X"], df["Y"])
xt
Y   x   y  z
X
a   3  26  1
b  14   9  7
from scipy.stats import chi2_contingency

g_stat, p, dof, exp_freq = chi2_contingency(xt, lambda_="log-likelihood")
g_stat, p, dof
(21.402062415325055, 2.252170138338781e-05, 2)
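
To see what the `lambda_="log-likelihood"` option computes, the G statistic can be recomputed by hand as G = 2 Σ O ln(O / E), where O are the observed counts and E the expected counts under independence. This is a sketch under the assumption that the table is entered directly from the crosstab above (the `observed` array is a stand-in for `xt`); it is meant as a sanity check, not a replacement for the library call.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed counts copied from the crosstab (rows a, b; columns x, y, z)
observed = np.array([[3, 26, 1], [14, 9, 7]])

# Library result: G statistic, p-value, degrees of freedom, expected counts
g_stat, p, dof, expected = chi2_contingency(observed, lambda_="log-likelihood")

# Manual G statistic. No cell is zero here, so the logarithm is defined
# everywhere; a table with empty cells would need those terms treated as 0.
g_manual = 2.0 * np.sum(observed * np.log(observed / expected))

# The p-value is the upper tail of a chi-squared distribution with dof degrees of freedom
p_manual = chi2.sf(g_manual, dof)

print(g_stat, g_manual)
print(p, p_manual)
```

The continuity correction that `chi2_contingency` applies by default only takes effect when there is 1 degree of freedom, so it plays no role in this 2×3 table.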

Two-Sample Pearson Chi-Squared Test

  • Samples: 2

  • Response Categories: ≥2

  • Exact?: No, use with N>200

  • Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A two-sample Pearson Chi-Squared test indicated a statistically significant association between X and Y (χ²(2, N=60) = 19.87, p < .0001).”

import pandas as pd

# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df = pd.read_csv("data/1F2LBs_multinomial.csv")
df.head(20)
S X Y
0 1 a y
1 2 b x
2 3 a x
3 4 b y
4 5 a y
5 6 b x
6 7 a y
7 8 b x
8 9 a y
9 10 b z
10 11 a y
11 12 b z
12 13 a y
13 14 b y
14 15 a y
15 16 b y
16 17 a y
17 18 b x
18 19 a y
19 20 b z
xt = pd.crosstab(df["X"], df["Y"])
xt
Y   x   y  z
X
a   3  26  1
b  14   9  7
from scipy.stats import chi2_contingency

# With the default lambda_, chi2_contingency returns the Pearson chi-squared statistic
chi2_stat, p, dof, exp_freq = chi2_contingency(xt)
chi2_stat, p, dof
(19.87478991596639, 4.8333050401877814e-05, 2)
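
The Pearson statistic is the sum of (O − E)² / E over all cells, which can be verified directly against the library result. The sketch below assumes the table is typed in from the crosstab above (`observed` stands in for `xt`) and also counts the cells with an expected frequency below 5, a common rule of thumb for when the chi-squared approximation becomes less reliable and an exact test may be preferable.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the crosstab (rows a, b; columns x, y, z)
observed = np.array([[3, 26, 1], [14, 9, 7]])

# Library result: Pearson chi-squared, p-value, degrees of freedom, expected counts
chi2_stat, p, dof, expected = chi2_contingency(observed)

# Manual Pearson chi-squared from observed and expected counts
chi2_manual = np.sum((observed - expected) ** 2 / expected)

# Total sample size and number of cells with a small expected count
n_total = int(observed.sum())
n_small = int(np.sum(expected < 5))

print(chi2_stat, chi2_manual)
print(expected)
print(n_total, n_small)
```

The values of `dof`, `n_total`, and `chi2_stat` are the three numbers that appear in the reporting template, χ²(dof, N=n_total) = chi2_stat.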