Two Sample Tests of Proportion - Python

Fisher’s Exact Test

Samples: 2
Response Categories: 2
Exact?: Yes, use with N≤200
Reporting: “Table 1 shows the counts of the ‘x’ and ‘y’ outcomes for each of ‘a’ and ‘b’. Fisher’s exact test indicated a statistically significant association between X and Y (p < .001).”

Warning

There isn’t a readily available, non-GPL-licensed implementation of Fisher’s Exact Test in Python that works on more than 2 response categories, as the one in R does. If you are working in Python with more than 2 response categories, try a G-Test or a Two-Sample Pearson Chi-Squared Test instead.

Related GitHub Issue: https://github.com/scipy/scipy/issues/7099
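
The fallback recommended in the warning can be sketched as a small helper. This is only an illustration under stated assumptions: `association_p_value` is a hypothetical name, not part of SciPy, and the counts are copied from the crosstabs shown on this page.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact


def association_p_value(table):
    """Hypothetical helper: return (test_name, p) for a table of counts.

    Uses Fisher's exact test for a 2x2 table and falls back to a
    G-test for anything larger.
    """
    counts = np.asarray(table)
    if counts.shape == (2, 2):
        _, p = fisher_exact(counts)
        return "fisher", p
    # lambda_="log-likelihood" turns chi2_contingency into a G-test
    _, p, _, _ = chi2_contingency(counts, lambda_="log-likelihood")
    return "g-test", p


name_2x2, p_2x2 = association_p_value([[3, 26], [14, 9]])
name_2x3, p_2x3 = association_p_value([[3, 26, 1], [14, 9, 7]])
print(name_2x2, p_2x2)
print(name_2x3, p_2x3)
```

The helper dispatches only on table shape. The sample-size guidance (exact test for N≤200) is left to the caller.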

The following example covers only 2 samples and 2 response categories.

import pandas as pd

# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df = pd.read_csv("data/1F2LBs_multinomial.csv")

# Keep only the "x" and "y" outcomes so the example has exactly 2 response categories
df = df.loc[df["Y"].isin(["x", "y"])]
df.head(20)

	S	X	Y
0	1	a	y
1	2	b	x
2	3	a	x
3	4	b	y
4	5	a	y
5	6	b	x
6	7	a	y
7	8	b	x
8	9	a	y
10	11	a	y
12	13	a	y
13	14	b	y
14	15	a	y
15	16	b	y
16	17	a	y
17	18	b	x
18	19	a	y
20	21	a	y
21	22	b	y
23	24	b	x

xt = pd.crosstab(df["X"], df["Y"])
xt

Y	x	y
X
a	3	26
b	14	9

from scipy.stats import fisher_exact

_, p = fisher_exact(xt)
p

0.00021895317390182282
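
If the CSV file is not at hand, the same test can be reproduced from the counts in the crosstab. The sketch below assumes those four counts and also keeps the first return value of `fisher_exact`, the sample odds ratio, which the example above discards.

```python
import pandas as pd
from scipy.stats import fisher_exact

# Counts copied from the crosstab above (rows: X = a, b; columns: Y = x, y)
xt = pd.DataFrame([[3, 26], [14, 9]], index=["a", "b"], columns=["x", "y"])

# fisher_exact returns the sample odds ratio and the p-value;
# "two-sided" is the default alternative, spelled out here for clarity
odds_ratio, p = fisher_exact(xt, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.3f}, p = {p:.6f}")
```

The p-value should agree with the one printed above. The odds ratio is the cross-product ratio of the table, (3 × 9) / (26 × 14).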

G-Test

Samples: 2
Response Categories: ≥2
Exact?: No, use with N>200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A G-test indicated a statistically significant association between X and Y (G(2) = 21.40, p < .0001).”

import pandas as pd

# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df = pd.read_csv("data/1F2LBs_multinomial.csv")
df.head(20)

	S	X	Y
0	1	a	y
1	2	b	x
2	3	a	x
3	4	b	y
4	5	a	y
5	6	b	x
6	7	a	y
7	8	b	x
8	9	a	y
9	10	b	z
10	11	a	y
11	12	b	z
12	13	a	y
13	14	b	y
14	15	a	y
15	16	b	y
16	17	a	y
17	18	b	x
18	19	a	y
19	20	b	z

xt = pd.crosstab(df["X"], df["Y"])
xt

Y	x	y	z
X
a	3	26	1
b	14	9	7

from scipy.stats import chi2_contingency

g_stat, p, dof, exp_freq = chi2_contingency(xt, lambda_="log-likelihood")
g_stat, p, dof

(21.402062415325055, 2.252170138338781e-05, 2)
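
As a cross-check, the G statistic can be recomputed by hand with G = 2 Σ O ln(O / E), where O are the observed counts and E the counts expected under independence. This is a minimal sketch that assumes the counts from the crosstab above; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Counts copied from the crosstab above (rows: a, b; columns: x, y, z)
observed = np.array([[3, 26, 1], [14, 9, 7]])

# Expected counts under independence: (row total * column total) / N
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# G = 2 * sum(O * ln(O / E)); every cell here is non-zero, so the log is defined
g_manual = 2 * np.sum(observed * np.log(observed / expected))

# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(g_manual, dof_manual)

# Compare with SciPy's result
g_scipy, p_scipy, _, _ = chi2_contingency(observed, lambda_="log-likelihood")
print(g_manual, p_manual, dof_manual)
print(g_scipy, p_scipy)
```

A table containing a zero count would need extra handling, since that cell's O ln(O / E) term should be treated as 0 rather than evaluated directly.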

Two-Sample Pearson Chi-Squared Test

Samples: 2
Response Categories: ≥2
Exact?: No, use with N>200
Reporting: “Table 1 shows the counts of the ‘x’, ‘y’, and ‘z’ outcomes for each of ‘a’ and ‘b’. A two-sample Pearson Chi-Squared test indicated a statistically significant association between X and Y (χ2 (2, N=60) = 19.87, p < .0001).”

import pandas as pd

# Example data
# df is a long-format data table w/subject (S), categorical factor (X) and outcome (Y)
df = pd.read_csv("data/1F2LBs_multinomial.csv")
df.head(20)

	S	X	Y
0	1	a	y
1	2	b	x
2	3	a	x
3	4	b	y
4	5	a	y
5	6	b	x
6	7	a	y
7	8	b	x
8	9	a	y
9	10	b	z
10	11	a	y
11	12	b	z
12	13	a	y
13	14	b	y
14	15	a	y
15	16	b	y
16	17	a	y
17	18	b	x
18	19	a	y
19	20	b	z

xt = pd.crosstab(df["X"], df["Y"])
xt

Y	x	y	z
X
a	3	26	1
b	14	9	7

from scipy.stats import chi2_contingency

chi2_stat, p, dof, exp_freq = chi2_contingency(xt)
chi2_stat, p, dof

(19.87478991596639, 4.8333050401877814e-05, 2)
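
The Pearson statistic can likewise be checked by hand with χ2 = Σ (O − E)² / E. The sketch below assumes the counts from the crosstab above and uses illustrative variable names. It also derives N and the degrees of freedom that appear in the reporting sentence.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Counts copied from the crosstab above (rows: a, b; columns: x, y, z)
observed = np.array([[3, 26, 1], [14, 9, 7]])
n = observed.sum()

# Expected counts under independence: outer product of the margins divided by N
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Pearson chi-squared: sum of (O - E)^2 / E over all cells
chi2_manual = np.sum((observed - expected) ** 2 / expected)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof_manual)

# Compare with SciPy's result
chi2_scipy, p_scipy, _, expected_scipy = chi2_contingency(observed)
print(chi2_manual, p_manual, dof_manual, n)
```

`chi2_contingency` applies Yates' continuity correction only when there is a single degree of freedom, so for this 2×3 table the manual and SciPy values should match without passing `correction=False`.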
