Statistical Analysis and Reporting

A Jupyter Book to help you find and run statistical tests in both R and Python.

Based on your data, find the test you want to run (proportion, assumption, distribution, effect, etc.) and the language you want to run it in. The code snippet provided with each test is just an example; each code recipe has only one so far. Please do not blindly throw code snippets at every problem you encounter. In practice, there is often more to be done.

Tests of Proportion Index

| Samples | Response Categories | N | Test in R | Test in Python |
|---|---|---|---|---|
| 1 | 2 | ≤200 | Binomial Test | Binomial Test |
| 1 | ≥2 | ≤200 | Multinomial Test | Multinomial Test |
| 1 | ≥2 | >200 | One-Sample Pearson Chi-Squared Test | One-Sample Pearson Chi-Squared Test |
| 2 | ≥2 | ≤200 | Fisher’s Exact Test | Fisher’s Exact Test |
| 2 | ≥2 | >200 | G-Test | G-Test |
| 2 | ≥2 | >200 | Two-Sample Pearson Chi-Squared Test | Two-Sample Pearson Chi-Squared Test |
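
To illustrate the first row of this index (one sample, two response categories), here is a minimal standard-library Python sketch of an exact two-sided binomial test. The function name `binomial_test_two_sided` and the example counts are hypothetical. The two-sided p-value sums the probabilities of all outcomes no more likely than the observed one, which is one common convention and not the only one; the linked recipes may rely on a library routine instead.

```python
from math import comb


def binomial_test_two_sided(successes, n, p=0.5):
    """Exact two-sided binomial test p-value (sketch, not a library API)."""

    def pmf(k):
        # Probability of exactly k successes in n trials under the null p.
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    observed = pmf(successes)
    # Sum every outcome at most as likely as the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-7))
    return min(1.0, total)


# Made-up data: 7 of 10 respondents chose option A; null hypothesis p = 0.5.
print(binomial_test_two_sided(7, 10))  # 0.34375
```

With only 10 observations, a 7 to 3 split is not unusual under a fair 50/50 null, so the p-value is large.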

Tests of Assumption Index

| Assumption | Context of Use | Test in R | Test in Python |
|---|---|---|---|
| Normality | t-test, ANOVA, LM, LMM | Shapiro-Wilk Test (on responses) | Shapiro-Wilk Test (on responses) |
| Normality | t-test, ANOVA, LM, LMM | Shapiro-Wilk Test (on residuals) | Shapiro-Wilk Test (on residuals) |
| Normality | t-test, ANOVA, LM, LMM | Anderson-Darling Test (on responses) | Anderson-Darling Test (on responses) |
| Normality | t-test, ANOVA, LM, LMM | Anderson-Darling Test (on residuals) | Anderson-Darling Test (on residuals) |
| Homoscedasticity (Homogeneity of Variance) | t-test, ANOVA, LM, LMM | Levene’s Test | Levene’s Test |
| Homoscedasticity (Homogeneity of Variance) | t-test, ANOVA, LM, LMM | Brown-Forsythe Test | Brown-Forsythe Test |
| Sphericity | Repeated Measures ANOVA | Mauchly’s Test of Sphericity | Mauchly’s Test of Sphericity |
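
The two homoscedasticity rows differ only in how each group is centered: Levene’s test uses absolute deviations from the group mean, while the Brown-Forsythe variant uses the group median. The sketch below, using only the Python standard library, computes that shared statistic. The name `levene_statistic` and the sample data are hypothetical, and the p-value (which comes from an F distribution with k - 1 and N - k degrees of freedom) is deliberately left out.

```python
from statistics import mean, median


def levene_statistic(groups, center=mean):
    """W statistic: one-way ANOVA F on absolute deviations from each group's center.

    center=mean gives Levene's test; center=median gives Brown-Forsythe.
    """
    # Absolute deviation of every observation from its own group's center.
    z = [[abs(x - center(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand_mean = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (n_total - k) / (k - 1) * between / within


# Made-up data: group b is twice as spread out as group a.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
print(round(levene_statistic([a, b], center=mean), 4))    # Levene
print(round(levene_statistic([a, b], center=median), 4))  # Brown-Forsythe
```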

Tests of Distributions Index

| Distribution | Parameterization | Test in R | Test in Python |
|---|---|---|---|
| Normal | mean (μ): mean, standard deviation (σ): sd | KS Test for Normal Distribution | KS Test for Normal Distribution |
| Lognormal | mean (μ): meanlog, standard deviation (σ): sdlog | KS Test for Lognormal Distribution | KS Test for Lognormal Distribution |
| Poisson | lambda (λ): lambda | KS Test for Poisson Distribution | KS Test for Poisson Distribution |
| Negative Binomial | theta (θ): theta, mu (μ): mu | KS Test for Negative Binomial Distribution | KS Test for Negative Binomial Distribution |
| Exponential | rate (λ): rate | KS Test for Exponential Distribution | KS Test for Exponential Distribution |
| Gamma | shape (α): shape, rate (β): rate | KS Test for Gamma Distribution | KS Test for Gamma Distribution |
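
Every row above relies on the same idea: the Kolmogorov-Smirnov (KS) statistic is the largest gap between the sample's empirical CDF and the CDF of the hypothesized distribution. Below is a rough standard-library Python sketch for the Normal row. The name `ks_statistic`, the sample values, and the hypothesized parameters are all made up for illustration, and no p-value is computed.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothesized distribution: Normal with mean 0 and standard deviation 1.
hypothesized = NormalDist(mu=0.0, sigma=1.0)
sample = [-1.0, 0.0, 1.0]
print(round(ks_statistic(sample, hypothesized.cdf), 4))
```

Passing a different `cdf` callable is all it takes to reuse the same function for another distribution in the table.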

Parametric Tests of Effect Index

| Samples | Levels | Between or Within Subjects | Test in R | Test in Python |
|---|---|---|---|---|
| 1 | 2 | Between | Independent-Samples t-test | Independent-Samples t-test |
| 1 | 2 | Within | Paired-Samples t-test | Paired-Samples t-test |
| 1 | ≥2 | Between | One-Way ANOVA | One-Way ANOVA |
| 1 | ≥2 | Within | One-Way Repeated Measures ANOVA | One-Way Repeated Measures ANOVA |
| ≥2 | ≥2 | Between | Factorial ANOVA | Factorial ANOVA |
| ≥2 | ≥2 | Between | Linear Model (LM) | Linear Model (LM) |
| ≥2 | ≥2 | Within | Factorial Repeated Measures ANOVA | Factorial Repeated Measures ANOVA |
| ≥2 | ≥2 | Within | Linear Mixed Model (LMM) | Linear Mixed Model (LMM) |
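
The Between/Within column matters even for the simplest case of two levels: the same numbers give different t statistics depending on whether the two columns come from different participants or from the same participants measured twice. Here is a small sketch in standard-library Python. The function names and data are hypothetical, the independent-samples version assumes equal variances (the pooled Student form), and p-values are omitted.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom (between subjects)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def paired_t(a, b):
    """t statistic and degrees of freedom on per-subject differences (within subjects)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / sqrt(variance(diffs) / n)
    return t, n - 1


# Made-up scores under two conditions.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
print(independent_t(a, b))  # treats a and b as different participants
print(paired_t(a, b))       # treats a[i] and b[i] as the same participant
```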

Nonparametric Tests of Effect Index

| Samples | Levels | Between or Within Subjects | Test in R | Test in Python |
|---|---|---|---|---|
| 1 | 2 | Between | Mann-Whitney U test | Mann-Whitney U test |
| 1 | 2 | Within | Wilcoxon Signed-Rank Test | Wilcoxon Signed-Rank Test |
| 1 | ≥2 | Between | Kruskal-Wallis Test | Kruskal-Wallis Test |
| 1 | ≥2 | Within | Friedman Test | Friedman Test |
| ≥2 | ≥2 | Between | Aligned Rank Transform for Between Subjects (ART) | Aligned Rank Transform for Between Subjects (ART) |
| ≥2 | ≥2 | Between | Generalized Linear Model (GLM) | Generalized Linear Model (GLM) |
| ≥2 | ≥2 | Within | Aligned Rank Transform for Within Subjects (ART) | Aligned Rank Transform for Within Subjects (ART) |
| ≥2 | ≥2 | Within | Generalized Linear Mixed Model (GLMM) | Generalized Linear Mixed Model (GLMM) |
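
To give a feel for how the rank-based tests in this index work, the sketch below computes the Mann-Whitney U statistic from the first row by direct pair counting: U for one sample is the number of cross-sample pairs it "wins", with ties counted as half. The function name and data are hypothetical, the approach is quadratic in sample size (fine for illustration only), and no p-value is computed.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a's value exceeds b's, ties count 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


# Made-up ratings from two independent groups.
a = [1, 3, 5]
b = [2, 3, 4]
u_a = mann_whitney_u(a, b)
u_b = mann_whitney_u(b, a)
print(u_a, u_b)  # the two U values always sum to len(a) * len(b)
```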

Generalized Linear (Mixed) Models Index

| Distribution | Typical Uses | Between or Within Subjects | Test in R | Test in Python |
|---|---|---|---|---|
| Normal | Linear Regression: equivalent to linear (mixed) model (lm / lmm) | Between | GLM for Normal Distribution Data | GLM for Normal Distribution Data |
| Normal | Linear Regression: equivalent to linear (mixed) model (lm / lmm) | Within | GLM for Normal Distribution Data | GLM for Normal Distribution Data |
| Binomial | Logistic Regression: dichotomous responses (e.g. nominal responses with two categories) | Between | GLM for Binomial Distribution Data | GLM for Binomial Distribution Data |
| Binomial | Logistic Regression: dichotomous responses (e.g. nominal responses with two categories) | Within | GLM for Binomial Distribution Data | GLM for Binomial Distribution Data |
| Multinomial | Multinomial Logistic Regression: polytomous responses (i.e. nominal responses with more than two categories) | Between | GLM for Multinomial Distribution Data | GLM for Multinomial Distribution Data |
| Multinomial | Multinomial Logistic Regression: polytomous responses (i.e. nominal responses with more than two categories) | Within | GLM for Multinomial Distribution Data | GLM for Multinomial Distribution Data |
| Ordinal | Ordinal Logistic Regression: ordinal responses (e.g. Likert scales) | Between | GLM for Ordinal Distribution Data | GLM for Ordinal Distribution Data |
| Ordinal | Ordinal Logistic Regression: ordinal responses (e.g. Likert scales) | Within | GLM for Ordinal Distribution Data | GLM for Ordinal Distribution Data |
| Poisson | Poisson Regression: counts, rare events (e.g. gesture recognition errors, 3-pointers per quarter, number of “F” grades) | Between | GLM for Poisson Distribution Data | GLM for Poisson Distribution Data |
| Poisson | Poisson Regression: counts, rare events (e.g. gesture recognition errors, 3-pointers per quarter, number of “F” grades) | Within | GLM for Poisson Distribution Data | GLM for Poisson Distribution Data |
| Zero-Inflated Poisson | Zero-Inflated Poisson Regression: counts, rare events with a large portion of zeroes | Between | GLM for Zero-Inflated Poisson Distribution Data | GLM for Zero-Inflated Poisson Distribution Data |
| Zero-Inflated Poisson | Zero-Inflated Poisson Regression: counts, rare events with a large portion of zeroes | Within | GLM for Zero-Inflated Poisson Distribution Data | GLM for Zero-Inflated Poisson Distribution Data |
| Negative Binomial | Negative Binomial Regression: same as Poisson but for use in the presence of overdispersion | Between | GLM for Negative Binomial Distribution Data | GLM for Negative Binomial Distribution Data |
| Negative Binomial | Negative Binomial Regression: same as Poisson but for use in the presence of overdispersion | Within | GLM for Negative Binomial Distribution Data | GLM for Negative Binomial Distribution Data |
| Zero-Inflated Negative Binomial | Zero-Inflated Negative Binomial Regression: same as Zero-Inflated Poisson but for use in the presence of overdispersion | Between | GLM for Zero-Inflated Negative Binomial Distribution Data | GLM for Zero-Inflated Negative Binomial Distribution Data |
| Zero-Inflated Negative Binomial | Zero-Inflated Negative Binomial Regression: same as Zero-Inflated Poisson but for use in the presence of overdispersion | Within | GLM for Zero-Inflated Negative Binomial Distribution Data | GLM for Zero-Inflated Negative Binomial Distribution Data |
| Gamma and Exponential | Gamma and Exponential Regression: exponentially distributed responses (e.g. income, wait times) | Between | GLM for Gamma and Exponential Distribution Data | GLM for Gamma and Exponential Distribution Data |
| Gamma and Exponential | Gamma and Exponential Regression: exponentially distributed responses (e.g. income, wait times) | Within | GLM for Gamma and Exponential Distribution Data | GLM for Gamma and Exponential Distribution Data |
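
For count data, the index above points to different models depending on overdispersion and on the share of zeroes. A quick, informal way to look at both before picking a row is sketched below in standard-library Python. The function name `count_diagnostics` and the example counts are hypothetical, and these descriptive numbers are only a rough screening aid, not a formal test: a Poisson variable has variance equal to its mean (dispersion index near 1) and an expected zero share of exp(-mean).

```python
from math import exp
from statistics import mean, variance


def count_diagnostics(counts):
    """Descriptive checks for count responses (informal screening only)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; the dispersion index is undefined")
    return {
        # Variance-to-mean ratio: values well above 1 suggest overdispersion.
        "dispersion_index": variance(counts) / m,
        # Share of zeroes actually observed in the data.
        "observed_zero_share": counts.count(0) / len(counts),
        # Share of zeroes a Poisson distribution with the same mean would predict.
        "poisson_zero_share": exp(-m),
    }


# Made-up error counts: mostly zeroes with one large value.
print(count_diagnostics([0, 0, 0, 0, 10]))
```

In this made-up example the variance is far larger than the mean and there are many more zeroes than a Poisson with the same mean would predict, which is the situation the zero-inflated and negative binomial rows describe.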