Houjun Liu

chi-square

\(\chi^2\) is a test statistic for hypothesis testing.

motivation for chi-square

The motivation for chi-square is that neither the t-test (means, “is the value significantly different”) nor the z-test (proportions, “is the incidence percentage significantly different”) really covers categorical data, where the claim is “the categories are distributed in this way.”

Suppose, for instance, that we want to test the null hypothesis that the categories follow the expected distribution below:

Category   Expected   Actual
A          25         20
B          25         20
C          25         25
D          25         25

\(\alpha = 0.05\). What do we use to test this??

(hint: we can’t, unless…)

Enter chi-square.

chi-square test

The chi-square test is a hypothesis test for categorical data. It translates the difference between an observed and an expected distribution into a p-value for significance.

Begin by confirming that your experiment meets the conditions for inference (chi-square test), then calculate chi-square.

Once you have the statistic, look it up in a chi-square table, using the appropriate degrees of freedom, to find the p-value. Then, proceed with normal hypothesis testing.

Because it works on categorical counts, the chi-square test can also be used as a homogeneity test.

conditions for inference (chi-square test)

  • random sampling
  • every expected count must be \(\geq 5\)
  • the sample should be \(<10\%\) of the population, or the observations should otherwise be independent
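The two numeric conditions above can be checked mechanically; random sampling cannot, since it is a property of the study design. A minimal sketch, where the function name `check_conditions` and the population size of 5000 are hypothetical and chosen only for illustration:

```python
def check_conditions(expected_counts, sample_size, population_size=None):
    """Check the numeric conditions for a chi-square test.

    Random sampling is NOT checked here; it depends on how the data
    was collected. `population_size` is optional: if it is unknown,
    the 10% condition is reported as None (not checkable).
    """
    # Every expected count must be at least 5.
    large_counts = all(e >= 5 for e in expected_counts)

    # The sample should be less than 10% of the population.
    if population_size is None:
        ten_percent = None
    else:
        ten_percent = sample_size < 0.10 * population_size

    return {"large_counts": large_counts, "ten_percent": ten_percent}


# Expected counts from the motivating table; the population size is an
# assumed value for illustration only.
result = check_conditions([25, 25, 25, 25], sample_size=100, population_size=5000)
print(result)
```

If either entry comes back `False`, the chi-square approximation may not be trustworthy for that data.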

chi-square test for homogeneity

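The chi-square test for homogeneity uses the chi-square statistic to check whether several groups share the same distribution over the categories of a single variable.

To do this, we take the overall proportion of each outcome (the proportion we would see if every group were distributed identically) and apply it to each group's total to get expected counts, which we then compare against the observed counts.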

Take, for instance:

Subject      Right Hand   Left Hand   Total
STEM         30           10          40
Humanities   15           25          40
Equal        15           5           20
Total        60           40          100

We then compute the expected count for each cell as (row total × column total) / grand total:

Subject      Right   Left
STEM         24      16
Humanities   24      16
Equal        12      8
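The expected counts can be reproduced from the row and column totals of the observed table. A minimal sketch; the helper name `expected_counts` is an assumption, not something from the notes:

```python
def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed handedness counts (rows: STEM, Humanities, Equal;
# columns: right hand, left hand), without the total row/column.
observed = [[30, 10], [15, 25], [15, 5]]
expected = expected_counts(observed)

for label, row in zip(["STEM", "Humanities", "Equal"], expected):
    print(label, row)
```

For example, the STEM/right-hand cell is 40 × 60 / 100 = 24, matching the table.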

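Awesome! Now, calculate chi-square by computing each cell's contribution from its measured and expected counts and summing over all cells. Calculate degrees of freedom by (num_row-1)*(num_col-1), not counting the total row and column; here that is (3-1)*(2-1) = 2.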
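As a sketch of that calculation on the handedness example: the snippet below sums the per-cell contributions and computes the degrees of freedom. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2); that shortcut is an assumption specific to this example, and for other degrees of freedom a chi-square table or a statistics library would be needed.

```python
import math

# Observed and expected counts from the tables above
# (rows: STEM, Humanities, Equal; columns: right hand, left hand).
observed = [[30, 10], [15, 25], [15, 5]]
expected = [[24, 16], [24, 16], [12, 8]]

# Sum (observed - expected)^2 / expected over every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail p-value; this closed form is valid ONLY when df == 2.
assert df == 2
p_value = math.exp(-chi_square / 2)

print(chi_square, df, p_value)
```

With these counts the statistic is 14.0625 on 2 degrees of freedom, and the p-value falls well below \(\alpha = 0.05\), so we would reject the hypothesis that handedness is distributed the same way across the three groups.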

chi-square test for independence

The chi-square test for independence tests the null hypothesis of “no association between two variables”: we either reject it or fail to reject it.

Essentially, you leverage the fact that, for independent events, “AND” relationships are multiplicative probabilities. Therefore, the expected count for each cell is simply the product of its row total and column total divided by the grand total.
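To spell out the multiplication step, here is a short derivation. The symbols \(R_i\) (total of row \(i\)), \(C_j\) (total of column \(j\)), \(n\) (grand total), and \(E_{ij}\) (expected count in cell \(i, j\)) are notation introduced here, not from the notes. Under the null hypothesis, the probability of landing in row \(i\) AND column \(j\) is the product of the two marginal probabilities, so:

\begin{equation} E_{ij} = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i C_j}{n} \end{equation}

For instance, using the handedness table from the homogeneity section, the STEM and right-hand cell would have \(E = \frac{40 \cdot 60}{100} = 24\).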

calculating chi-square

\begin{equation} \chi^2 = \frac{(\hat{x}_0-x_0)^2}{x_0} +\frac{(\hat{x}_1-x_1)^2}{x_1} + \cdots + \frac{(\hat{x}_n-x_n)^2}{x_n} \end{equation}

where \(\hat{x}_i\) is the measured count and \(x_i\) is the expected count for category \(i\).
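A minimal sketch of this formula, applied to the four-category table from the motivation section. The function name `chi_square_statistic` is an assumption; the critical value 7.815 is the standard chi-square table entry for 3 degrees of freedom at \(\alpha = 0.05\), and taking degrees of freedom as (number of categories - 1) is the usual convention for a single list of categories.

```python
def chi_square_statistic(measured, expected):
    """Sum of (measured - expected)^2 / expected over all categories."""
    if len(measured) != len(expected):
        raise ValueError("measured and expected must have the same length")
    return sum((m - e) ** 2 / e for m, e in zip(measured, expected))


# Categories A, B, C, D from the motivating example.
expected = [25, 25, 25, 25]
measured = [20, 20, 25, 25]

chi_square = chi_square_statistic(measured, expected)
df = len(expected) - 1  # number of categories minus one

# Table lookup: critical value for df = 3 at alpha = 0.05.
CRITICAL_VALUE_DF3_ALPHA_05 = 7.815
reject_null = chi_square > CRITICAL_VALUE_DF3_ALPHA_05

print(chi_square, df, reject_null)
```

Here the statistic is 1 + 1 + 0 + 0 = 2, which is below the critical value, so at \(\alpha = 0.05\) we would fail to reject the null hypothesis.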