\(\chi^2\) is a test statistic for hypothesis testing.

## motivation for chi-square

The motivation for chi-square is because t-test (means, “is the value significantly different”) and z-test (proportion, “is the incidence percentage significantly different”) all don’t really cover categorical data samples: “the categories are distributed in this way.”

Take, for instance, if we want to test the following null hypothesis:

Category | Expected | Actual |
---|---|---|

A | 25 | 20 |

B | 25 | 20 |

C | 25 | 25 |

D | 25 | 25 |

\(\alpha = 0.05\). What do we use to test this??

(hint: we can’t, unless…)

Enter chi-square.

## chi-square test

chi-square test is a hypothesis test for categorical data. It is responsible to translate differences in distributions into p-values for significance.

Begin by calculating chi-square after you confirmed that your experiment meets conditions for inference (chi-square test).

Once you have that, look it up at a chi-square table to figure the appropriate p-value. Then, proceed with normal hypothesis testing.

Because of this categorical nature, chi-square test can also be used as a homogeneity test.

## conditions for inference (chi-square test)

- random sampling
- expected value for data must be \(\geq 5\)
- sampling should be \(<10\%\) or independent

## chi-square test for homogeneity

The chi-square test for homogeneity is a test for homogeneity via the chi-square statistic.

To do this, we take the probability of a certain outcome happening—if distributed equally—and apply it to the samples to compare.

Take, for instance:

Subject | Right Hand | Left Hand | Total |
---|---|---|---|

STEM | 30 | 10 | 40 |

Humanities | 15 | 25 | 40 |

Equal | 15 | 5 | 20 |

Total | 60 | 40 | 100 |

We will then figure the expected outcomes:

Right | Left |
---|---|

24 | 16 |

24 | 16 |

12 | 8 |

Awesome! Now, calculate chi-square with each cell of measured outcomes. Calculate degrees of freedom by (num_row-1)*(num_col-1).

## chi-square test for independence

The chi-square test for independence is a test designed to accept-reject the null hypothesis of “no association between two variables.”

Essentially, you leverage the fact that “AND” relationships are multiplicative probabilities. Therefore, the expected outcomes are simply the multiplied/fraction of sums:

## calculating chi-square

\begin{equation} \chi^2 = \frac{(\hat{x}_0-x_0)^2}{x_0} +\frac{(\hat{x}_1-x_1)^2}{x_1} + \cdots + \frac{(\hat{x}_n-x_n)^2}{x_n} \end{equation}

Where, \(\hat{x}_i\) is the measured value and \(x_i\) is the expected value.