Houjun Liu

t-test

A t-test is a hypothesis test for the statistical significance of sample means, based on the t-statistic. Before it can be conducted, the data must meet the conditions for inference.

conditions for inference (t-test)

To use t-statistics, you have to meet three conditions, just like the conditions for inference used with z-scores.

  • random sampling
  • normal (sample size larger than 30, or the original distribution is confirmed to be roughly symmetric about the mean)
  • Independence

use a t-statistic to find a p-value

Begin by finding a \(t\) statistic. Remember that:

\begin{equation} t = \frac{\text{statistic}-\text{parameter}}{\text{std err}} \end{equation}

In this case, since we are dealing with sample means, we have:

\begin{equation} t = \frac{\bar{x}-\mu_0}{\frac{S_x}{\sqrt{n}}} \end{equation}

where \(\bar{x}\) is the measured mean, \(\mu_0\) is the null hypothesis mean, and \(S_x\) is the sample standard deviation.

Quick note:

\(SE = \frac{S}{\sqrt{n}}\) because the central limit theorem states that sample means form their own distribution, whose variance equals the original variance divided by the sample size. Hence, the standard deviation of the means is the sample standard deviation divided by the square root of the sample size.
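
In symbols, the reasoning above reads:

\begin{equation} Var(\bar{x}) = \frac{\sigma^2}{n} \implies SD(\bar{x}) = \frac{\sigma}{\sqrt{n}} \approx \frac{S_x}{\sqrt{n}} = SE \end{equation}

where \(\sigma\) is the (unknown) population standard deviation, estimated by the sample standard deviation \(S_x\).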

Once you have a \(t\) value, you look at the test and what it's asking (above the mean? below the mean? etc.) and add up the corresponding tail probabilities.
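
As a minimal sketch (made-up data, null mean \(\mu_0 = 100\), two-sided test), the computation above could look like this in Python with numpy/scipy:

  import numpy as np
  from scipy import stats

  sample = np.array([102.1, 98.4, 105.3, 99.0, 101.7, 97.8, 103.5, 100.9])
  mu_0 = 100  # null hypothesis mean

  x_bar = sample.mean()
  s_x = sample.std(ddof=1)           # sample standard deviation
  se = s_x / np.sqrt(len(sample))    # standard error of the mean
  t_stat = (x_bar - mu_0) / se

  # two-sided p-value: add up both tail probabilities, df = n - 1
  p_value = 2 * stats.t.sf(abs(t_stat), df=len(sample) - 1)

  # cross-check against scipy's built-in one-sample t-test
  t_check, p_check = stats.ttest_1samp(sample, mu_0)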

paired vs two-sample tests

A paired t-test looks at pairs of values as a statistic in itself (i.e. subtracts them directly, etc.). Think of it as a compound statistic: you are doing a \(t\) test on one value, it just happens to be composed/calculated from a pair of values (for instance, “difference between mother-father glucose levels”).

A two-sample t-test looks at two independent samples and compares them. Hence, they are two random variables and should be manipulated as such.
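
A minimal sketch of the contrast (made-up data): a paired test reduces to a one-sample test on the differences, while a two-sample test treats the groups as separate random variables.

  import numpy as np
  from scipy import stats

  # paired: same subjects measured twice, so test the differences against 0
  before = np.array([5.1, 4.8, 6.0, 5.5, 4.9])
  after = np.array([5.6, 5.0, 6.3, 5.9, 5.2])
  t_paired, p_paired = stats.ttest_rel(before, after)
  t_diff, p_diff = stats.ttest_1samp(after - before, 0)   # same result

  # two-sample: two independent groups compared as separate random variables
  group_a = np.array([12.3, 11.8, 13.1, 12.7, 11.5])
  group_b = np.array([10.9, 11.2, 12.0, 10.5, 11.7])
  t_two, p_two = stats.ttest_ind(group_a, group_b)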

t-tests for regression lines

regression lines can be imbued with predictive power by constructing a confidence interval for the slope:

\begin{equation} m \pm t^* SE_b \end{equation}

where \(m\) is the slope and \(SE_b\) is the standard error of the slope.

Note that the degrees of freedom used for \(t^*\) is the number of data points, minus two.
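
A minimal sketch (made-up \((x, y)\) data) of a 95% confidence interval for the slope, using \(n - 2\) degrees of freedom for \(t^*\):

  import numpy as np
  from scipy import stats

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
  y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

  result = stats.linregress(x, y)          # slope m and its standard error SE_b
  n = len(x)
  t_star = stats.t.ppf(0.975, df=n - 2)    # 95% two-sided critical value

  lower = result.slope - t_star * result.stderr
  upper = result.slope + t_star * result.stderr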

conditions for inference (slopes)

Acronym: LINEAR

  • Linear
  • Independent (observations are independent, or the sample is less than \(10\%\) of the population)
  • Normal (for a given \(x\), \(y\) is normally distributed)
  • Equal variance (for any given \(x\), it should have a roughly equal standard deviation in \(y\))
  • Random