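Hypothesis testing is a formal statistical procedure for deciding whether a sample provides enough evidence to support a claim about the entire population. Its logic resembles a legal trial: the “null hypothesis” (innocence) is assumed true unless the “evidence” (the sample data) is strong enough to reject it. A test never proves a hypothesis; it only measures how strongly the data count against $H_0$.
Every statistical test involves two competing hypotheses. The null hypothesis ($H_0$) states that the population parameter equals a specified value, while the alternative hypothesis ($H_1$) states the claim for which we seek evidence. For a test about a population mean $\mu$ with hypothesised value $\mu_0$, they take one of three forms: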
| Test Type | Null Hypothesis ($H_0$) | Alternative Hypothesis ($H_1$) |
|---|---|---|
| Right-tailed | $H_0: \mu = \mu_0$ | $H_1: \mu > \mu_0$ |
| Left-tailed | $H_0: \mu = \mu_0$ | $H_1: \mu < \mu_0$ |
| Two-tailed | $H_0: \mu = \mu_0$ | $H_1: \mu \neq \mu_0$ |
COMMON MISTAKE: Students often use the sample mean $\bar{x}$ in their hypotheses. Remember, hypotheses are always statements about the population parameter ($\mu$), never the sample statistic.
To test the hypothesis about a population mean $\mu$ where the population standard deviation $\sigma$ is known, we calculate a test statistic. This value measures how many standard errors the observed sample mean $\bar{x}$ is away from the hypothesised mean $\mu_0$.
For a sample size $n$, the test statistic $Z$ is:
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
This formula assumes either the population is normally distributed or $n$ is large enough ($n \ge 30$) for the Central Limit Theorem to apply.
VCAA FOCUS: Ensure you check the conditions for a $z$-test. You must know the population standard deviation $\sigma$. If $\sigma$ is unknown and the sample is large, Specialist Mathematics students typically use the sample standard deviation $s$ as an estimate for $\sigma$.
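The test statistic formula above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `z_statistic` and the heart-rate figures (36 participants, sample mean 73 bpm, hypothesised mean 70 bpm, $\sigma = 9$ bpm) are made up for the example.

```python
from math import sqrt


def z_statistic(sample_mean, mu_0, sigma, n):
    """Return how many standard errors sample_mean lies from mu_0."""
    standard_error = sigma / sqrt(n)  # sigma / sqrt(n)
    return (sample_mean - mu_0) / standard_error


# Hypothetical data (not from the text): n = 36, x-bar = 73, mu_0 = 70, sigma = 9
z_calc = z_statistic(sample_mean=73, mu_0=70, sigma=9, n=36)
print(z_calc)  # 2.0
```

Here the standard error is $9 / \sqrt{36} = 1.5$, so a difference of 3 bpm corresponds to 2 standard errors.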
The $p$-value is the probability of observing a sample statistic as extreme as, or more extreme than, the one observed, assuming that the null hypothesis is true.
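Let $Z_{calc}$ be the calculated test statistic from the sample data. The $p$-value then depends on the form of $H_1$:
| Test Type | $p$-value |
|---|---|
| Right-tailed | $\Pr(Z \ge Z_{calc})$ |
| Left-tailed | $\Pr(Z \le Z_{calc})$ |
| Two-tailed | $2 \times \Pr(Z \ge \lvert Z_{calc} \rvert)$ |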
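As a minimal sketch, the $p$-value for each test type can be computed from the standard normal distribution with Python's standard-library `statistics.NormalDist`. The function name `p_value` and the `tail` labels are illustrative choices, and $Z_{calc} = 2$ is a made-up value.

```python
from statistics import NormalDist


def p_value(z_calc, tail):
    """p-value of a z-test; tail is 'right', 'left' or 'two'."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "right":
        return 1 - std_normal.cdf(z_calc)             # Pr(Z >= z_calc)
    if tail == "left":
        return std_normal.cdf(z_calc)                 # Pr(Z <= z_calc)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_calc)))  # 2 * Pr(Z >= |z_calc|)
    raise ValueError("tail must be 'right', 'left' or 'two'")


# Hypothetical test statistic
print(round(p_value(2.0, "right"), 4))  # 0.0228
print(round(p_value(2.0, "left"), 4))   # 0.9772
print(round(p_value(2.0, "two"), 4))    # 0.0455
```

The same test statistic gives very different $p$-values depending on the direction of $H_1$, which is why the alternative hypothesis must be fixed before looking at the data.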
KEY TAKEAWAY: The $p$-value is NOT the probability that the null hypothesis is true. It is the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true.
The significance level ($\alpha$) is a pre-determined threshold used to decide whether the $p$-value is small enough to reject the null hypothesis. Common values for $\alpha$ are $0.05$ (5%) and $0.01$ (1%).
| Result | Conclusion |
|---|---|
| $p < 0.01$ | Very strong evidence against $H_0$ |
| $0.01 \le p < 0.05$ | Strong evidence against $H_0$ |
| $0.05 \le p < 0.10$ | Weak evidence against $H_0$ |
| $p \ge 0.10$ | Little to no evidence against $H_0$ |
EXAM TIP: When writing your conclusion in an exam, always relate it back to the context of the question. Don’t just say “Reject $H_0$”; say “Reject $H_0$. There is evidence at the 5% level to suggest that the mean heart rate of participants is higher than 70 bpm.”
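The full decision process for a right-tailed test might be sketched as follows. The heart-rate numbers (36 participants, sample mean 73 bpm, $\sigma = 9$ bpm) are hypothetical and chosen only to match the style of the exam tip; the function name `right_tailed_z_test` is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Return (z_calc, p, reject) for H0: mu = mu_0 against H1: mu > mu_0."""
    z_calc = (sample_mean - mu_0) / (sigma / sqrt(n))
    p = 1 - NormalDist().cdf(z_calc)  # Pr(Z >= z_calc)
    return z_calc, p, p < alpha       # reject H0 only when p < alpha


# Hypothetical study: is the mean heart rate higher than 70 bpm?
z_calc, p, reject = right_tailed_z_test(73, 70, 9, 36, alpha=0.05)
print(f"z = {z_calc:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0228

if reject:
    print("Reject H0. There is evidence at the 5% level to suggest that "
          "the mean heart rate of participants is higher than 70 bpm.")
else:
    print("Do not reject H0. There is insufficient evidence at the 5% level "
          "to suggest that the mean heart rate is higher than 70 bpm.")
```

With these numbers $H_0$ is rejected at $\alpha = 0.05$ but would not be rejected at $\alpha = 0.01$, since $0.01 \le p < 0.05$.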
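The $p$-value is influenced by each component of the $z$-test calculation: the difference between the sample mean $\bar{x}$ and the hypothesised mean $\mu_0$, the population standard deviation $\sigma$, and the sample size $n$. A larger difference, a smaller $\sigma$, or a larger $n$ (for a fixed non-zero difference) each increase the magnitude of the test statistic.
STUDY HINT: Remember the inverse relationship: a test statistic that is further from zero in the direction of $H_1$ (a larger $|Z|$ for a two-tailed test) results in a smaller $p$-value.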
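To see how the sample size alone changes the $p$-value, the sketch below holds the sample mean, $\mu_0$ and $\sigma$ fixed and varies $n$ for a right-tailed test. All numbers (sample mean 72, $\mu_0 = 70$, $\sigma = 9$) and the function name `right_tailed_p` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_p(sample_mean, mu_0, sigma, n):
    """p-value for H1: mu > mu_0 when sigma is known."""
    z_calc = (sample_mean - mu_0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z_calc)


# Same observed difference (2 units), increasing sample size
for n in (9, 36, 100):
    z = (72 - 70) / (9 / sqrt(n))
    p = right_tailed_p(72, 70, 9, n)
    print(f"n = {n:3d}: z = {z:.3f}, p = {p:.4f}")
```

As $n$ grows, the standard error $\sigma / \sqrt{n}$ shrinks, so the same observed difference gives a larger $Z$ and a smaller $p$-value.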
Hypothesis testing is not infallible. Because we rely on samples, we can make two types of errors:
| Decision | $H_0$ is True | $H_0$ is False |
|---|---|---|
| Do not reject $H_0$ | Correct Decision ($1-\alpha$) | Type II Error ($\beta$) |
| Reject $H_0$ | Type I Error ($\alpha$) | Correct Decision (Power) ($1-\beta$) |
REMEMBER: To reduce the chance of a Type I error, decrease $\alpha$ (e.g., from 0.05 to 0.01). However, this will generally increase the chance of a Type II error unless the sample size is increased.
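The trade-off between the two error types can be illustrated numerically. The sketch below assumes a right-tailed $z$-test and a specific true mean under $H_1$, because $\beta$ can only be calculated once a particular alternative value is chosen. The values $\mu_0 = 70$, true mean 73 and $\sigma = 9$, and the function name `type_ii_error`, are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu_0, mu_true, sigma, n, alpha):
    """beta for a right-tailed z-test when the true mean is mu_true > mu_0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)         # reject H0 when Z >= z_crit
    shift = (mu_true - mu_0) / (sigma / sqrt(n))   # mean of Z when mu = mu_true
    return std_normal.cdf(z_crit - shift)          # Pr(do not reject | H0 false)


# Hypothetical scenario: mu_0 = 70, true mean 73, sigma = 9
for alpha, n in ((0.05, 36), (0.01, 36), (0.01, 100)):
    beta = type_ii_error(70, 73, 9, n, alpha)
    print(f"alpha = {alpha}, n = {n}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Lowering $\alpha$ from 0.05 to 0.01 with $n = 36$ raises $\beta$, while increasing $n$ to 100 brings $\beta$ back down and restores power.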