Statistical inference is the process of using data from a sample to make generalizations or draw conclusions about a population. In Specialist Mathematics, this focuses on the distribution of the sample mean, the construction of confidence intervals, and the formal process of hypothesis testing.
For a random variable $X$ with a population mean $\mu$ and a population standard deviation $\sigma$, the sample mean $\bar{X}$ of a random sample of size $n$ is itself a random variable.
The Central Limit Theorem states that for a sufficiently large sample size (generally $n \ge 30$), the distribution of the sample mean $\bar{X}$ will be approximately normal, regardless of the shape of the population distribution:
$$\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)$$
KEY TAKEAWAY: As the sample size $n$ increases, the standard error $\frac{\sigma}{\sqrt{n}}$ decreases, meaning the sample mean becomes a more precise estimator of the population mean.
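The behaviour described above can be checked with a small simulation. This is a minimal sketch, not a proof: the exponential population (with $\mu = 1$ and $\sigma = 1$), the sample size of 40, the 5000 repetitions, and the function name `simulate_sample_means` are all illustrative assumptions.

```python
import random
import statistics


def simulate_sample_means(n, repetitions, seed=0):
    """Return the means of `repetitions` random samples of size n.

    Assumption: the population is exponential with rate 1, so it is
    strongly skewed, with population mean 1 and standard deviation 1.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]


means = simulate_sample_means(n=40, repetitions=5000)

# The centre of the sample means should be close to mu = 1, and their
# spread should be close to the standard error sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / 40 ** 0.5, 3))
```

Even though the population is skewed, the simulated sample means cluster around $\mu$ with a spread close to $\frac{\sigma}{\sqrt{n}}$.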
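A confidence interval provides a range of plausible values for the true population mean $\mu$, calculated from a sample mean $\bar{x}$. A 95% confidence level means that, over many repeated samples, about 95% of the intervals constructed this way would contain $\mu$.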
For a population with a known standard deviation $\sigma$ (or a large sample where the sample standard deviation $s$ can approximate $\sigma$):
$$\left( \bar{x} - z\frac{\sigma}{\sqrt{n}}, \bar{x} + z\frac{\sigma}{\sqrt{n}} \right)$$
Where $z$ is the critical value for the desired level of confidence.
| Confidence Level | $z$-score (approx.) |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |
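The critical values in the table, and the interval itself, can be reproduced with Python's standard library. This is a sketch under the assumption that $\sigma$ is known; the function names and the example values ($\bar{x} = 50$, $\sigma = 10$, $n = 100$) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Critical value z that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu, assuming sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return (xbar - margin, xbar + margin)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))

# Illustrative values only: xbar = 50, sigma = 10, n = 100
lower, upper = confidence_interval(50, 10, 100)
print(round(lower, 2), round(upper, 2))
```

With these example values the 95% interval is approximately $(48.04, 51.96)$.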
The margin of error is half the width of the confidence interval:
$$M = z\frac{\sigma}{\sqrt{n}}$$
To halve the margin of error, the sample size $n$ must be quadrupled.
EXAM TIP: If a question asks for the “width” of the confidence interval, it is $2 \times M$. If you are asked to find the required sample size $n$ for a specific margin of error, always round $n$ up to the next integer.
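Rearranging $M = z\frac{\sigma}{\sqrt{n}}$ gives $n = \left(\frac{z\sigma}{M}\right)^2$, which is then rounded up. The sketch below applies that rearrangement; the function name `required_sample_size` and the example values are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n whose margin of error is at most `margin` (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = (z * sigma / M)^2, always rounded UP to the next integer
    return ceil((z * sigma / margin) ** 2)


# Illustrative values only: sigma = 10, 95% confidence
print(required_sample_size(sigma=10, margin=2))  # unrounded value is about 96.04
print(required_sample_size(sigma=10, margin=1))  # unrounded value is about 384.15
```

Halving the margin of error from 2 to 1 roughly quadruples the required sample size. The two rounded answers are not in an exact 4:1 ratio only because each is rounded up separately.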
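Hypothesis testing is a formal procedure used to determine whether there is enough evidence in a sample of data to support a particular belief about the population mean. The null hypothesis $H_0: \mu = \mu_0$ states that the population mean equals a hypothesised value $\mu_0$. The alternative hypothesis $H_1$ states that $\mu$ is greater than, less than, or not equal to $\mu_0$.
To test the hypothesis, we calculate the test statistic $z^*$, which is the $z$-score of the observed sample mean $\bar{x}$ under $H_0$: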
$$z^* = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
The $p$-value is the probability of obtaining a sample mean at least as extreme as the one observed, assuming the null hypothesis is true.
* For $H_1: \mu > \mu_0$, $p = P(Z > z^*)$
* For $H_1: \mu < \mu_0$, $p = P(Z < z^*)$
* For $H_1: \mu \neq \mu_0$, $p = 2 \times P(Z > |z^*|)$
VCAA FOCUS: If $p < \alpha$ (where $\alpha$ is the significance level, usually 0.05), we reject $H_0$. If $p \ge \alpha$, we fail to reject $H_0$. Always state your conclusion in the context of the original problem.
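The test statistic and the three $p$-value cases can be sketched in code as follows. The function name `one_sample_z_test`, its `alternative` labels, and the example values are hypothetical. The sketch assumes $\sigma$ is known.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z_star, p_value) for a one-sample z-test with sigma known."""
    z_star = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    if alternative == "greater":      # H1: mu > mu0
        p = 1 - std_normal.cdf(z_star)
    elif alternative == "less":       # H1: mu < mu0
        p = std_normal.cdf(z_star)
    elif alternative == "two-sided":  # H1: mu != mu0
        p = 2 * (1 - std_normal.cdf(abs(z_star)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z_star, p


# Illustrative values only: xbar = 52, mu0 = 50, sigma = 10, n = 100
z_star, p = one_sample_z_test(52, 50, 10, 100)
alpha = 0.05
print(round(z_star, 3), round(p, 4))
print("reject H0" if p < alpha else "fail to reject H0")
```

With these example values $z^* = 2$ and the two-sided $p$-value is about 0.0455, so $H_0$ would be rejected at $\alpha = 0.05$ but not at $\alpha = 0.01$.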
Because we are using a sample to infer properties of a population, there is always a risk of reaching the wrong conclusion.
| Decision | $H_0$ is True | $H_0$ is False ($H_1$ is True) |
|---|---|---|
| Reject $H_0$ | Type I Error | Correct Decision |
| Fail to Reject $H_0$ | Correct Decision | Type II Error |
COMMON MISTAKE: Students often think that decreasing $\alpha$ (the significance level) is always better. However, decreasing the probability of a Type I error ($\alpha$) automatically increases the probability of a Type II error ($\beta$), assuming the sample size remains constant.
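The trade-off can be made concrete for a one-sided test of $H_1: \mu > \mu_0$. If the true mean is some specific value $\mu_1 > \mu_0$, then $\beta = P\left(Z < z_\alpha - \frac{\mu_1 - \mu_0}{\sigma/\sqrt{n}}\right)$, where $z_\alpha$ is the one-sided critical value. The sketch below evaluates this; the function name `type_ii_error`, the choice of $\mu_1$, and the other example values are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu1, sigma, n, alpha):
    """Beta for the one-sided test H1: mu > mu0 when the true mean is mu1.

    H0 is rejected when z* > z_alpha. Under the true mean mu1, z* is
    normal with mean (mu1 - mu0) / (sigma / sqrt(n)) and sd 1.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    return std_normal.cdf(z_alpha - shift)


# Illustrative values only: mu0 = 50, true mean mu1 = 52, sigma = 10, n = 100
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(50, 52, 10, 100, alpha)
    print(alpha, round(beta, 3))
```

With $n$ held fixed, each smaller $\alpha$ in the loop produces a larger $\beta$.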
Inference about the difference of two means involves comparing the means of two independent populations, $X_1$ and $X_2$.
If $\bar{X}_1$ and $\bar{X}_2$ are the means of independent samples from two populations:
* Expected Value: $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$
* Variance: $\operatorname{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
An approximate confidence interval for the difference $\mu_1 - \mu_2$ is:
$$(\bar{x}_1 - \bar{x}_2) \pm z \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
To test $H_0: \mu_1 = \mu_2$ (which is $\mu_1 - \mu_2 = 0$):
$$z^* = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
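The two-sample formulas above can be combined in one short sketch. The function name `two_sample_z` and the example values are hypothetical. The sketch assumes independent samples with known $\sigma_1$ and $\sigma_2$, and reports a two-sided $p$-value.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Return (z_star, two-sided p-value, CI for mu1 - mu2)."""
    std_normal = NormalDist()
    diff = xbar1 - xbar2
    # Standard deviation of the difference of the two sample means
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z_star = (diff - 0) / se  # H0: mu1 - mu2 = 0
    p = 2 * (1 - std_normal.cdf(abs(z_star)))
    z = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    return z_star, p, (diff - z * se, diff + z * se)


# Illustrative values only
z_star, p, (lower, upper) = two_sample_z(52, 50, 10, 10, 100, 100)
print(round(z_star, 3), round(p, 4))
print(round(lower, 3), round(upper, 3))
```

With these example values the 95% interval contains 0 and $p > 0.05$, so both approaches lead to the same conclusion: there is insufficient evidence that the two means differ.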
STUDY HINT: When dealing with two-sample problems, ensure you correctly identify whether the samples are independent. If the data is “paired” (e.g., before and after measurements on the same subjects), you should instead analyze the single mean of the differences, $D = X_1 - X_2$.
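For paired data, the analysis reduces to a one-sample test on the differences. The sketch below makes several assumptions that are not in the notes above: the before and after values are invented, the standard deviation of the differences $\sigma_D$ is treated as known, the differences are taken to be approximately normal (needed because the sample is small), and the function name `paired_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def paired_z_test(x1, x2, sigma_d, mu_d0=0.0):
    """One-sample z-test on the differences D = X1 - X2 (sigma_d known)."""
    differences = [a - b for a, b in zip(x1, x2)]
    d_bar = fmean(differences)
    z_star = (d_bar - mu_d0) / (sigma_d / sqrt(len(differences)))
    p = 2 * (1 - NormalDist().cdf(abs(z_star)))  # two-sided p-value
    return d_bar, z_star, p


# Invented data: measurements on the same 8 subjects before and after
before = [72, 68, 75, 80, 66, 71, 77, 69]
after = [70, 65, 74, 76, 65, 70, 73, 68]

d_bar, z_star, p = paired_z_test(before, after, sigma_d=1.5)
print(d_bar, round(z_star, 3), p < 0.05)
```

Treating the same data as two independent samples would ignore the subject-to-subject pairing and use the wrong standard deviation.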