Statistical inference uses data from a sample to draw conclusions or make predictions about an entire population. In Specialist Mathematics, this focuses on the distribution of the sample mean, the construction of confidence intervals, and the formal process of hypothesis testing.
Before analyzing sample means, it is essential to understand how multiple random variables interact.
If $X$ and $Y$ are independent random variables, and $a$ and $b$ are constants:
* $E(aX + bY) = aE(X) + bE(Y)$
* $Var(aX + bY) = a^2Var(X) + b^2Var(Y)$
For the sum of $n$ independent observations of the same random variable $X$, where $X$ has mean $\mu$ and variance $\sigma^2$:
* $E(X_1 + X_2 + \dots + X_n) = n\mu$
* $Var(X_1 + X_2 + \dots + X_n) = n\sigma^2$
* $SD(X_1 + X_2 + \dots + X_n) = \sigma\sqrt{n}$
COMMON MISTAKE: Do not confuse $Var(2X)$ with $Var(X_1 + X_2)$.
$Var(2X) = 2^2Var(X) = 4\sigma^2$, whereas $Var(X_1 + X_2) = \sigma^2 + \sigma^2 = 2\sigma^2$. The latter represents the sum of two independent trials, which has less variability than simply doubling a single result.
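The difference between doubling one result and adding two independent results can be checked with a quick simulation. The sketch below assumes a normal population with $\mu = 10$ and $\sigma = 3$ (values chosen only for illustration) and uses Python's standard library; the simulated variances should land near $4\sigma^2 = 36$ and $2\sigma^2 = 18$, though the exact figures depend on the random draws.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 10.0, 3.0   # assumed population parameters (sigma^2 = 9)
TRIALS = 50_000

# Doubling a single observation: 2X
doubled = [2 * random.gauss(MU, SIGMA) for _ in range(TRIALS)]

# Adding two independent observations: X1 + X2
summed = [random.gauss(MU, SIGMA) + random.gauss(MU, SIGMA)
          for _ in range(TRIALS)]

var_doubled = statistics.pvariance(doubled)  # theory: 4 * sigma^2 = 36
var_summed = statistics.pvariance(summed)    # theory: 2 * sigma^2 = 18

print(f"Var(2X)      is about {var_doubled:.1f}")
print(f"Var(X1 + X2) is about {var_summed:.1f}")
```

Note that both lists have the same mean ($2\mu = 20$); only the spread differs.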
The sample mean, denoted by $\bar{X}$, is a random variable formed by taking the average of $n$ independent observations:
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$
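Applying the results for a sum of $n$ independent observations to this definition gives the mean and variance of $\bar{X}$. The working below assumes each $X_i$ has mean $\mu$ and variance $\sigma^2$:

$$E(\bar{X}) = \frac{1}{n}E(X_1 + X_2 + \dots + X_n) = \frac{n\mu}{n} = \mu$$

$$Var(\bar{X}) = \frac{1}{n^2}Var(X_1 + X_2 + \dots + X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \qquad SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$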
The Central Limit Theorem states that for a sufficiently large sample size (a common rule of thumb is $n \ge 30$), the sampling distribution of the sample mean $\bar{X}$ is approximately normal, regardless of the shape of the original population distribution, provided the population has a finite mean $\mu$ and variance $\sigma^2$.
| Condition | Distribution of $\bar{X}$ |
|---|---|
| Population is Normal | $\bar{X}$ is exactly Normal: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ |
| Population is NOT Normal ($n \ge 30$) | $\bar{X}$ is approximately Normal: $\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)$ |
KEY TAKEAWAY: As the sample size $n$ increases, the standard deviation of the sample mean, $\frac{\sigma}{\sqrt{n}}$, decreases (the distribution becomes narrower), meaning the sample mean becomes a more reliable estimate of the population mean.
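A simulation can make this narrowing visible. The sketch below is one possible illustration, not a prescribed method: it assumes a skewed (exponential) population with mean 1 and standard deviation 1, draws many samples of each size, and compares the simulated standard deviation of the sample means with $\frac{\sigma}{\sqrt{n}}$. The function names and the choice of population are assumptions made for the example.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from an exponential population
    with rate 1 (so mu = 1 and sigma = 1)."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

def sd_of_sample_means(n, repeats=5_000):
    """Estimate SD(X-bar) by simulating many samples of size n."""
    return statistics.pstdev([sample_mean(n) for _ in range(repeats)])

results = {n: sd_of_sample_means(n) for n in (4, 30, 100)}

for n, sd in results.items():
    theory = 1 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1
    print(f"n = {n:3d}: simulated SD = {sd:.3f}, sigma/sqrt(n) = {theory:.3f}")
```

The simulated values should sit close to the theoretical ones and shrink as $n$ grows, even though the population itself is far from normal.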
A confidence interval (CI) provides a range of plausible values for the population mean $\mu$, based on a sample mean $\bar{x}$.
For a population with a known standard deviation $\sigma$:
$$\left( \bar{x} - z\frac{\sigma}{\sqrt{n}}, \bar{x} + z\frac{\sigma}{\sqrt{n}} \right)$$
Here $z$ is the critical value of the standard normal distribution for the desired level of confidence.
| Confidence Level | $z$ value (approx.) |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |
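The table values and the interval formula can be reproduced with `statistics.NormalDist` from Python's standard library. This is a minimal sketch: the helper names `z_critical` and `confidence_interval`, and the sample figures ($\bar{x} = 50$, $\sigma = 8$, $n = 64$), are hypothetical and chosen only to illustrate the calculation.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: the z with `confidence` of the
    standard normal area between -z and +z."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")

# Hypothetical sample: x-bar = 50, sigma = 8, n = 64
lower, upper = confidence_interval(50, 8, 64)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # (48.04, 51.96)
```

Here $\frac{\sigma}{\sqrt{n}} = \frac{8}{8} = 1$, so the interval is simply $50 \pm 1.96$.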
The Margin of Error ($M$) is half the width of the confidence interval:
$$M = z\frac{\sigma}{\sqrt{n}}$$
To find the required sample size $n$ for a specific margin of error:
$$n = \left( \frac{z\sigma}{M} \right)^2$$
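Because $n$ must be a whole number, the result of this formula is always rounded up so that the margin of error is not exceeded. A short sketch of the calculation follows, with a hypothetical helper name and illustrative values ($\sigma = 8$, $M = 1.5$, 95% confidence):

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest whole n whose margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)  # always round up

# Hypothetical requirement: sigma = 8, margin of error at most 1.5
n_needed = required_sample_size(sigma=8, margin=1.5)
print(n_needed)  # (1.96 * 8 / 1.5)^2 is about 109.3, so n = 110
```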
EXAM TIP: If the population standard deviation $\sigma$ is unknown and the sample size is large ($n \ge 30$), you can use the sample standard deviation $s$ as an estimate for $\sigma$.
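Hypothesis testing is a formal procedure used to determine whether there is enough evidence in a sample to support a particular claim about the population mean $\mu$. The existing belief is stated as the null hypothesis $H_0$, and the claim being investigated is stated as the alternative hypothesis $H_1$.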
VCAA FOCUS: Always state your conclusion in the context of the original problem. Don’t just say “Reject $H_0$”; say “There is sufficient evidence at the 5% level to suggest that the mean weight of the product has increased.”
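As an illustration of a conclusion written in context, the sketch below runs a one-sided test on made-up data. It assumes a known population standard deviation, a $z$ test statistic and the $p$-value approach, with $H_0: \mu = 500$ against $H_1: \mu > 500$; all numbers and variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical scenario: product weight in grams
MU_0 = 500     # mean under H0
SIGMA = 12     # known population standard deviation (assumed)
N = 36         # sample size
X_BAR = 504    # observed sample mean
ALPHA = 0.05   # significance level

# Test statistic: how many standard errors x-bar lies above mu_0
z_stat = (X_BAR - MU_0) / (SIGMA / math.sqrt(N))

# One-sided p-value: P(X-bar >= 504) if H0 is true
p_value = 1 - NormalDist().cdf(z_stat)

print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
if p_value < ALPHA:
    print("There is sufficient evidence at the 5% level to suggest "
          "that the mean weight of the product has increased.")
else:
    print("There is insufficient evidence at the 5% level to suggest "
          "that the mean weight of the product has increased.")
```

With these figures $z = 2$ and the $p$-value is about 0.023, which is below 0.05, so $H_0$ is rejected.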
Because we use sample data to make inferences, there is always a chance of reaching the wrong conclusion.
| Decision | $H_0$ is actually True | $H_0$ is actually False |
|---|---|---|
| Reject $H_0$ | Type I Error | Correct Decision |
| Do not reject $H_0$ | Correct Decision | Type II Error |
REMEMBER: To reduce the probability of both types of errors simultaneously, you must increase the sample size $n$. Decreasing $\alpha$ (the risk of a Type I error) will typically increase $\beta$ (the risk of a Type II error) if the sample size remains constant.
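The trade-off can be seen numerically. The sketch below assumes a one-sided test of $H_0: \mu = 500$ against $H_1: \mu > 500$ with known $\sigma = 12$, and computes $\beta$ for the case where the true mean is actually 504. These values and the function name `type_ii_error` are hypothetical, and $\beta$ always depends on which true mean is assumed.

```python
import math
from statistics import NormalDist

def type_ii_error(alpha, n, mu_0=500, mu_1=504, sigma=12):
    """P(do not reject H0) when the true mean is mu_1 (> mu_0)."""
    se = sigma / math.sqrt(n)
    # H0 is rejected when x-bar exceeds this critical value
    critical = mu_0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta: chance x-bar falls below the critical value under mu_1
    return NormalDist(mu_1, se).cdf(critical)

for alpha, n in [(0.05, 36), (0.01, 36), (0.05, 100), (0.01, 100)]:
    beta = type_ii_error(alpha, n)
    print(f"alpha = {alpha:.2f}, n = {n:3d}: beta = {beta:.3f}")
```

At $n = 36$, lowering $\alpha$ from 0.05 to 0.01 raises $\beta$; moving to $n = 100$ lowers $\beta$ at either significance level.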