In Specialist Mathematics, statistical inference involves using data from a sample to make generalizations or decisions about an entire population. The foundation of this process is the sampling distribution.
To understand sampling distributions, we must distinguish between the group we want to study and the group we actually measure.
| Feature | Population (Parameter) | Sample (Statistic) |
|---|---|---|
| Mean | $\mu$ | $\bar{x}$ |
| Standard Deviation | $\sigma$ | $s$ |
| Size | $N$ | $n$ |
KEY TAKEAWAY: Parameters are fixed values belonging to the population, while statistics are random variables that change depending on which sample is selected.
When we take a random sample of size $n$ from a population, the sample mean $\bar{X}$ is itself a random variable, because its value depends on which sample happens to be drawn.
If we were to take every possible sample of size $n$ from a population and calculate the mean of each, the resulting distribution of those means is called the sampling distribution of the sample mean.
STUDY HINT: Remember that $\bar{X}$ (uppercase) refers to the random variable/distribution, while $\bar{x}$ (lowercase) refers to a specific numerical value calculated from a single observed sample.
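The idea of "every possible sample" can be made concrete with a very small population. The sketch below is illustrative only: the four-value `population` and the choice of $n = 2$ are assumptions, and samples are drawn with replacement so that the observations are independent. It lists every ordered sample, computes the observed mean $\bar{x}$ of each, and tallies how often each value occurs, which gives the sampling distribution of $\bar{X}$ for this toy case.

```python
from collections import Counter
from itertools import product
from statistics import mean

# Hypothetical tiny population, chosen only for illustration.
population = [2, 4, 6, 8]
n = 2  # sample size (assumed)

# Every ordered sample of size n, drawn with replacement.
all_samples = list(product(population, repeat=n))

# One observed sample mean (a lowercase x-bar) per possible sample.
sample_means = [mean(sample) for sample in all_samples]

# Tally the sample means: this is the sampling distribution of X-bar.
distribution = Counter(sample_means)

for value in sorted(distribution):
    probability = distribution[value] / len(all_samples)
    print(f"x-bar = {value}: probability {probability}")
```

Each printed row pairs one possible value of $\bar{x}$ with the probability that a randomly chosen sample produces it.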
Provided the observations in the sample are independent and the population has a finite mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $\bar{X}$ has the following properties, whatever the shape of the parent population.
The expected value of the sample mean is equal to the population mean:
$$E(\bar{X}) = \mu$$
The standard deviation of the sample mean (often called the standard error) measures the variability of the sample means around the population mean:
$$SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$
As the sample size $n$ increases, the standard error decreases, meaning the sample mean becomes a more precise estimate of the population mean.
COMMON MISTAKE: Students often forget to divide by $\sqrt{n}$ when calculating the standard deviation for a sample mean. Always check if the question refers to a single observation $X$ or the mean of a sample $\bar{X}$.
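A short numeric check of the $\sqrt{n}$ effect is sketched here. The value $\sigma = 12$ is a hypothetical population standard deviation, and `standard_error` is a helper name introduced for this example, not a library function.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean for independent samples of size n."""
    return sigma / sqrt(n)


sigma = 12.0  # hypothetical population standard deviation

# A single observation X has standard deviation sigma (the n = 1 case).
print("single observation:", standard_error(sigma, 1))

# Each time n is multiplied by 4, the standard error halves.
for n in (4, 16, 64):
    print(f"n = {n}: SD of sample mean = {standard_error(sigma, n)}")
```

Note that halving the standard error requires four times as many observations, not twice as many.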
The Central Limit Theorem (CLT) is the most vital concept in statistical inference for Specialist Mathematics.
Definition:
For a sufficiently large sample size $n$, the sampling distribution of the sample mean $\bar{X}$ is approximately normal, regardless of the shape of the original population distribution (provided the population has a finite standard deviation $\sigma$).
If $n$ is large:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately}$$
VCAA FOCUS: VCAA often tests whether students can identify when the CLT is required. If the population distribution is unknown or non-normal, you must invoke the CLT to justify using normal distribution calculations for the sample mean.
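The CLT can be explored with a small simulation. The sketch below assumes a strongly skewed population, an exponential distribution with $\mu = 1$ and $\sigma = 1$, and draws many samples of size $n = 50$; the population, sample size, number of repeats and seed are all choices made for illustration, and `simulate_sample_means` is a hypothetical helper name. If the sample means are roughly normal, about 68% of them should fall within one standard error of $\mu$.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def simulate_sample_means(n, repeats, seed=0):
    """Return `repeats` sample means, each from n draws of an Exp(1) population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]


mu, sigma, n = 1.0, 1.0, 50  # Exp(1) has mean 1 and standard deviation 1
means = simulate_sample_means(n=n, repeats=5000)

se = sigma / sqrt(n)  # theoretical standard deviation of the sample mean
within_one_se = sum(abs(m - mu) <= se for m in means) / len(means)

print("mean of sample means:", fmean(means))
print("SD of sample means:", pstdev(means), "theory:", se)
print("proportion within one SE of mu:", within_one_se)
```

Rerunning with a much smaller `n` (for example 3) should show the proportion drifting away from the normal benchmark, since the skew of the population is still visible in the sample means.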
We use the properties of the sampling distribution to calculate probabilities associated with sample means.
To find probabilities using the standard normal distribution ($Z \sim N(0,1)$), we use the transformation:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
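As a worked illustration of this transformation, suppose (hypothetically) that $\mu = 70$, $\sigma = 10$ and $n = 25$, and that we want $\Pr(\bar{X} > 73)$. The numbers are invented for the example, and the calculation assumes that $\bar{X}$ is normal or approximately normal. The sketch uses `NormalDist` from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
mu, sigma, n = 70.0, 10.0, 25
threshold = 73.0

se = sigma / sqrt(n)        # 10 / 5 = 2.0
z = (threshold - mu) / se   # (73 - 70) / 2 = 1.5

# Pr(X-bar > 73) = Pr(Z > 1.5), using the standard normal distribution.
p_via_z = 1 - NormalDist().cdf(z)

# The same probability taken directly from N(mu, se^2), with no standardising.
p_direct = 1 - NormalDist(mu, se).cdf(threshold)

print(f"standard error = {se}, z = {z}")
print(f"Pr(X-bar > {threshold}) = {p_via_z:.4f}")  # about 0.0668
```

Both routes give the same answer; standardising is simply the form that matches written working.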
EXAM TIP: When writing your working, clearly state the distribution you are using. For example: “By the Central Limit Theorem, since $n=50$, $\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$ approximately.” This earns marks for demonstrating understanding of the theory.
| Population Distribution | Sample Size ($n$) | Shape of Sampling Distribution ($\bar{X}$) |
|---|---|---|
| Normal | Small ($n < 30$) | Normal |
| Normal | Large ($n \ge 30$) | Normal |
| Non-Normal | Small ($n < 30$) | Not necessarily normal (depends on the shape of the population) |
| Non-Normal | Large ($n \ge 30$, a common rule of thumb) | Approximately Normal (CLT) |
APPLICATION: Sampling distributions are what allow us to state how accurate political polls or quality control tests in manufacturing are likely to be. By knowing how $\bar{X}$ behaves, we can quantify our uncertainty about the population mean $\mu$.