Sampling Distributions and Statistical Inference

In Specialist Mathematics, statistical inference involves using data from a sample to make generalizations or decisions about an entire population. The foundation of this process is the sampling distribution.

1. Population vs. Sample

To understand sampling distributions, we must distinguish between the group we want to study and the group we actually measure.

Population: The entire set of individuals or objects about which information is sought.
Sample: A subset of the population used to estimate population characteristics.
Parameter: A numerical characteristic of a population (e.g., population mean $\mu$, population standard deviation $\sigma$). Parameters are fixed values, but they are usually unknown.
Statistic: A numerical characteristic of a sample (e.g., sample mean $\bar{x}$, sample standard deviation $s$). These vary from sample to sample.

Feature	Population (Parameter)	Sample (Statistic)
Mean	$\mu$	$\bar{x}$
Standard Deviation	$\sigma$	$s$
Size	$N$	$n$

KEY TAKEAWAY: Parameters are fixed values belonging to the population, while statistics are random variables that change depending on which sample is selected.

2. The Sample Mean as a Random Variable

When we take a random sample of size $n$ from a population, the sample mean $\bar{X}$ is itself a random variable, because its value depends on which sample happens to be drawn.

If we were to take every possible sample of size $n$ from a population and calculate the mean of each, the resulting distribution of those means is called the sampling distribution of the sample mean.
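Listing every possible sample is rarely practical, but the idea can be approximated by simulation. The sketch below is illustrative only: the population (10,000 values from a right-skewed distribution with mean near 20), the sample size of 40, and the 5,000 repetitions are all assumed values, not part of the course material. Sampling is done without replacement from a large population, which is treated here as close enough to independent draws.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values from a right-skewed distribution.
population = [random.expovariate(1 / 20) for _ in range(10_000)]
mu = statistics.fmean(population)


def sample_mean(n):
    """Mean of one simple random sample of size n drawn from the population."""
    return statistics.fmean(random.sample(population, n))


# Approximate the sampling distribution with 5,000 repeated samples of size 40.
sample_means = [sample_mean(40) for _ in range(5_000)]

print(f"population mean          = {mu:.2f}")
print(f"mean of the sample means = {statistics.fmean(sample_means):.2f}")
print(f"smallest and largest sample mean = "
      f"{min(sample_means):.2f}, {max(sample_means):.2f}")
```

Each entry of `sample_means` is one observed value $\bar{x}$; the whole list approximates the distribution of the random variable $\bar{X}$.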

Notation

$X_1, X_2, \dots, X_n$ are independent and identically distributed (i.i.d.) random variables, each representing a single observation from the population.
The sample mean is defined as:
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$

STUDY HINT: Remember that $\bar{X}$ (uppercase) refers to the random variable/distribution, while $\bar{x}$ (lowercase) refers to a specific numerical value calculated from a single observed sample.

3. Properties of the Sampling Distribution

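Regardless of the shape of the parent population (provided it has a finite mean $\mu$ and standard deviation $\sigma$, and the observations are independent), the sampling distribution of $\bar{X}$ has a mean and standard deviation determined by $\mu$, $\sigma$ and $n$.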

The Expected Value (Mean)

The expected value of the sample mean is equal to the population mean:
$$E(\bar{X}) = \mu$$

The Standard Deviation (Standard Error)

The standard deviation of the sample mean (often called the standard error) measures the variability of the sample means around the population mean:
$$SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$

As the sample size $n$ increases, the standard error decreases, meaning the sample mean becomes a more precise estimate of the population mean.
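Both results can be derived in a few lines. This is a sketch under the i.i.d. assumption from the Notation section, where each $X_i$ has mean $\mu$ and variance $\sigma^2$:

$$E(\bar{X}) = E\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{1}{n}\left(E(X_1) + E(X_2) + \dots + E(X_n)\right) = \frac{n\mu}{n} = \mu$$

$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\left(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \qquad SD(\bar{X}) = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$

Independence is what allows the variances to be added in the second line. As an illustration with an assumed $\sigma = 10$: sample sizes $n = 4$, $25$ and $100$ give standard errors of $5$, $2$ and $1$, so quadrupling the sample size halves the standard error.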

COMMON MISTAKE: Students often forget to divide by $\sqrt{n}$ when calculating the standard deviation for a sample mean. Always check if the question refers to a single observation $X$ or the mean of a sample $\bar{X}$.
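The effect of forgetting the $\sqrt{n}$ can be seen numerically. The sketch below assumes a normally distributed population with made-up values $\mu = 100$, $\sigma = 15$, $n = 25$ and $k = 106$, and uses `NormalDist` from Python's standard `statistics` module in place of a CAS calculator.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
mu, sigma, n, k = 100, 15, 25, 106

single = NormalDist(mu, sigma)                # one observation X
mean_of_n = NormalDist(mu, sigma / sqrt(n))   # sample mean of n observations

p_single = 1 - single.cdf(k)     # P(X > 106), z = 6 / 15 = 0.4
p_mean = 1 - mean_of_n.cdf(k)    # P(X-bar > 106), z = 6 / 3 = 2

print(f"P(X > {k})     = {p_single:.4f}")  # 0.3446
print(f"P(X-bar > {k}) = {p_mean:.4f}")    # 0.0228
```

The same cut-off $k$ is far less likely to be exceeded by a sample mean than by a single observation, because the standard deviation shrinks from 15 to 3.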

4. The Central Limit Theorem (CLT)

The Central Limit Theorem is the most vital concept in statistical inference for Specialist Mathematics.

Definition:
For a large sample size ($n$ is sufficiently large), the sampling distribution of the sample mean $\bar{X}$ will be approximately normally distributed, regardless of the shape of the original population distribution.

Conditions for CLT:

If the population is already normally distributed, $\bar{X}$ is exactly normally distributed for any sample size $n$.
If the population is not normally distributed (e.g., uniform, binomial, or skewed), $\bar{X}$ is approximately normally distributed provided $n$ is “large enough.”
In VCE Specialist Mathematics, $n \ge 30$ is generally considered the threshold for the CLT to apply.

Resulting Distribution:

If $n$ is large:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{(approximately)}$$
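A simulation can make this plausible. The sketch below draws from a strongly right-skewed exponential population; the mean of 2, sample size of 50 and 10,000 trials are assumed values. If $\bar{X}$ is roughly normal, about 68% of sample means should fall within one standard error of $\mu$ and about 95% within two.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable

mu = sigma = 2.0   # an exponential distribution with mean 2 also has SD 2
n = 50             # sample size (assumed)
trials = 10_000    # number of simulated samples (assumed)


def one_sample_mean():
    """Mean of n independent draws from the exponential population."""
    return statistics.fmean([random.expovariate(1 / mu) for _ in range(n)])


means = [one_sample_mean() for _ in range(trials)]
se = sigma / sqrt(n)  # theoretical standard error

within_1 = sum(abs(m - mu) <= se for m in means) / trials
within_2 = sum(abs(m - mu) <= 2 * se for m in means) / trials

print(f"theoretical SE = {se:.4f}, simulated SD of means = "
      f"{statistics.stdev(means):.4f}")
print(f"proportion within 1 SE = {within_1:.3f}")
print(f"proportion within 2 SE = {within_2:.3f}")
```

Rerunning with a much smaller `n` (say 3) shows the proportions drifting away from the normal benchmarks, which is why a sufficiently large sample is required for a non-normal population.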

VCAA FOCUS: VCAA often tests whether students can identify when the CLT is required. If the population distribution is unknown or non-normal, you must invoke the CLT to justify using normal distribution calculations for the sample mean.

5. Using the Sampling Distribution for Inference

We use the properties of the sampling distribution to calculate probabilities associated with sample means.

Standardizing the Sample Mean

To find probabilities using the standard normal distribution ($Z \sim N(0,1)$), we use the transformation:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
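As a worked illustration with assumed values $\mu = 50$, $\sigma = 12$, $n = 36$ and an observed sample mean $\bar{x} = 53$:

$$z = \frac{53 - 50}{12 / \sqrt{36}} = \frac{3}{2} = 1.5$$

The observed sample mean lies 1.5 standard errors above the population mean.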

Calculation Steps:

Identify population parameters $\mu$ and $\sigma$.
Identify sample size $n$.
Determine the distribution of $\bar{X}$ (Normal if population is normal, or approx. Normal via CLT if $n \ge 30$).
Calculate $SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$.
Use a CAS calculator or $Z$-scores to find the required probability $P(\bar{X} < k)$ or $P(\bar{X} > k)$.
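The steps above can be sketched as a small function. The function name and the example values ($\mu = 50$, $\sigma = 12$, $n = 36$, $k = 53$) are hypothetical, and `NormalDist` from Python's standard `statistics` module stands in for the CAS calculator. The function assumes step 3 has already been justified, that is, $\bar{X}$ is normal or approximately normal.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_below(k, mu, sigma, n):
    """Return P(X-bar < k), assuming X-bar is (approximately) normal."""
    if n <= 0 or sigma <= 0:
        raise ValueError("n and sigma must be positive")
    standard_error = sigma / sqrt(n)   # SD of the sample mean
    z = (k - mu) / standard_error      # standardize k
    return NormalDist().cdf(z)         # P(Z < z) for Z ~ N(0, 1)


# Hypothetical example: mu = 50, sigma = 12, n = 36, k = 53 (so z = 1.5).
p_below = prob_sample_mean_below(53, mu=50, sigma=12, n=36)
p_above = 1 - p_below

print(f"P(X-bar < 53) = {p_below:.4f}")  # 0.9332
print(f"P(X-bar > 53) = {p_above:.4f}")  # 0.0668
```

Passing the standard error explicitly through $z$ mirrors the written working expected in an exam; `NormalDist(mu, standard_error).cdf(k)` would give the same probability in one step.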

EXAM TIP: When writing your working, clearly state the distribution you are using. For example: “By the Central Limit Theorem, since $n=50$, $\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$ approximately.” This earns marks for demonstrating understanding of the theory.

6. Summary of Distribution Shapes

Population Distribution	Sample Size ($n$)	Shape of Sampling Distribution ($\bar{X}$)
Normal	Small ($n < 30$)	Normal
Normal	Large ($n \ge 30$)	Normal
Non-Normal	Small ($n < 30$)	Not necessarily normal (depends on the shape of the population)
Non-Normal	Large ($n \ge 30$)	Approximately Normal (CLT)

APPLICATION: Sampling distributions are the reason we can predict the accuracy of political polls or quality control tests in manufacturing. By knowing how $\bar{X}$ behaves, we can quantify our uncertainty about the population mean $\mu$.
