Sampling Distributions

Why Sampling Distributions Matter

Each random sample generally gives a different value of the sample mean $\bar{X}$. The sampling distribution of $\bar{X}$ describes how these values vary across all possible samples of a given size $n$.

Sampling Distribution of the Sample Mean

If $X_1, X_2, \ldots, X_n$ are iid from a population with mean $\mu$ and variance $\sigma^2$:

$$E(\bar{X}) = \mu, \qquad \text{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \text{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$
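As a brief sketch of where these results come from: the mean uses only linearity of expectation, while the variance step also relies on the independence of the $X_i$.

$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\cdot n\mu = \mu, \qquad \text{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \text{Var}(X_i) = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}$$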

Central Limit Theorem: For sufficiently large $n$ (typically $n \geq 30$):
$$\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$$

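This holds regardless of the shape of the original population distribution, provided its variance is finite. Strongly skewed or heavy-tailed populations may need a larger $n$ before the approximation is accurate.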

If the population is exactly normal, then $\bar{X}$ is exactly normal for any $n$.
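The CLT can be checked informally by simulation. The sketch below is illustrative only: the choice of an Exponential(1) population (which has $\mu = 1$, $\sigma = 1$ and is strongly right-skewed), the sample size $n = 40$, the number of repetitions, and the helper name `simulate_sample_means` are all assumptions made for the demonstration, not part of these notes.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from n draws of an Exponential(1) population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


n = 40
means = simulate_sample_means(n=n, reps=5000)

centre = statistics.fmean(means)   # should be close to mu = 1
spread = statistics.stdev(means)   # should be close to sigma / sqrt(n)
theory_sd = 1.0 / n ** 0.5

# Under a normal approximation, roughly 68% of sample means
# should fall within one standard error of mu.
within_one = sum(abs(m - 1.0) <= theory_sd for m in means) / len(means)

print("mean of sample means:", round(centre, 3))
print("SD of sample means:  ", round(spread, 3), "theory:", round(theory_sd, 3))
print("share within 1 SE:   ", round(within_one, 3))
```

Even though the population is skewed, the simulated sample means should centre near $\mu$ with spread near $\sigma/\sqrt{n}$.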

Standardising the Sample Mean

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \quad \text{(approximately, for large } n\text{)}$$

Example 1: Heights in a population: $\mu = 170$ cm, $\sigma = 10$ cm. Sample of $n = 25$.

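Assuming heights are roughly normal (or that $n = 25$ is large enough for the CLT to apply), $\bar{X} \sim N\!\left(170, \dfrac{100}{25}\right) = N(170, 4)$, at least approximately. So $\text{SD}(\bar{X}) = 2$ cm.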

$P(\bar{X} > 173) = P\!\left(Z > \dfrac{173-170}{2}\right) = P(Z > 1.5) = 1 - \Phi(1.5) \approx 0.0668$.
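The calculation in Example 1 can be reproduced with Python's standard library. This is a minimal sketch; the variable names are illustrative, and it assumes the normal model for $\bar{X}$ used in the example.

```python
from statistics import NormalDist

mu, sigma, n = 170.0, 10.0, 25

# Standard error of the sample mean: sigma / sqrt(n)
se = sigma / n ** 0.5

# Model the sample mean as N(mu, se^2); NormalDist takes the SD, not the variance
xbar_dist = NormalDist(mu=mu, sigma=se)

# Upper-tail probability P(X-bar > 173)
p_upper = 1 - xbar_dist.cdf(173)
print(round(p_upper, 4))  # 0.0668
```

Note that `NormalDist` is parameterised by the standard deviation, so the standard error (2), not the variance (4), is passed as `sigma`.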

Sampling Distribution of a Proportion

For a binary variable with population proportion $p$, the sample proportion $\hat{P} = X/n$ (where $X \sim \text{Binomial}(n,p)$):

$$E(\hat{P}) = p, \qquad \text{Var}(\hat{P}) = \frac{p(1-p)}{n}$$
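These follow from the binomial mean and variance, $E(X) = np$ and $\text{Var}(X) = np(1-p)$, as the short derivation below sketches:

$$E(\hat{P}) = \frac{E(X)}{n} = \frac{np}{n} = p, \qquad \text{Var}(\hat{P}) = \frac{\text{Var}(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$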

For large $n$ (rule: $np \geq 10$ and $n(1-p) \geq 10$):
$$\hat{P} \approx N\!\left(p, \frac{p(1-p)}{n}\right)$$

Example 2: A coin has $p = 0.5$. In $n = 100$ flips, find $P(\hat{P} > 0.55)$.

$\text{SD}(\hat{P}) = \sqrt{\dfrac{0.5\times0.5}{100}} = 0.05$.

$P(\hat{P} > 0.55) = P\!\left(Z > \dfrac{0.55-0.5}{0.05}\right) = P(Z > 1) \approx 0.159$.
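To see how good the approximation in Example 2 is, the sketch below computes the normal-approximation value alongside the exact binomial probability. The comparison is an addition for illustration and the variable names are assumptions; since $\hat{P} > 0.55$ with $n = 100$ means $X \geq 56$, the exact tail is summed from 56.

```python
from math import comb
from statistics import NormalDist

n, p, threshold = 100, 0.5, 0.55

# Normal approximation: P-hat ~ N(p, p(1-p)/n)
se = (p * (1 - p) / n) ** 0.5
approx = 1 - NormalDist(mu=p, sigma=se).cdf(threshold)

# Exact binomial: P(P-hat > 0.55) = P(X >= 56) for X ~ Binomial(100, 0.5)
k_min = 56
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

print("normal approximation:", round(approx, 4))
print("exact binomial:      ", round(exact, 4))
```

The two values should be of similar size but not identical, because $X$ is discrete while the normal distribution is continuous.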

Effect of Sample Size

Larger $n$ reduces $\text{SD}(\bar{X}) = \sigma/\sqrt{n}$: sample means cluster more tightly around $\mu$.
To halve the standard error, quadruple the sample size.
The sampling distribution becomes more concentrated (narrower) as $n$ increases.
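The quadrupling rule can be seen numerically. In this small sketch, $\sigma = 10$ is borrowed from Example 1 and the helper name `standard_error` is illustrative.

```python
def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / n ** 0.5


sigma = 10.0
# Each sample size is four times the previous one,
# so each standard error should be half the previous one.
for n in (25, 100, 400, 1600):
    print(n, standard_error(sigma, n))
```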

Bias and Variance

An estimator $T$ is unbiased for parameter $\theta$ if $E(T) = \theta$.

$\bar{X}$ is an unbiased estimator of $\mu$; $\hat{P}$ is an unbiased estimator of $p$.

KEY TAKEAWAY: The sampling distribution of $\bar{X}$ is centred at $\mu$ with standard deviation $\sigma/\sqrt{n}$. Larger samples give more precise estimates. The CLT justifies using normal probability calculations for inference.

EXAM TIP: When computing probabilities about $\bar{X}$, divide $\sigma$ by $\sqrt{n}$ to get the standard error before standardising. Using $\sigma$ instead of $\sigma/\sqrt{n}$ is the most common error.

COMMON MISTAKE: Applying the CLT to proportions without checking $np \geq 10$ and $n(1-p) \geq 10$. If these fail, the normal approximation is unreliable.
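The check is easy to automate before using the normal approximation. A minimal sketch follows; the function name `normal_approx_ok` is hypothetical, and the default threshold of 10 is the rule of thumb stated in these notes.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True if both np and n(1-p) meet the rule-of-thumb threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


print(normal_approx_ok(100, 0.5))   # np = 50, n(1-p) = 50
print(normal_approx_ok(50, 0.05))   # np = 2.5, too small
```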
