Further Statistical Inference

Statistical inference is the process of using data from a sample to make generalizations or draw conclusions about a population. In Specialist Mathematics, this focuses on the distribution of the sample mean, the construction of confidence intervals, and the formal process of hypothesis testing.

1. The Distribution of the Sample Mean

For a random variable $X$ with a population mean $\mu$ and a population standard deviation $\sigma$, the sample mean $\bar{X}$ of a random sample of size $n$ is itself a random variable.

Key Properties:

Expected Value: $E(\bar{X}) = \mu$
Variance: $Var(\bar{X}) = \frac{\sigma^2}{n}$
Standard Deviation (Standard Error): $SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$

The Central Limit Theorem (CLT)

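The Central Limit Theorem states that, for a population with finite variance and a sufficiently large sample size (a common rule of thumb is $n \ge 30$), the distribution of the sample mean $\bar{X}$ is approximately normal, regardless of the shape of the population distribution. If $X$ is itself normally distributed, $\bar{X}$ is exactly normal for any $n$. For large $n$: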
\[\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)\]

KEY TAKEAWAY: As the sample size $n$ increases, the standard error $\frac{\sigma}{\sqrt{n}}$ decreases, meaning the sample mean becomes a more precise estimator of the population mean.
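
The properties above can be checked with a small simulation. The sketch below is illustrative only: the choice of an exponential population with $\mu = 1$ and $\sigma = 1$, the sample size $n = 36$, the number of repetitions and the function name `simulate_sample_means` are all assumptions made for this example, not part of the course material.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` random samples of size `n` from Exp(1); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


# Exp(1) is strongly right-skewed, with mu = 1 and sigma = 1 (assumed population).
n = 36
means = simulate_sample_means(n, reps=5000)

# Theory: E(X-bar) = mu = 1 and SD(X-bar) = sigma / sqrt(n) = 1/6.
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
```

Both printed values should land close to the theoretical ones, and a histogram of `means` would look roughly bell-shaped even though the population is skewed.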

2. Confidence Intervals for the Population Mean

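A confidence interval is a range of plausible values for the true population mean $\mu$, calculated from a sample mean $\bar{x}$. The confidence level describes the method: if the sampling were repeated many times, about that percentage of the resulting intervals (e.g. 95%) would contain $\mu$.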

Formula for a Confidence Interval

For a population with a known standard deviation $\sigma$ (or a large sample where the sample standard deviation $s$ can approximate $\sigma$):
\[\left( \bar{x} - z\frac{\sigma}{\sqrt{n}}, \bar{x} + z\frac{\sigma}{\sqrt{n}} \right)\]

Here, $z$ is the critical value for the desired level of confidence $C$, chosen so that $P(-z < Z < z) = C$ for the standard normal variable $Z$.

Common Critical Values ($z$)

Confidence Level	$z$-score (approx.)
90%	1.645
95%	1.96
99%	2.576
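
The critical values in the table, and the interval itself, can be reproduced with Python's standard library. This is a minimal sketch: the helper names `z_critical` and `confidence_interval` and the sample figures ($\bar{x} = 50$, $\sigma = 10$, $n = 100$) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value: the z with P(-z < Z < z) = confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) for mu when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Reproduce the table of common critical values.
for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))

# Hypothetical sample: xbar = 50, sigma = 10, n = 100, 95% confidence.
low, high = confidence_interval(50, 10, 100)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```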

Margin of Error ($M$)

The margin of error is half the width of the confidence interval:
\[M = z\frac{\sigma}{\sqrt{n}}\]
To halve the margin of error, the sample size $n$ must be quadrupled.

EXAM TIP: If a question asks for the “width” of the confidence interval, it is $2 \times M$. If you are asked to find the required sample size $n$ for a specific margin of error, always round $n$ up to the next integer.
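
Rearranging $M = z\frac{\sigma}{\sqrt{n}}$ gives $n \ge \left(\frac{z\sigma}{M}\right)^2$, which is where the rounding-up rule comes from. The sketch below illustrates this; the function name `required_sample_size` and the values $\sigma = 10$, $M = 2$ and $M = 1$ are assumptions for the example.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest integer n with z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Always round up: rounding down would give a margin larger than requested.
    return ceil((z * sigma / margin) ** 2)


# Hypothetical: sigma = 10, target margin of error 2 at 95% confidence.
print(required_sample_size(10, 2))  # (1.96 * 10 / 2)^2 is about 96.04, so n = 97

# Halving the margin to 1 roughly quadruples the required n.
print(required_sample_size(10, 1))  # 385
```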

3. Hypothesis Testing for a Mean

Hypothesis testing is a formal procedure used to determine whether there is enough evidence in a sample of data to support a particular belief about the population mean.

The Null and Alternative Hypotheses

Null Hypothesis ($H_0$): The default assumption (e.g., $H_0: \mu = \mu_0$). It usually states that there is no change or no effect.
Alternative Hypothesis ($H_1$ or $H_a$): The claim we are seeking evidence for. It takes one of two forms:
- One-tailed (directional): $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$
- Two-tailed (non-directional): $H_1: \mu \neq \mu_0$

The Test Statistic

To test the hypothesis, we calculate the $z$-score of the observed sample mean $\bar{x}$:
\[z^* = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\]

The $p$-value

The $p$-value is the probability of obtaining a sample mean at least as extreme as the one observed, assuming the null hypothesis is true.
* For $H_1: \mu > \mu_0$, $p = P(Z > z^*)$
* For $H_1: \mu < \mu_0$, $p = P(Z < z^*)$
* For $H_1: \mu \neq \mu_0$, $p = 2 \times P(Z > |z^*|)$

VCAA FOCUS: If $p < \alpha$ (where $\alpha$ is the significance level, usually 0.05), we reject $H_0$. If $p \ge \alpha$, we fail to reject $H_0$. Always state your conclusion in the context of the original problem.
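
The whole procedure (test statistic, $p$-value, decision) can be sketched in a few lines. This is one possible implementation under the known-$\sigma$ assumption used in these notes; the function name `z_test`, its `alternative` labels and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z_star, p_value) for a one-sample z-test with known sigma."""
    z_star = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":        # H1: mu > mu0
        p = 1 - std_normal.cdf(z_star)
    elif alternative == "less":         # H1: mu < mu0
        p = std_normal.cdf(z_star)
    elif alternative == "two-sided":    # H1: mu != mu0
        p = 2 * (1 - std_normal.cdf(abs(z_star)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z_star, p


# Hypothetical: H0: mu = 100 vs H1: mu != 100, with xbar = 103, sigma = 12, n = 64.
z_star, p = z_test(103, 100, 12, 64)
alpha = 0.05
print(round(z_star, 2), round(p, 4))  # 2.0 0.0455
print("reject H0" if p < alpha else "fail to reject H0")  # reject H0
```

In a written answer, the final line would still need to be restated in the context of the problem.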

4. Errors in Hypothesis Testing

Because we are using a sample to infer properties of a population, there is always a risk of reaching the wrong conclusion.

Decision	$H_0$ is True	$H_0$ is False ($H_1$ is True)
Reject $H_0$	Type I Error	Correct Decision
Fail to Reject $H_0$	Correct Decision	Type II Error

Type I Error

Occurs when we reject $H_0$ even though it is true.
The probability of a Type I error is equal to the significance level $\alpha$.
$P(\text{Type I Error}) = \alpha$

Type II Error

Occurs when we fail to reject $H_0$ even though it is false.
The probability of a Type II error is denoted by $\beta$.
To calculate $\beta$, you must be given a specific value for the “true” mean under the alternative hypothesis.

COMMON MISTAKE: Students often think that decreasing $\alpha$ (the significance level) is always better. However, decreasing the probability of a Type I error ($\alpha$) automatically increases the probability of a Type II error ($\beta$), assuming the sample size remains constant.
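
The trade-off between $\alpha$ and $\beta$ can be seen numerically. The sketch below assumes a right-tailed test ($H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$) with known $\sigma$; the function name `type_ii_error` and the values $\mu_0 = 100$, true mean $103$, $\sigma = 12$ and $n = 64$ are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """beta for the right-tailed test H0: mu = mu0 vs H1: mu > mu0."""
    se = sigma / sqrt(n)
    # H0 is rejected when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = P(sample mean <= cutoff) when the true mean is mu_true.
    return NormalDist(mu_true, se).cdf(cutoff)


# Hypothetical: mu0 = 100, true mean 103, sigma = 12, n = 64 (fixed sample size).
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(100, 103, 12, 64, alpha)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}")
```

With the sample size held fixed, each smaller $\alpha$ in the loop produces a larger $\beta$.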

5. Statistical Inference for Two Means

Two-sample inference compares the means $\mu_1$ and $\mu_2$ of two independent populations, $X_1$ and $X_2$.

Distribution of the Difference Between Means

If $\bar{X}_1$ and $\bar{X}_2$ are the means of independent samples from two populations:
* Expected Value: $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$
* Variance: $Var(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$

Confidence Interval for $\mu_1 - \mu_2$

\[(\bar{x}_1 - \bar{x}_2) \pm z \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\]

Hypothesis Testing for $\mu_1 - \mu_2$

To test $H_0: \mu_1 = \mu_2$ (which is $\mu_1 - \mu_2 = 0$):
\[z^* = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\]
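
The two-sample interval and test share the same standard error, so they can be computed together. This is a minimal sketch assuming independent samples, known standard deviations and a two-tailed alternative; the function name `two_sample_z` and all the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Return (z_star, two_sided_p, (lower, upper)) for mu1 - mu2."""
    diff = xbar1 - xbar2
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # SD of X1-bar minus X2-bar
    z_star = diff / se                          # test statistic under H0: mu1 = mu2
    p = 2 * (1 - NormalDist().cdf(abs(z_star)))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z_star, p, (diff - z * se, diff + z * se)


# Hypothetical: xbar1 = 52, xbar2 = 50, sigma1 = 6, sigma2 = 8, n1 = 36, n2 = 64.
z_star, p, (low, high) = two_sample_z(52, 50, 6, 8, 36, 64)
print(round(z_star, 3), round(p, 3))
print(round(low, 3), round(high, 3))
```

In this example the 95% interval contains 0, which agrees with failing to reject $H_0: \mu_1 = \mu_2$ at the 5% level in a two-tailed test.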

STUDY HINT: When dealing with two-sample problems, ensure you correctly identify whether the samples are independent. If the data is “paired” (e.g., before and after measurements on the same subjects), you should instead analyze the single mean of the differences, $D = X_1 - X_2$.
