Statistical Inference: Hypothesis Testing

Hypothesis testing is a formal statistical procedure for deciding whether a sample of data provides enough evidence to infer that a certain condition holds for the entire population. It follows a logic similar to a legal trial: the “null hypothesis” (innocence) is assumed true unless “sufficient evidence” (the sample data) indicates otherwise.

1. Formulation of Hypotheses

Every statistical test involves two competing hypotheses:

Null Hypothesis (\(H_0\)): The statement that there is no effect or no change. It always involves an equality.
- Form: \(H_0: \mu = \mu_0\)
Alternative Hypothesis (\(H_1\)): The statement we are trying to find evidence for. It represents a change, difference, or effect.
- One-tailed (directional): \(H_1: \mu > \mu_0\) or \(H_1: \mu < \mu_0\)
- Two-tailed (non-directional): \(H_1: \mu \neq \mu_0\)

Summary of Hypothesis Forms

Test Type	Null Hypothesis (\(H_0\))	Alternative Hypothesis (\(H_1\))
Right-tailed	\(H_0: \mu = \mu_0\)	\(H_1: \mu > \mu_0\)
Left-tailed	\(H_0: \mu = \mu_0\)	\(H_1: \mu < \mu_0\)
Two-tailed	\(H_0: \mu = \mu_0\)	\(H_1: \mu \neq \mu_0\)

COMMON MISTAKE: Students often use the sample mean \(\bar{x}\) in their hypotheses. Remember, hypotheses are always statements about the population parameter (\(\mu\)), never the sample statistic.

2. The Test Statistic (\(Z\))

To test the hypothesis about a population mean \(\mu\) where the population standard deviation \(\sigma\) is known, we calculate a test statistic. This value measures how many standard errors the observed sample mean \(\bar{x}\) is away from the hypothesised mean \(\mu_0\).

For a sample size \(n\), the test statistic \(Z\) is:

\[Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\]

This formula assumes that either the population is normally distributed or \(n\) is large enough (as a rule of thumb, \(n \ge 30\)) for the Central Limit Theorem to apply.
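The formula above can be sketched in Python as follows. This is a minimal illustration only: the function name `z_statistic` and the numbers (a sample of 36 with mean 73, tested against \(\mu_0 = 70\) with \(\sigma = 9\)) are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import math


def z_statistic(sample_mean, mu_0, sigma, n):
    """Return the z test statistic for a one-sample test with known sigma."""
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    standard_error = sigma / math.sqrt(n)  # sigma / sqrt(n)
    return (sample_mean - mu_0) / standard_error


# Hypothetical data: standard error = 9 / 6 = 1.5, so z = (73 - 70) / 1.5
z = z_statistic(sample_mean=73, mu_0=70, sigma=9, n=36)
print(z)  # 2.0
```

Here the sample mean sits 2 standard errors above the hypothesised mean.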

VCAA FOCUS: Ensure you check the conditions for a \(z\)-test. You must know the population standard deviation \(\sigma\). If \(\sigma\) is unknown and the sample is large, Specialist Mathematics students typically use the sample standard deviation \(s\) as an estimate for \(\sigma\).

3. The \(p\)-value

The \(p\)-value is the probability of observing a sample statistic as extreme as, or more extreme than, the one observed, assuming that the null hypothesis is true.

Small \(p\)-value: Suggests the observed data is unlikely to have occurred by chance under \(H_0\), providing evidence against \(H_0\).
Large \(p\)-value: Suggests the observed data is consistent with \(H_0\).

Calculating \(p\)-values

Let \(Z_{calc}\) be the calculated test statistic from the sample data.

For \(H_1: \mu > \mu_0\): \(p = \text{Pr}(Z > Z_{calc})\)
For \(H_1: \mu < \mu_0\): \(p = \text{Pr}(Z < Z_{calc})\)
For \(H_1: \mu \neq \mu_0\): \(p = 2 \times \text{Pr}(Z > |Z_{calc}|)\)
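The three cases above can be sketched with Python's standard library, using `statistics.NormalDist` for the standard normal distribution. The function name `p_value` and the `tail` labels are assumptions made for this illustration, and \(Z_{calc} = 2.0\) is a hypothetical value.

```python
from statistics import NormalDist


def p_value(z_calc, tail):
    """Return the p-value for a z-test; tail is 'right', 'left' or 'two'."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":   # H1: mu > mu_0
        return 1 - std_normal.cdf(z_calc)
    if tail == "left":    # H1: mu < mu_0
        return std_normal.cdf(z_calc)
    if tail == "two":     # H1: mu != mu_0
        return 2 * (1 - std_normal.cdf(abs(z_calc)))
    raise ValueError("tail must be 'right', 'left' or 'two'")


# Hypothetical test statistic
for tail in ("right", "left", "two"):
    print(tail, round(p_value(2.0, tail), 4))
```

For the same test statistic, the two-tailed \(p\)-value is double the right-tailed one, which is why the choice of \(H_1\) must be made before looking at the data.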

KEY TAKEAWAY: The \(p\)-value is NOT the probability that the null hypothesis is true. It is the probability of obtaining data at least as extreme as the data observed, given that the null hypothesis is true.

4. Significance Levels (\(\alpha\)) and Decision Making

The significance level (\(\alpha\)) is a pre-determined threshold used to decide whether the \(p\)-value is small enough to reject the null hypothesis. Common values for \(\alpha\) are \(0.05\) (5%) and \(0.01\) (1%).

The Decision Rule

If \(p < \alpha\): Reject \(H_0\). There is significant evidence to support \(H_1\).
If \(p \ge \alpha\): Do not reject \(H_0\). There is insufficient evidence to support \(H_1\).

\(p\)-value	Strength of Evidence
\(p < 0.01\)	Very strong evidence against \(H_0\)
\(0.01 \le p < 0.05\)	Strong evidence against \(H_0\)
\(0.05 \le p < 0.10\)	Weak evidence against \(H_0\)
\(p \ge 0.10\)	Little to no evidence against \(H_0\)
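The decision rule and the evidence scale above can be sketched as two small functions. The names `decide` and `evidence_strength` are hypothetical, and the \(p\)-value of 0.0228 is an illustrative figure only.

```python
def decide(p, alpha=0.05):
    """Apply the decision rule: reject H0 only when p < alpha."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return "reject H0" if p < alpha else "do not reject H0"


def evidence_strength(p):
    """Describe the evidence against H0 using the scale in the table."""
    if p < 0.01:
        return "very strong"
    if p < 0.05:
        return "strong"
    if p < 0.10:
        return "weak"
    return "little to none"


p = 0.0228  # hypothetical p-value
print(decide(p, alpha=0.05))   # reject H0
print(decide(p, alpha=0.01))   # do not reject H0
print(evidence_strength(p))    # strong
```

The same \(p\)-value leads to different decisions at the 5% and 1% levels, so \(\alpha\) should be fixed before the test is carried out.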

EXAM TIP: When writing your conclusion in an exam, always relate it back to the context of the question. Don’t just say “Reject \(H_0\)”; say “Reject \(H_0\). There is evidence at the 5% level to suggest that the mean heart rate of participants is higher than 70 bpm.”

5. Factors Affecting the \(p\)-value

The \(p\)-value is influenced by several components of the \(z\)-test calculation:

Sample Size (\(n\)): As \(n\) increases, the standard error (\(\sigma/\sqrt{n}\)) decreases. For a fixed difference \(\bar{x} - \mu_0\), this makes the \(Z\) statistic larger in magnitude (more extreme), which decreases the \(p\)-value.
Difference (\(|\bar{x} - \mu_0|\)): As the difference between the observed mean and hypothesised mean increases, the \(Z\) statistic becomes more extreme, which decreases the \(p\)-value.
Population Standard Deviation (\(\sigma\)): As \(\sigma\) decreases, the standard error decreases, making the \(Z\) statistic more extreme and decreasing the \(p\)-value.
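The effect of sample size can be checked numerically. The sketch below holds the sample mean, \(\mu_0\) and \(\sigma\) fixed at hypothetical values (73, 70 and 9) and varies only \(n\) for a right-tailed test. The function name `right_tailed_p` is an assumption made for this illustration.

```python
import math
from statistics import NormalDist


def right_tailed_p(sample_mean, mu_0, sigma, n):
    """p-value for H1: mu > mu_0 when sigma is known."""
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
    return 1 - NormalDist().cdf(z)


sample_sizes = (9, 36, 144)
p_values = [right_tailed_p(73, 70, 9, n) for n in sample_sizes]

# Same 3-unit difference each time; only n changes.
for n, p in zip(sample_sizes, p_values):
    print(n, round(p, 4))
```

With these values the test statistic is 1, 2 and 4 respectively, so the \(p\)-value shrinks each time \(n\) is quadrupled.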

STUDY HINT: Remember the inverse relationship: A more extreme test statistic (a larger \(|Z|\) in the direction of \(H_1\)) results in a smaller \(p\)-value.

6. Type I and Type II Errors

Hypothesis testing is not infallible. Because we rely on samples, we can make two types of errors:

Type I Error: Occurs when we reject \(H_0\) when it is actually true (a “false positive”).
- The probability of a Type I error is equal to the significance level \(\alpha\).
Type II Error: Occurs when we fail to reject \(H_0\) when it is actually false (a “false negative”).
- The probability of a Type II error is denoted by \(\beta\).

Decision	\(H_0\) is True	\(H_0\) is False
Do not reject \(H_0\)	Correct Decision (\(1-\alpha\))	Type II Error (\(\beta\))
Reject \(H_0\)	Type I Error (\(\alpha\))	Correct Decision (Power) (\(1-\beta\))

REMEMBER: To reduce the chance of a Type I error, decrease \(\alpha\) (e.g., from 0.05 to 0.01). However, this will generally increase the chance of a Type II error unless the sample size is increased.
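The trade-off between \(\alpha\), \(\beta\) and \(n\) can be illustrated for a right-tailed \(z\)-test. This sketch goes slightly beyond the notes: it assumes a specific true mean (here 73, against \(\mu_0 = 70\) with \(\sigma = 9\)), which is never known in practice, and uses the standard result \(\beta = \text{Pr}(Z < z_{\alpha} - (\mu_{true} - \mu_0)/(\sigma/\sqrt{n}))\), where \(z_{\alpha}\) is the critical value. The function name `type_ii_error` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error(mu_0, mu_true, sigma, n, alpha):
    """Return beta for a right-tailed z-test, assuming the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when Z > z_crit
    shift = (mu_true - mu_0) / (sigma / math.sqrt(n))
    # Probability that Z falls below the critical value when mu = mu_true
    return std_normal.cdf(z_crit - shift)


beta_05 = type_ii_error(70, 73, 9, n=36, alpha=0.05)
beta_01 = type_ii_error(70, 73, 9, n=36, alpha=0.01)
beta_01_big_n = type_ii_error(70, 73, 9, n=144, alpha=0.01)

print(round(beta_05, 3), round(beta_01, 3), round(beta_01_big_n, 3))
```

Under these assumptions, lowering \(\alpha\) from 0.05 to 0.01 raises \(\beta\), while increasing \(n\) from 36 to 144 brings \(\beta\) back down even at the stricter level.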
