A residual is the difference between the actual (observed) value and the predicted value from the regression line:
$$\text{residual} = y - \hat{y} = \text{actual} - \text{predicted}$$
Worked example, using the regression equation $\widehat{\text{score}} = 31.5 + 9.2 \times \text{hours}$:
| Hours | Actual score | Predicted $\hat{y}$ | Residual |
|---|---|---|---|
| 3 | 58 | 59.1 | $-1.1$ |
| 5 | 82 | 77.5 | $+4.5$ |
| 7 | 95 | 95.9 | $-0.9$ |
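The residual column can be checked with a short script. The sketch below applies the regression equation to the three rows of the table; the function and variable names (`predict`, `residual`, `rows`) are illustrative choices, not part of the original notes.

```python
# Regression equation from the worked example: predicted score = 31.5 + 9.2 * hours
INTERCEPT = 31.5
SLOPE = 9.2

def predict(hours):
    """Predicted score from the regression line."""
    return INTERCEPT + SLOPE * hours

def residual(hours, actual):
    """Residual = actual - predicted."""
    return actual - predict(hours)

# (hours, actual score) pairs taken from the table
data = [(3, 58), (5, 82), (7, 95)]

# Round to one decimal place to match the table
rows = [(h, y, round(predict(h), 1), round(residual(h, y), 1)) for h, y in data]

for h, y, y_hat, res in rows:
    print(f"hours={h}  actual={y}  predicted={y_hat}  residual={res:+.1f}")
```

A negative residual means the point lies below the line (the model over-predicted); a positive residual means the point lies above it.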
A residual plot graphs the residuals ($y - \hat{y}$) on the vertical axis against the explanatory variable ($x$) on the horizontal axis.
| Pattern in residual plot | Conclusion |
|---|---|
| Random scatter around the zero line | Linear model is appropriate |
| Curved pattern (U-shape or arch) | Linear model is NOT appropriate; try a non-linear model |
| Fan shape (spread of residuals increases with $x$) | Non-constant spread (heteroscedasticity); the assumptions behind the linear model are violated |
| One extreme point | Outlier; investigate |
[Residual plot: residuals from $-4$ to $+4$ on the vertical axis against $x$ on the horizontal axis, with a reference line at 0.]
Points randomly scattered above and below zero — no pattern.
[Residual plot: residuals from $-4$ to $+4$ on the vertical axis against $x$ on the horizontal axis; the points are high at both ends and low in the middle.]
Clear U-shape — the linear model is NOT appropriate.
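A U-shaped residual pattern can be reproduced by fitting a straight line to data that is actually curved. The sketch below is a minimal illustration, assuming made-up data that follows $y = x^2$; the helper `fit_line` is a hypothetical name and uses the standard least squares formulas for slope and intercept.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical curved data (y = x^2), which a straight line cannot follow
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(f"x={x}  residual={r:+.1f}")
```

For this data the fitted line is $\hat{y} = -5 + 6x$, and the residuals run 5, 0, -3, -4, -3, 0, 5 from left to right: positive at both ends and negative in the middle, which is the U-shape that signals a poor linear fit.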
Step 1: Calculate residuals for all data points
Step 2: Plot residuals against $x$
Step 3: Look for patterns
Step 4: Conclude whether a linear model is appropriate
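The four steps can be sketched in code as follows. This is an illustration under stated assumptions, not a prescribed method: the `hours` and `scores` values are invented for the example, the function names are hypothetical, and a string of residual signs stands in for the actual plot.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def residual_table(xs, ys):
    """Return (x, residual) pairs ordered by x."""
    a, b = least_squares(xs, ys)
    # Step 1: residual = actual - predicted for every data point
    pairs = [(x, y - (a + b * x)) for x, y in zip(xs, ys)]
    # Step 2: order by x, as the points would appear left to right on the plot
    return sorted(pairs)

def sign_of(r):
    """One-character summary of a residual's sign."""
    return "+" if r > 0 else "-" if r < 0 else "0"

# Hypothetical data, invented for this sketch
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [41, 49, 60, 67, 79, 85, 97, 104]

table = residual_table(hours, scores)

# Step 3: list the residuals and their signs to look for a pattern
for x, r in table:
    print(f"x={x}  residual={r:+.2f}")
signs = "".join(sign_of(r) for _, r in table)
print("signs left to right:", signs)

# Step 4 is a judgement made by reading the pattern, so it is left to the reader
```

Long runs of the same sign, such as positives at both ends and negatives in the middle, point to a curve. Signs that switch with no obvious order are consistent with random scatter.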
For the least squares line, the sum of residuals always equals zero:
$$\sum(y - \hat{y}) = 0$$
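This is a mathematical property of the least squares method for a line fitted with an intercept. It holds over the full data set used to fit the line, not over a selection of points from it.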
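One way to see why, as a sketch: assume the least squares line is written $\hat{y} = a + bx$ with intercept $a = \bar{y} - b\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the means of the $n$ data points. The symbols $a$, $b$ and $n$ are introduced here for illustration and are not used elsewhere in these notes.

$$\sum(y - \hat{y}) = \sum y - na - b\sum x = n\bar{y} - n(\bar{y} - b\bar{x}) - bn\bar{x} = 0$$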
KEY TAKEAWAY: Random scatter in the residual plot supports the use of a linear model. Any systematic pattern (curve, fan shape) means the linear model should not be used.
EXAM TIP: VCAA commonly shows a residual plot and asks you to comment on the appropriateness of the linear model. Describe the pattern you see and state your conclusion clearly.
COMMON MISTAKE: Confusing a residual plot that looks “messy” (random scatter = good!) with one that has a clear pattern (bad). Random scatter is actually what you want to see.