A residual is the difference between the actual (observed) value and the value predicted by the regression line:
\(\text{residual} = \text{actual value} - \text{predicted value} = y - \hat{y}\)
Equation: \(\widehat{\text{score}} = 31.5 + 9.2 \times \text{hours}\)
| Hours | Actual score | Predicted \(\hat{y}\) | Residual |
|---|---|---|---|
| 3 | 58 | 59.1 | \(-1.1\) |
| 5 | 82 | 77.5 | \(+4.5\) |
| 7 | 95 | 95.9 | \(-0.9\) |
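The table can be checked with a few lines of Python. This is a minimal sketch using only the equation and the three data points shown above; the function name `predict` and the variable names are illustrative choices, not part of the notes.

```python
# Fitted line from the notes: predicted score = 31.5 + 9.2 * hours
def predict(hours):
    return 31.5 + 9.2 * hours

# (hours, actual score) pairs copied from the table
data = [(3, 58), (5, 82), (7, 95)]

residuals = []
for hours, actual in data:
    predicted = predict(hours)
    residual = actual - predicted  # residual = actual - predicted
    residuals.append(round(residual, 1))
    print(f"hours={hours}: predicted={predicted:.1f}, residual={residual:+.1f}")
```

A positive residual means the point lies above the line (the model under-predicted); a negative residual means it lies below the line (the model over-predicted).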
A residual plot graphs residuals (\(y - \hat{y}\)) on the y-axis against the explanatory variable (\(x\)) on the x-axis.
| Pattern in residual plot | Conclusion |
|---|---|
| Random scatter around the zero line | Linear model is appropriate |
| Curved pattern (U-shape or arch) | Linear model is NOT appropriate; try a non-linear model |
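| Fan shape (spread of residuals increases or decreases with \(x\)) | Non-constant spread (heteroscedasticity); model assumptions violated |
| One extreme point far from the zero line | Possible outlier; investigate that data point |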
[Residual plot: residuals from \(-4\) to \(+4\) on the vertical axis against \(x\) on the horizontal axis.]
Points are randomly scattered above and below the zero line, with no pattern.
[Residual plot: residuals from \(-4\) to \(+4\) on the vertical axis against \(x\) on the horizontal axis.]
The residuals trace a clear U-shape, so the linear model is NOT appropriate.
Step 1: Calculate the residual (\(y - \hat{y}\)) for every data point
Step 2: Plot the residuals against \(x\)
Step 3: Look for patterns (curve, fan shape, extreme points)
Step 4: Conclude whether a linear model is appropriate
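The four steps can be sketched in plain Python. The data below is hypothetical (it follows \(y = x^2 + 1\), chosen so that the curve is obvious) and the helper `least_squares_line` is an illustrative name. The sign string in Step 3 is only a rough aid to reading the pattern, not a formal test.

```python
# Hypothetical data following y = x^2 + 1 (NOT from the notes)
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 5, 10, 17, 26, 37, 50]

def least_squares_line(xs, ys):
    """Return intercept a and slope b of the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

a, b = least_squares_line(xs, ys)

# Step 1: calculate the residual y - y_hat for every data point
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Step 2: "plot" the residuals against x (printed here as a simple listing)
for x, r in zip(xs, residuals):
    print(f"x={x}: residual={r:+.1f}")

# Step 3: look for a pattern in the signs, reading left to right
signs = "".join("+" if r > 0 else "-" if r < 0 else "0" for r in residuals)
print("signs:", signs)

# Step 4: the conclusion is a judgement made by looking at the pattern;
# here the residuals are positive at both ends and negative in the middle.
```

For this data the residuals are positive at both ends and negative in the middle, which is the U-shape that signals a linear model is not appropriate. With data that is close to linear, the signs would be mixed with no obvious run.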
For the least squares line, the sum of the residuals always equals zero:
\(\sum (y - \hat{y}) = 0\)
This is a mathematical property of the least squares method.
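A short derivation shows where this property comes from. It assumes the line is written as \(\hat{y} = a + bx\) and uses the standard least squares result that the intercept is \(a = \bar{y} - b\bar{x}\); the symbols \(a\), \(b\), \(\bar{x}\), \(\bar{y}\) and \(n\) (the number of data points) are notation introduced here, not defined in these notes.

\[
\sum (y - \hat{y}) = \sum y - na - b\sum x = n\bar{y} - n(\bar{y} - b\bar{x}) - bn\bar{x} = 0
\]

The positive and negative residuals therefore cancel exactly, which is why the zero line sits in the middle of a residual plot.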
KEY TAKEAWAY: A random scatter in the residual plot indicates that a linear model is appropriate. Any systematic pattern (curve, fan shape) means the linear model should not be used.
EXAM TIP: VCAA commonly shows a residual plot and asks you to comment on the appropriateness of the linear model. Describe the pattern you see and state your conclusion clearly.
COMMON MISTAKE: Confusing a residual plot that looks “messy” (random scatter = good!) with one that has a clear pattern (bad). Random scatter is actually what you want to see.