The least squares line (regression line) is the straight line that best fits a set of bivariate data by minimising the sum of squared residuals (vertical distances from each point to the line).
$$\hat{y} = a + bx$$
Where:
- $\hat{y}$ = predicted value of the response variable
- $x$ = value of the explanatory variable
- $b$ = slope (gradient)
- $a$ = y-intercept
$$b = r \cdot \frac{s_y}{s_x}$$
$$a = \bar{y} - b\bar{x}$$
Where:
- $r$ = correlation coefficient
- $s_y$ = standard deviation of $y$
- $s_x$ = standard deviation of $x$
- $\bar{x}, \bar{y}$ = means of $x$ and $y$
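The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the calculator: the function name `least_squares_line` and the data values are assumptions made up for this example, and the standard deviations are taken to be sample standard deviations (divisor $n - 1$).

```python
import statistics

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * s_y / s_x and a = y-bar - b * x-bar."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation (divisor n - 1)
    s_y = statistics.stdev(ys)

    # Pearson correlation coefficient r, computed from the deviations about the means
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sum_xy / ((n - 1) * s_x * s_y)

    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b

# Hypothetical data, invented for illustration only
hours = [1, 2, 3, 4, 5, 6]
scores = [42, 50, 58, 70, 75, 88]

a, b = least_squares_line(hours, scores)
print(f"a = {a:.2f}, b = {b:.2f}")
```

The same values of $a$ and $b$ should come out of a calculator's linear regression on these data, up to rounding.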
In practice, use a CAS calculator (LinReg) to obtain $a$ and $b$ directly.
Slope interpretation template: “For each one-unit increase in [x variable], the predicted [y variable] increases by $b$ [units]” (when $b > 0$).
Example: If $b = 8.3$ and $x$ = hours studied, $y$ = exam score:
“For each additional hour of study, the predicted exam score increases by 8.3 marks.”
If $b < 0$:
“For each additional [unit of x], the predicted [y] decreases by $|b|$ [units].”
Intercept interpretation template: “When [x variable] = 0, the predicted [y variable] is $a$ [units].”
Note: The intercept is not always meaningful in context. If $x = 0$ lies outside the range of the data, the intercept is an extrapolation and should be interpreted with caution.
Example: If $a = 32.4$:
“When a student studies 0 hours, the predicted exam score is 32.4 marks.”
(Whether this is sensible depends on the context, in particular on whether 0 hours lies within or close to the range of the data.)
Worked example. Data: hours studied ($x$) and exam score ($y$)
CAS output: $a = 31.5$, $b = 9.2$, $r = 0.93$
Equation: $\hat{y} = 31.5 + 9.2x$
Slope interpretation: For each additional hour studied, the predicted exam score increases by 9.2 marks.
Intercept interpretation: A student who studied 0 hours is predicted to score 31.5 marks.
Prediction: If $x = 4$ hours: $\hat{y} = 31.5 + 9.2(4) = 31.5 + 36.8 = 68.3$ marks.
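The prediction step can also be written as a small function. This is only a sketch: the name `predict_score` is hypothetical, and the default coefficients are the $a$ and $b$ from the CAS output in this example.

```python
def predict_score(hours, a=31.5, b=9.2):
    """Predicted exam score from y-hat = a + b * hours."""
    return a + b * hours

# Prediction for a student who studied 4 hours
predicted = predict_score(4)
print(round(predicted, 1))  # 68.3
```

Predictions are most trustworthy for values of $x$ inside the range of the original data.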
The least squares line always passes through the point of means $(\bar{x}, \bar{y})$. This is a useful check: substituting $x = \bar{x}$ into the fitted equation should return $\hat{y} = \bar{y}$.
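Why this holds follows directly from the intercept formula $a = \bar{y} - b\bar{x}$. Evaluating the line at $x = \bar{x}$:
$$\hat{y} = a + b\bar{x} = (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y}$$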
KEY TAKEAWAY: The least squares line minimises the sum of squared residuals. Interpret the slope in context (change in predicted y per one-unit increase in x) and the intercept as the predicted y when x = 0.
EXAM TIP: Always write the equation with the actual variable names, not just $x$ and $y$. E.g., $\widehat{\text{score}} = 31.5 + 9.2 \times \text{hours}$.
COMMON MISTAKE: Confusing slope and intercept interpretations. The slope is the rate of change (the change in predicted y per one-unit increase in x); the intercept is the predicted value of y when x = 0.