A multi-layer perceptron (MLP) is a feedforward neural network with at least three layers. Information flows in one direction: input to hidden to output.
| Layer | Role | Neurons |
|---|---|---|
| Input layer | Receives feature values; no computation | One per feature |
| Hidden layer(s) | Transform inputs into useful representations | Defined by designer |
| Output layer | Produces the final prediction | One per class (classification) |
Fully connected (dense): Every neuron in each layer connects to every neuron in the next layer.
For a network with $d$ inputs, $h$ hidden neurons (one hidden layer), $k$ outputs:
```
Input layer      Hidden layer      Output layer

x_1 -----\
          \--> h_1 --\
x_2 ---------> h_2 ----> output y_1
          \--> h_3 --/
x_3 -----/
```
Each arrow represents a weight.
For hidden neuron $j$ in the first hidden layer:
$$z_j^{(1)} = \sum_{i=1}^{d} w_{ji}^{(1)} x_i + b_j^{(1)}$$
$$a_j^{(1)} = \sigma\bigl(z_j^{(1)}\bigr)$$
In matrix form for the entire hidden layer:
$$\mathbf{z}^{(1)} = W^{(1)} \mathbf{x} + \mathbf{b}^{(1)}, \qquad \mathbf{a}^{(1)} = \sigma\bigl(\mathbf{z}^{(1)}\bigr)$$
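The matrix form translates directly into code. A minimal sketch in NumPy (the layer sizes and random parameters here are illustrative, not from the example below):

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic function sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Hidden layer: pre-activation z = W x + b, then sigmoid
    a1 = sigmoid(W1 @ x + b1)
    # Output layer: same pattern applied to the hidden activations
    return sigmoid(W2 @ a1 + b2)

# Illustrative shapes: d = 3 inputs, h = 3 hidden neurons, k = 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
y = forward(np.array([1.0, 0.5, -0.2]), W1, b1, W2, b2)
```

Note that the same `W @ x + b` pattern repeats at every layer; only the weight shapes change.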
WORKED EXAMPLE: a network with $d = 2$ inputs, $h = 2$ hidden neurons, and $k = 1$ output. Parameters:
$$W^{(1)} = \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}, \quad \mathbf{b}^{(1)} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad W^{(2)} = (1,\; -1), \quad b^{(2)} = 0$$
Input: $\mathbf{x} = (1, 1)$
Hidden pre-activations:
$$z_1 = 1 \cdot 1 + 2 \cdot 1 + 0 = 3, \quad z_2 = -1 \cdot 1 + 0 \cdot 1 + 1 = 0$$
Hidden activations (sigmoid):
$$a_1 = \sigma(3) \approx 0.953, \quad a_2 = \sigma(0) = 0.5$$
Output:
$$z^{(2)} = 1 \cdot 0.953 + (-1) \cdot 0.5 + 0 = 0.453$$
$$\hat{y} = \sigma(0.453) \approx 0.611$$
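The hand calculation above can be checked numerically; this sketch reproduces each step with the same weights, biases, and input:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters from the worked example
W1 = np.array([[1.0, 2.0], [-1.0, 0.0]])
b1 = np.array([0.0, 1.0])
W2 = np.array([1.0, -1.0])
b2 = 0.0
x = np.array([1.0, 1.0])

z1 = W1 @ x + b1      # hidden pre-activations: [3.0, 0.0]
a1 = sigmoid(z1)      # hidden activations: approx [0.953, 0.5]
z2 = W2 @ a1 + b2     # output pre-activation: approx 0.453
y_hat = sigmoid(z2)   # final prediction: approx 0.611
```

Each intermediate value matches the hand-worked numbers, which is exactly the working an exam answer should show.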
KEY TAKEAWAY: An MLP processes information by sequentially applying weighted sums and activation functions layer by layer. Each hidden layer learns a higher-level representation of the input.
Without non-linear activations, any MLP collapses to a single linear transformation regardless of depth. Non-linear activations (sigmoid, ReLU) are what give deep networks their expressive power.
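The collapse of stacked linear layers can be demonstrated directly: composing two weight matrices without an activation in between is the same as applying their product once (random weights here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))  # first "layer": 3 -> 4
W2 = rng.normal(size=(2, 4))  # second "layer": 4 -> 2
x = rng.normal(size=3)

# Two linear layers applied in sequence (no activation between them)...
deep = W2 @ (W1 @ x)

# ...equal a single linear layer whose weights are the product W2 @ W1
collapsed = (W2 @ W1) @ x

# np.allclose(deep, collapsed) -> True: depth added no expressive power
```

Inserting a sigmoid or ReLU between the layers breaks this associativity, which is precisely what makes depth useful.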
EXAM TIP: VCAA will ask you to compute forward propagation on a small MLP. Practice the full calculation: compute $z$ for each neuron, apply $\sigma$, pass results to the next layer. Show all working.
VCAA FOCUS: Describe the layered structure (input, hidden, output). Explain the role of weights and biases. Know how to perform forward propagation through a small network.