Neural networks were proposed in the 1940s but fell out of favour twice in what are called ‘AI winters’. Their resurgence, beginning in the late 1980s and accelerating through the 1990s and 2000s, was driven by several converging factors.
| Year | Event |
|---|---|
| 1943 | McCulloch-Pitts mathematical neuron model |
| 1957 | Rosenblatt’s Perceptron — early enthusiasm |
| 1969 | Minsky and Papert's *Perceptrons* proves single-layer networks cannot solve XOR; funding is cut |
| 1970s–1980s | First AI winter: research interest and funding decline |
Rumelhart, Hinton, and Williams (1986) popularised backpropagation as a practical algorithm for training multi-layer networks. This addressed the credit assignment problem: working out how much each weight, even one deep in the network, contributes to the output error.
Multi-layer networks could now learn to solve non-linearly separable problems (e.g. XOR), overcoming the limitation identified by Minsky and Papert.
KEY TAKEAWAY: Backpropagation made it computationally feasible to train multi-layer networks. It was the key algorithmic breakthrough that helped end the first AI winter for neural networks.
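To make the mechanics concrete, here is a minimal sketch of backpropagation training a small multi-layer network on XOR in plain NumPy. It is illustrative only: the layer sizes, learning rate, iteration count, and choice of a cross-entropy-style error signal are assumptions, not anything specified in these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not linearly separable, so a single-layer perceptron cannot learn it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 units (size chosen only for illustration).
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5  # learning rate (illustrative value)

for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # network output

    # Backward pass: assign credit for the output error to every weight,
    # including those in the hidden layer (the credit assignment problem).
    d_out = out - y                          # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)       # error signal at the hidden layer

    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # typically close to [0, 1, 1, 0]; depends on the seed
```

The point of the sketch is the backward pass: the same output error is propagated back through the weights to give a gradient for every parameter, no matter how deep it sits in the network.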
The exponential growth of CPU (and later GPU) performance (Moore’s Law) made it practical to train larger networks on more data.
GPUs, designed for graphics, are exceptionally well-suited to the matrix multiplications required in neural network training, providing 10–100× speedups over CPUs. By the mid-2000s, training that would have taken months in 1986 took hours.
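As a rough illustration (not from these notes) of why matrix-multiplication throughput matters so much: a fully connected layer applied to a batch of inputs is a single matrix multiply. The sizes below are arbitrary.

```python
import numpy as np

# Illustrative sizes only: a batch of 256 inputs through one dense layer.
batch, n_in, n_out = 256, 1024, 512
X = np.random.rand(batch, n_in).astype(np.float32)   # input activations
W = np.random.rand(n_in, n_out).astype(np.float32)   # layer weights
b = np.zeros(n_out, dtype=np.float32)                # biases

# The whole layer is one matrix multiplication plus a broadcast add.
# Training repeats operations like this millions of times, and this is
# exactly the workload GPUs accelerate.
activations = X @ W + b
print(activations.shape)  # (256, 512)
```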
Machine learning algorithms improve with more data. The growth of the internet in the 1990s and 2000s produced vast digitised datasets:
- ImageNet: 14 million labelled images
- Linguistic corpora: billions of words
- Medical imaging databases
Without sufficient training data, even powerful models overfit or fail to generalise.
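A toy illustration of that last point (again an assumption-laden sketch, not from these notes): fitting a very flexible model to only a few noisy samples usually gives near-zero training error but much worse error on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny training set: 8 noisy samples of a simple underlying curve.
x_train = np.sort(rng.uniform(0, 3, size=8))
y_train = np.sin(x_train) + rng.normal(0, 0.1, size=8)

# A degree-6 polynomial has nearly enough parameters to pass through
# every training point, i.e. far more capacity than the data supports.
coeffs = np.polyfit(x_train, y_train, deg=6)

x_test = np.linspace(0, 3, 200)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - np.sin(x_test)) ** 2)

# Typical outcome: tiny training error, much larger test error; the model
# has memorised the samples instead of generalising.
print(f"train MSE: {train_mse:.4f}  test MSE: {test_mse:.4f}")
```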
Beyond backpropagation, several further algorithmic innovations contributed:
| Innovation | Year | Contribution |
|---|---|---|
| Convolutional Neural Networks (CNNs) | 1989 (LeCun) | Efficient image processing |
| Long Short-Term Memory (LSTM) | 1997 (Hochreiter & Schmidhuber) | Sequence modelling (text, time series) |
| ReLU activation function | 2000s | Mitigated the vanishing gradient problem (see sketch below) |
| Dropout regularisation | 2012 | Reduced overfitting |
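A simplified illustration of the ReLU entry above (it ignores the weight matrices that also appear in the gradient): during backpropagation, the gradient reaching early layers includes a product of one activation derivative per layer. Sigmoid's derivative never exceeds 0.25, whereas ReLU's is 1 for active units.

```python
# Simplified illustration: how a product of per-layer activation
# derivatives shrinks with depth (weight matrices are ignored here).
SIGMOID_MAX_DERIV = 0.25  # max of sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
RELU_ACTIVE_DERIV = 1.0   # ReLU'(z) = 1 wherever z > 0

for depth in (5, 10, 20, 50):
    sig = SIGMOID_MAX_DERIV ** depth
    relu = RELU_ACTIVE_DERIV ** depth
    print(f"depth {depth:2d}: sigmoid factor {sig:.1e}, ReLU factor {relu:.1e}")
```

Even in sigmoid's best case the factor shrinks exponentially with depth; ReLU avoids that particular shrinkage for active units, which is why it mitigates (rather than fully solves) the vanishing gradient problem.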
AlexNet (Krizhevsky, Sutskever, Hinton) won the 2012 ImageNet Large Scale Visual Recognition Challenge by a large margin, reducing the top-5 error rate from about 26% to about 16%. This dramatic demonstration triggered the modern deep learning era.
| Factor | Why it mattered |
|---|---|
| Backpropagation | Enabled training of multi-layer networks |
| Faster hardware (GPUs) | Made large-scale training practical |
| Large datasets | Provided enough data for models to generalise |
| Algorithmic improvements | CNNs, LSTMs, ReLU, dropout |
STUDY HINT: For VCAA, focus on four main factors: backpropagation, hardware (GPUs / Moore’s Law), large datasets, and algorithmic improvements. Each factor alone was insufficient — their convergence caused the resurgence.
EXAM TIP: VCAA may ask you to list and briefly explain these factors. Name specific examples: Rumelhart, Hinton, Williams (backpropagation); Moore’s Law / GPUs (hardware); ImageNet (datasets); CNNs and LSTMs (algorithmic improvements).
VCAA FOCUS: Know the four major factors and explain why each was significant for making neural networks practical. Understand the historical context (AI winters and the resurgence).