PageRank Algorithm - StudyPulse
Algorithmics (HESS)
01 May 2026

PageRank Algorithm: Estimating Node Importance

What Is PageRank?

PageRank is an algorithm developed by Larry Page and Sergey Brin (co-founders of Google) for estimating the importance (rank) of nodes in a directed graph based on the structure of incoming links.

Core idea: A page is important if many important pages link to it. This is a recursive notion of importance.

KEY TAKEAWAY: PageRank models a “random surfer” who follows links at random. A page’s rank is proportional to the probability that the random surfer is visiting it at any given moment.


Problem Specification

Input: A directed graph $G = (V, E)$ where nodes are web pages and directed edges represent hyperlinks
Output: A score $PR(v)$ for each vertex, representing its relative importance
Interpretation: Higher $PR(v)$ → more “important” node
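Before looking at the formula, it helps to fix a concrete input representation. A minimal sketch in Python, assuming a dictionary-of-lists adjacency representation (the graph and variable names here are illustrative, matching the three-page example used in the worked example below):

```python
# A directed graph as an adjacency mapping: node -> list of nodes it links to.
# This mirrors the A->B, A->C, B->C, C->A example in the worked example below.
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

# Out-degree of a vertex is just the length of its adjacency list.
out_degree = {v: len(links) for v, links in graph.items()}

# In-neighbours are found by inverting every edge.
in_neighbours = {v: [] for v in graph}
for u, links in graph.items():
    for v in links:
        in_neighbours[v].append(u)

print(out_degree)     # {'A': 2, 'B': 1, 'C': 1}
print(in_neighbours)  # {'A': ['C'], 'B': ['A'], 'C': ['A', 'B']}
```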


The PageRank Formula

For a vertex $v$ in a graph with $n$ vertices:

$$PR(v) = \frac{1-d}{n} + d \sum_{u \in \text{in}(v)} \frac{PR(u)}{\text{out}(u)}$$

Where:
- $d$ is the damping factor (typically 0.85) — probability of following a link
- $1 - d$ = probability of jumping to a random page (teleportation)
- $\text{in}(v)$ = set of vertices with edges pointing to $v$
- $\text{out}(u)$ = out-degree of vertex $u$ (number of links from $u$)
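Plugging numbers into the formula makes the components concrete. A one-step sketch, assuming the three-page graph from the worked example below (A→B, A→C, B→C, C→A) and equal starting ranks:

```python
d = 0.85   # damping factor
n = 3      # number of vertices

# Equal initial ranks of 1/n for the graph A->B, A->C, B->C, C->A.
PR = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}

# One application of the formula for vertex C:
# in(C) = {A, B}, out(A) = 2, out(B) = 1.
pr_C = (1 - d) / n + d * (PR["A"] / 2 + PR["B"] / 1)
print(round(pr_C, 3))  # 0.475
```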


Intuition: The Random Surfer Model

Imagine a web surfer who:
- With probability $d = 0.85$: clicks a random link on the current page
- With probability $1 - d = 0.15$: jumps to a completely random page

The PageRank of a page is the long-run fraction of time the surfer spends on that page.
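The random-surfer interpretation can be checked by simulation: walk the graph for many steps and record how often each page is visited. A Monte Carlo sketch, assuming the three-page graph from the worked example below (the seed and step count are arbitrary choices for reproducibility):

```python
import random

# Simulate a random surfer on the graph A->B, A->C, B->C, C->A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pages = list(graph)
d = 0.85

random.seed(42)            # fixed seed so the run is repeatable
visits = {p: 0 for p in pages}
current = "A"
steps = 100_000
for _ in range(steps):
    visits[current] += 1
    if random.random() < d and graph[current]:
        current = random.choice(graph[current])  # follow a random link
    else:
        current = random.choice(pages)           # teleport anywhere

# Long-run visit frequencies approximate the PageRank scores;
# C and A tend to come out highest, B lowest.
freq = {p: visits[p] / steps for p in pages}
print(freq)
```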


Iterative Computation

PageRank is computed iteratively until convergence:

ALGORITHM PageRank(graph, d, iterations)
    n ← number of vertices
    PR ← Dictionary (all vertices mapped to 1/n)   // initialise equally

    for iter ← 1 to iterations do
        newPR ← Dictionary
        for each vertex v in graph do
            rank_sum ← 0
            for each u in inNeighbours(v) do
                rank_sum ← rank_sum + PR[u] / outDegree(u)
            end for
            newPR[v] ← (1 - d) / n + d * rank_sum
        end for
        PR ← newPR
    end for
    return PR
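The pseudocode translates almost line for line into Python. A sketch assuming the dictionary-of-lists graph representation (the function name `pagerank` is ours, not part of any standard library):

```python
def pagerank(graph, d=0.85, iterations=20):
    """Iterative PageRank following the pseudocode above.

    graph maps each vertex to the list of vertices it links to.
    """
    n = len(graph)
    pr = {v: 1 / n for v in graph}   # initialise equally

    # Precompute in-neighbours once, by inverting the edges.
    in_nb = {v: [] for v in graph}
    for u, links in graph.items():
        for v in links:
            in_nb[v].append(u)

    for _ in range(iterations):
        new_pr = {}
        for v in graph:
            rank_sum = sum(pr[u] / len(graph[u]) for u in in_nb[v])
            new_pr[v] = (1 - d) / n + d * rank_sum
        pr = new_pr
    return pr

# Run on the three-page graph from the worked example.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({v: round(r, 3) for v, r in ranks.items()})
```

Note that the update is synchronous: each sweep reads only the previous sweep's ranks (`pr`) and writes into a fresh dictionary (`new_pr`), exactly as the pseudocode's `newPR` does.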

Worked Example

3-page graph: A→B, A→C, B→C, C→A. $d = 0.85$, $n = 3$.

Initial: $PR(A) = PR(B) = PR(C) = 1/3$

Iteration 1:
- $PR(A) = 0.05 + 0.85 \times (PR(C)/1) = 0.05 + 0.85 \times 0.333 = 0.333$
- $PR(B) = 0.05 + 0.85 \times (PR(A)/2) = 0.05 + 0.85 \times 0.167 = 0.192$
- $PR(C) = 0.05 + 0.85 \times (PR(A)/2 + PR(B)/1) = 0.05 + 0.85 \times (0.167 + 0.333) = 0.475$

C has the highest rank — it receives links from both A and B.


Correctness and Convergence

PageRank corresponds to the stationary distribution of a Markov chain over the graph. With the damping factor:
- Every page has a non-zero probability of being visited
- The process is guaranteed to converge to a unique stationary distribution
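Convergence can be observed directly by iterating until the ranks stop changing between sweeps. A sketch assuming the same dictionary-of-lists representation (the function name, tolerance, and iteration cap are illustrative choices):

```python
def pagerank_converged(graph, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate until the total (L1) change between sweeps falls below tol."""
    n = len(graph)
    pr = {v: 1 / n for v in graph}
    in_nb = {v: [] for v in graph}
    for u, links in graph.items():
        for v in links:
            in_nb[v].append(u)

    for i in range(max_iter):
        new_pr = {v: (1 - d) / n
                     + d * sum(pr[u] / len(graph[u]) for u in in_nb[v])
                  for v in graph}
        change = sum(abs(new_pr[v] - pr[v]) for v in graph)
        pr = new_pr
        if change < tol:
            return pr, i + 1   # converged: ranks plus sweeps used
    return pr, max_iter

ranks, iters = pagerank_converged({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(iters, {v: round(r, 4) for v, r in ranks.items()})
```

Because the change shrinks by roughly a factor of $d$ each sweep, convergence to a tight tolerance takes only a modest number of iterations.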


Limitations

| Limitation | Explanation |
| --- | --- |
| Link spam | Pages can inflate their rank by creating many inbound links |
| Dangling nodes | Pages with no outgoing links can absorb rank (handled by teleportation) |
| Not semantic | Measures link structure, not content quality |
| Dynamic web | The web changes constantly; PageRank must be recomputed |
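One common way to handle the dangling-node limitation (among several used in practice) is to treat a page with no out-links as if it linked to every page, so its rank is redistributed rather than absorbed. A sketch assuming the dictionary-of-lists representation (the helper name `fix_dangling` is ours):

```python
def fix_dangling(graph):
    """Return a copy of graph where every vertex with no out-links
    is treated as linking to every vertex (one common dangling-node fix)."""
    all_nodes = list(graph)
    return {v: (links if links else list(all_nodes))
            for v, links in graph.items()}

g = {"A": ["B"], "B": []}   # B is a dangling node
print(fix_dangling(g))      # {'A': ['B'], 'B': ['A', 'B']}
```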

EXAM TIP: In VCE exams, PageRank questions typically ask you to interpret the formula, trace one or two iterations, or justify why a node has a high/low rank. Focus on understanding the formula components: damping factor, in-neighbours, and out-degrees.
