M 09 · 9.1.1 · conceptual

9.1.1 Perceptron from Scratch

Build Rosenblatt's 1958 perceptron in NumPy: a linear weighted sum, a step activation, and the weight-update rule that adjusts the weights after every wrong prediction. Train it as a binary classifier on a 2D dataset.

Duration: 30–35 min

Level: intermediate

Load: 3 core concepts

Prereqs: Basic NumPy, dot products, linear algebra intuition

Big question

What is the simplest machine that can learn to classify data? Frank Rosenblatt’s 1958 perceptron is the answer, and it is simple enough to implement in about thirty lines of NumPy. Three operations: multiply inputs by weights, sum the products, threshold the result. The novel addition was a learning rule: after every wrong prediction, nudge each weight in the direction that would have produced the right answer. Run this loop on linearly separable data and the weights converge to a hyperplane that splits the two classes [1, 2]. The same idea, adjusting weights in proportion to the prediction error, generalises through calculus into the gradient-based training used by modern neural networks with billions of parameters.

Learning objectives

State the perceptron architecture: weighted sum, bias, step activation.
Implement the forward pass with np.dot and a step function.
Apply the perceptron learning rule: w ← w + lr × (y_true − y_pred) × x.
Visualise the learned decision boundary as a line in 2D feature space.
Recognise the XOR limitation that motivated multi-layer networks.

Part 1 — Architecture of a perceptron

A perceptron takes n numeric inputs (x₁, …, xₙ), multiplies each by a learned weight wᵢ, sums the products with a bias term b, and applies a step function:

def forward(self, x):
    z = np.dot(self.weights, x) + self.bias
    return 1 if z >= 0 else 0

That is the entire forward pass. The weighted sum is a single dot product; the step function is one if statement. The “neural” feel comes from analogy with McCulloch and Pitts’s 1943 biological neuron model: dendrites collect weighted inputs, the cell body integrates, the axon fires (1) when the total exceeds a threshold [3].
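To make the arithmetic concrete, here is a minimal standalone sketch of the same forward pass, written as a plain function rather than a method. The weights, bias, and inputs are illustrative assumptions, not values taken from the lesson files.

```python
import numpy as np

def forward(weights, bias, x):
    """Weighted sum followed by a step activation."""
    z = np.dot(weights, x) + bias
    return 1 if z >= 0 else 0

# Hypothetical values chosen only for illustration.
weights = np.array([0.5, -0.25])
bias = -0.2
x = np.array([0.7, 0.3])

# 0.5 * 0.7 + (-0.25) * 0.3 + (-0.2) is approximately 0.075
z = np.dot(weights, x) + bias
output = forward(weights, bias, x)  # z is non-negative, so the unit fires
print(z, output)
```

Because z lands just above zero, a small change to the bias would flip the output, which is exactly the sensitivity the learning rule exploits.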

A diagram showing inputs x1 and x2 entering weighted connections w1 and w2 into a summation node, plus a bias term, then through a step activation function to produce binary output y. — Fig. 1 The perceptron: two inputs, two weights, a bias, a summation, and a step activation. Everything in modern deep learning is a generalisation of this single node.

Part 2 — The forward pass on real data

A trained perceptron’s decision rule is simple geometry: draw a line through the feature space defined by w₁ x₁ + w₂ x₂ + b = 0. Points on one side score positive (class 1); points on the other side score negative (class 0).
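The line equation above can be rearranged to x₂ = −(w₁ x₁ + b) / w₂, which is the form you would pass to a plotting call. The sketch below assumes w₂ ≠ 0 and uses made-up weights; the helper name boundary_x2 is hypothetical.

```python
import numpy as np

def boundary_x2(w, b, x1):
    """x2 on the line w[0]*x1 + w[1]*x2 + b = 0 (assumes w[1] != 0)."""
    return -(w[0] * x1 + b) / w[1]

# Hypothetical "trained" parameters, for illustration only.
w = np.array([1.0, 2.0])
b = -2.0

# Two x1 values are enough to draw a straight line.
x1 = np.array([0.0, 2.0])
x2 = boundary_x2(w, b, x1)   # the points (0, 1) and (2, 0) lie on the line

def score(point):
    """Signed score: non-negative means class 1, negative means class 0."""
    return np.dot(w, point) + b

side_a = score(np.array([2.0, 2.0]))  # 2 + 4 - 2 = 4, class 1
side_b = score(np.array([0.0, 0.0]))  # -2, class 0
print(x2, side_a, side_b)
```

If w₂ is zero the boundary is a vertical line at x₁ = −b / w₁, so a plotting routine would need a separate branch for that case.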

An animation showing input values 0.7 and 0.3 flowing through a perceptron: multiplied by weights, summed with bias, passed through a step function, and producing output 1. — Fig. 2 One forward pass with concrete numbers. The whole computation is one dot product plus a comparison.

For a separable two-class problem (the classic “linearly separable” setup), there is at least one line that perfectly splits the classes. The perceptron learning rule promises to find it.

A scatter plot of two-dimensional points: blue dots on one side and orange dots on the other, with a straight line cleanly separating them. — Fig. 3 A linearly separable dataset. The perceptron will converge to *some* dividing line; not necessarily the unique max-margin one (that's SVMs).

Part 3 — The learning rule

Rosenblatt’s 1958 weight-update rule fits in a few lines of code:

def update(self, x, y_true, lr=0.1):
    y_pred = self.forward(x)
    error = y_true - y_pred
    self.weights += lr * error * x
    self.bias    += lr * error

If the prediction is correct, error = 0 and nothing changes.
If the perceptron missed, error is ±1 and weights are pushed in the direction that would have produced the correct sign.
The learning rate lr scales the size of each update.
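A single update traced by hand shows the three points above in action. This is a sketch with assumed values: zero initial weights, one input x = (1, 2), and a true label of 0.

```python
import numpy as np

def step(z):
    return 1 if z >= 0 else 0

# Assumed starting state and training example (illustrative only).
w = np.zeros(2)
b = 0.0
lr = 0.1
x = np.array([1.0, 2.0])
y_true = 0

y_pred = step(np.dot(w, x) + b)   # z = 0, and the step fires at z >= 0, so y_pred = 1
error = y_true - y_pred           # 0 - 1 = -1: the unit fired when it should not have

# Push the weights against x so the score for this input drops.
w = w + lr * error * x            # becomes [-0.1, -0.2]
b = b + lr * error                # becomes -0.1

y_after = step(np.dot(w, x) + b)  # z = -0.1 - 0.4 - 0.1 = -0.6, so now 0
print(w, b, y_after)
```

One update was enough here because there is only one example; on a full dataset, fixing one point can break another, which is why training loops over the data repeatedly.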

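Novikoff’s theorem (1962) proves that this rule reaches a separating hyperplane after a finite number of updates whenever one exists. The number of updates is bounded by (R/γ)², where R is the largest norm of any input vector and γ is the margin by which some separating hyperplane clears the closest point [4].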
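The bound can be checked empirically. The sketch below is one way to do it, under stated assumptions: the data are synthetic, the reference separator u is chosen by hand, and the bias is folded into the weight vector by appending a constant 1 to every input so that R and γ are measured in that augmented space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference separator (w1, w2, b) with unit norm.
u = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)

# Sample 2D points, append a constant 1 so the bias is just another weight,
# and keep only points at least 0.2 away from the reference line.
pts = rng.uniform(-1.0, 1.0, size=(200, 2))
X = np.hstack([pts, np.ones((200, 1))])
X = X[np.abs(X @ u) >= 0.2]
y = (X @ u > 0).astype(int)

R = np.linalg.norm(X, axis=1).max()   # largest input norm
gamma = np.abs(X @ u).min()           # margin with respect to u
bound = (R / gamma) ** 2

w = np.zeros(3)
updates = 0
for epoch in range(1000):
    mistakes = 0
    for xi, yi in zip(X, y):
        pred = 1 if np.dot(w, xi) >= 0 else 0
        err = yi - pred
        if err != 0:
            w += 0.1 * err * xi
            mistakes += 1
    updates += mistakes
    if mistakes == 0:   # a clean pass means the data are separated
        break

print(updates, bound)
```

The count of updates should come out at or below the bound. Starting from zero weights, the learning rate only rescales w, so it does not change which points are misclassified or how many updates occur.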

A scatter plot showing two classes of points coloured blue and orange, separated by the learned linear decision boundary. — Fig. 4 After training, the perceptron has rotated its decision line to cleanly separate the two classes. Same forward pass; weights now informed by data.

Synthesis project

EXECUTE I.

Train a perceptron on linearly separable data

Run simple_perceptron.py from the downloads. It generates a 2D dataset (two Gaussian blobs), trains a perceptron for 50 epochs, and plots the decision boundary.

Reflection questions

Why does the line stop moving after some number of epochs?
What does the learning rate control, and what goes wrong if it’s too large or too small?
The trained line is one valid separator; there are infinitely many others. Which one does the perceptron find?

MODIFY II.

Three perceptron experiments

Edit simple_perceptron.py to:

Goals

Smaller learning rate — drop lr from 0.1 to 0.01. Track epochs to convergence.
Different starting weights — initialise with weights = np.zeros(2) vs weights = np.random.randn(2). Compare the final lines.
Non-separable data — generate two overlapping clusters; observe that the perceptron never converges (the proof’s premise breaks).
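For the non-separable experiment, it helps to log the number of mistakes per epoch rather than wait for a convergence that never comes. The sketch below is not taken from simple_perceptron.py; it uses a deliberately tiny non-separable set (three collinear points labelled 0, 1, 0) as a stand-in for overlapping clusters.

```python
import numpy as np

# Three collinear points with labels 0, 1, 0: no single line can separate them.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([0, 1, 0])

w = np.zeros(2)
b = 0.0
lr = 0.1
mistakes_per_epoch = []

for epoch in range(100):
    mistakes = 0
    for xi, yi in zip(X, y):
        pred = 1 if np.dot(w, xi) + b >= 0 else 0
        err = yi - pred
        if err != 0:
            w += lr * err * xi
            b += lr * err
            mistakes += 1
    mistakes_per_epoch.append(mistakes)

# A zero-mistake epoch would mean a separating line exists, which is impossible here.
print(min(mistakes_per_epoch))
```

Every epoch records at least one mistake, so the weights keep moving. With real overlapping clusters the same log would show the mistake count fluctuating without ever settling at zero.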

CREATE III.

The XOR problem — and why it fails

Build a perceptron and train it on the XOR dataset: (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0. Plot the decision boundary after 1000 epochs.

python · exercise3_starter.py

import numpy as np
import matplotlib.pyplot as plt

class Perceptron:
    def __init__(self, n_in):
        self.w = np.zeros(n_in)
        self.b = 0.0

    def forward(self, x):
        return 1 if np.dot(self.w, x) + self.b >= 0 else 0

    def update(self, x, y, lr=0.1):
        pred = self.forward(x)
        err = y - pred
        self.w += lr * err * x
        self.b += lr * err

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

p = Perceptron(2)
# TODO: train for 1000 epochs, shuffling each epoch.
# TODO: plot data points and the (failed) decision line.

Downloads

simple_perceptron.py: train + plot the boundary
perceptron_solution.py: reference implementation

References

[1] Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408. doi:10.1037/h0042519
[2] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. doi:10.1038/nature14539
[3] McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133.
[4] Novikoff, A. B. J. (1962). On convergence proofs on perceptrons. Symposium on the Mathematical Theory of Automata, 12, 615–622.
[5] Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.
[6] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. deeplearningbook.org