Pixels2GenAI
Path ii Continuum
M 11 · 11.1.1 · hands-on

11.1.1 Webcam Processing

Turn a webcam into a 30 fps NumPy array stream. Capture → process → display loop, with frame differencing as the canonical 'detect motion' demo.

Duration: 25–30 min
Level: intermediate
Prereqs: 3.4.1 (convolution), 8.1.1 (per-frame operators)

Overview

A webcam is a NumPy array generator. At 30 fps it produces a new (H, W, 3) array roughly every 33 ms, the same data structure you have manipulated since lesson 1.1.1. The job of an interactive system is to put a transformation between capture and display: anything from a colour filter to motion detection to a full pose-estimation network. The pipeline is identical regardless of complexity:

while True: frame = capture(); processed = transform(frame); display(processed)

This lesson covers the canonical loop with OpenCV’s VideoCapture, three classes of real-time operators (channel, blur, edge), and the most-used interactive-systems primitive: frame differencing for motion detection. The code is OpenCV-flavoured because that’s the dominant ecosystem; the operators are the NumPy ones you already know.
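The loop shape can be tried without a camera. The sketch below is a minimal illustration, not the lesson's own code: `synthetic_capture`, `invert`, and `run_pipeline` are hypothetical names, and the "webcam" is a generator of small synthetic frames.

```python
import numpy as np

def synthetic_capture(n_frames, height=48, width=64):
    """Yield (H, W, 3) uint8 frames with a bright square drifting right.

    Hypothetical stand-in for a webcam; not part of OpenCV.
    """
    for t in range(n_frames):
        frame = np.zeros((height, width, 3), dtype=np.uint8)
        x = (4 * t) % (width - 8)
        frame[20:28, x:x + 8] = 255
        yield frame

def invert(frame):
    """Example transform: photographic negative (result stays uint8)."""
    return 255 - frame

def run_pipeline(frames, transform, display):
    """Capture -> process -> display, one frame at a time."""
    count = 0
    for frame in frames:
        display(transform(frame))
        count += 1
    return count

shown = []                                   # "display" just collects frames here
n = run_pipeline(synthetic_capture(5), invert, shown.append)
print(n, shown[0].shape, shown[0].dtype)     # 5 (48, 64, 3) uint8
```

Swapping `invert` for any other function that maps a frame to a frame leaves the loop untouched, which is the point of the pattern.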

Learning objectives

  1. Open and read a webcam stream via OpenCV’s VideoCapture API, releasing the device cleanly.
  2. Apply real-time per-frame transformations (grayscale, blur, edge detection) without dropping below 30 fps.
  3. Implement frame-differencing motion detection: store previous frame, absdiff, threshold, overlay.
  4. Recognise that every interactive-system input — webcam, mic, MIDI, Kinect — produces the same pipeline shape.

Quick start — capture loop

python · quick_start.py
import cv2

cap = cv2.VideoCapture(0)   # 0 = default webcam; can also be a video filename

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow('Webcam', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Fig. 1 The four-stage motion detection pipeline on synthetic frames. Top: two consecutive frames with an orange disc in slightly different positions. Bottom-left: binary motion mask highlighting where the disc moved. Bottom-right: green overlay on the current frame where motion was detected.

Core concepts

Concept 1 — The capture-process-display loop

Every interactive video application has the same three-step inner loop:

  1. Capture. Read one frame as a NumPy array. ok, frame = cap.read() returns ok = False (and frame = None) on disconnect or end-of-file; always check ok before processing. Many OpenCV examples name this flag ret.
  2. Process. Apply zero or more transformations. The frame is uint8 BGR (OpenCV’s quirk — not RGB); convert to grayscale, blur, threshold, anything from your NumPy toolkit.
  3. Display. cv2.imshow(window, image) paints a window; cv2.waitKey(1) gives the GUI 1 ms to update and polls for a keypress. Without waitKey, no frame ever displays.
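The BGR ordering mentioned in step 2 is easy to see on a tiny array. The following is a NumPy-only sketch on a hypothetical 2×2 frame; the luma weights are the common 0.299 R + 0.587 G + 0.114 B combination, written here in B, G, R order.

```python
import numpy as np

# A hypothetical 2x2 "frame" in OpenCV's BGR channel order: pure blue pixels.
frame_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
frame_bgr[..., 0] = 255                  # channel 0 is blue in BGR

# Reverse the last axis to get RGB for libraries that expect it.
frame_rgb = frame_bgr[..., ::-1]

# Luma grayscale: weights listed in B, G, R order to match the frame.
weights_bgr = np.array([0.114, 0.587, 0.299])
gray = np.rint(frame_bgr @ weights_bgr).astype(np.uint8)

print(frame_rgb[0, 0])   # [  0   0 255] -> blue now sits in the last channel
print(gray[0, 0])        # 29, i.e. 0.114 * 255 rounded
```

If a displayed or saved image shows red and blue swapped, a missing or doubled channel reversal like this one is a likely cause.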

After the loop, cap.release() and cv2.destroyAllWindows() clean up. Leaving the device locked is a common bug — the camera is unusable until your Python process exits or the OS recovers.
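One way to make sure the device is released even when processing raises is a try/finally block. This is a sketch under assumptions: `read_frames` is a hypothetical helper, and `StubCapture` is a made-up stand-in that only mimics the read()/release() shape of cv2.VideoCapture so the example runs without a camera.

```python
def read_frames(cap, max_frames):
    """Read up to max_frames frames, always releasing the device afterwards."""
    frames = []
    try:
        for _ in range(max_frames):
            ok, frame = cap.read()
            if not ok:               # disconnect or end of file
                break
            frames.append(frame)
    finally:
        cap.release()                # runs even if the loop body raises
    return frames


class StubCapture:
    """Hypothetical stand-in with the same read()/release() shape as a capture device."""

    def __init__(self, n_frames):
        self.remaining = n_frames
        self.released = False

    def read(self):
        if self.remaining == 0:
            return False, None
        self.remaining -= 1
        return True, [[0]]

    def release(self):
        self.released = True


cap = StubCapture(3)
frames = read_frames(cap, max_frames=10)
print(len(frames), cap.released)     # 3 True
```

With a real camera you would pass cv2.VideoCapture(0) instead of the stub; the helper itself does not change.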

Concept 2 — Real-time per-frame operators

Since each frame is a NumPy array, every Module 03 transformation works in real time. The OpenCV equivalents are usually one line:

python · operators.py
gray    = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # luma
blurred = cv2.GaussianBlur(frame, (21, 21), 0)      # 21x21 Gaussian
edges   = cv2.Canny(gray, 50, 150)                  # Canny edges
mirror  = cv2.flip(frame, 1)                        # horizontal flip

The constraint is frame budget: at 30 fps you have ~33 ms per frame for capture + process + display. As indicative figures on a typical laptop CPU (they vary with hardware and OpenCV build), a 640×480 Gaussian blur takes ~2 ms, a Canny edge detector ~5 ms, and reading the frame plus showing the result another ~10 ms. Most interactive applications fit a handful of operators per frame comfortably.
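Since timings depend on the machine, it can help to measure an operator against the budget directly. The sketch below is one possible approach using time.perf_counter on a random 640×480 frame; `time_operator` and `to_gray` are hypothetical helper names, and the grayscale conversion is a NumPy stand-in for whichever operator you want to profile.

```python
import time
import numpy as np

FPS = 30
budget_ms = 1000.0 / FPS                     # about 33.3 ms per frame

def time_operator(op, frame, repeats=20):
    """Return the mean wall-clock cost of op(frame) in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        op(frame)
    return (time.perf_counter() - start) * 1000.0 / repeats

def to_gray(frame_bgr):
    """Luma grayscale from a BGR frame (weights in B, G, R order)."""
    return (frame_bgr @ np.array([0.114, 0.587, 0.299])).astype(np.uint8)

# Random noise frame as a stand-in for a captured 640x480 image.
frame = np.random.default_rng(0).integers(0, 256, (480, 640, 3), dtype=np.uint8)
cost_ms = time_operator(to_gray, frame)
print(f"budget {budget_ms:.1f} ms, grayscale {cost_ms:.2f} ms, "
      f"{100 * cost_ms / budget_ms:.0f}% of budget")
```

Summing the measured costs of every operator in the loop, plus capture and display, gives a quick estimate of whether the pipeline still fits in one frame period.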

Concept 3 — Frame-differencing motion detection

The cheapest motion detector compares consecutive frames. Pixels that change significantly between $t-1$ and $t$ are “moving”:

python · motion.py
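import cv2

cap = cv2.VideoCapture(0)
ok, prev = cap.read()                               # first frame seeds the comparison
if not ok:
    raise SystemExit('Could not read from the webcam')
prev_gray = cv2.GaussianBlur(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY), (21, 21), 0)

while True:
    ok, cur = cap.read()
    if not ok:                                      # disconnect or end of stream
        break
    cur_gray = cv2.GaussianBlur(cv2.cvtColor(cur, cv2.COLOR_BGR2GRAY), (21, 21), 0)

    diff = cv2.absdiff(prev_gray, cur_gray)         # per-pixel |a - b|
    _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    out = cur.copy()
    out[motion > 0] = [0, 255, 0]                   # green (BGR) where motion

    prev_gray = cur_gray                            # current becomes previous
    cv2.imshow('Motion', out)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()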

Two pre-processing tricks are doing real work here. First, convert to grayscale before comparing — colour comparisons would triple the work and add no useful information for motion. Second, blur before differencing — a 21×21 Gaussian removes per-pixel sensor noise that would otherwise produce false-positive “motion.”
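The difference, threshold, and overlay steps can also be traced in plain NumPy on two synthetic frames. This is a sketch, not the lesson's implementation: `make_frame` is a hypothetical helper, and blurring is omitted because the synthetic frames are noise-free. It also shows why an absolute difference is needed rather than a bare subtraction: uint8 arithmetic wraps around.

```python
import numpy as np

def make_frame(x):
    """Hypothetical synthetic grayscale frame: dark wall (40) with a bright 8x8 block at column x."""
    frame = np.full((48, 64), 40, dtype=np.uint8)
    frame[20:28, x:x + 8] = 200
    return frame

prev_gray = make_frame(10)
cur_gray = make_frame(14)            # block moved 4 pixels to the right

# Naive uint8 subtraction wraps: 40 - 200 becomes 96 instead of -160.
wrapped = cur_gray - prev_gray

# Widen to a signed type first, then take the absolute value.
diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16)).astype(np.uint8)

# Binary mask: 255 where the change exceeds the threshold, else 0.
motion = np.where(diff > 25, 255, 0).astype(np.uint8)

# Overlay: paint green (B, G, R order) on a 3-channel copy of the current frame.
out = np.stack([cur_gray] * 3, axis=-1)
out[motion > 0] = [0, 255, 0]

print(wrapped[20, 10], diff[20, 10])     # 96 160
print(int((motion > 0).sum()))           # 64
```

Only the strips the block vacated and newly covered are flagged (two 4×8 regions); the overlapping interior has identical values in both frames, so frame differencing does not mark it.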

Fig. 2 40-frame synthetic webcam playback with motion overlay. An orange disc moves left-to-right across a textured wall; the green region tracks it via frame differencing.

Exercises

Three exercises in Execute → Modify → Create order: run the capture loop, add a filter, then build a motion detector.

EXECUTE I.

Run the capture loop

Run webcam_capture.py. Press s to save a frame; q to quit. If you don’t have a webcam, run synthetic_webcam.py instead, which simulates the same pipeline on a synthetic scene.

  • webcam_capture.py: minimal OpenCV capture
  • synthetic_webcam.py: no-camera demo

Reflection questions

  • What is the shape of frame? What dtype?
  • Why does the OpenCV-saved PNG sometimes look red-blue inverted in another viewer?
  • If you remove cv2.waitKey(1), what happens?

MODIFY II.

Add three filters

Modify webcam_effects.py (or the synthetic version) to apply three real-time effects.

Goals

  1. Sepia. Apply the standard sepia matrix per pixel.
  2. Pixelate. Resize down to 64×48 with nearest-neighbour, then back up.
  3. Mirror. Horizontal flip — selfie convention.
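As a starting point, the three effects can be sketched in NumPy alone. The assumptions here are loud: the sepia weights are the widely circulated ones, rearranged for BGR input and output; `pixelate` uses strided sampling as a stand-in for a nearest-neighbour resize; and all function names are hypothetical rather than taken from webcam_effects.py.

```python
import numpy as np

# Widely used sepia weights, rearranged for BGR frames.
# Row i gives output channel i (B, G, R) as a mix of input (B, G, R).
SEPIA_BGR = np.array([
    [0.131, 0.534, 0.272],
    [0.168, 0.686, 0.349],
    [0.189, 0.769, 0.393],
])

def sepia(frame_bgr):
    """Apply the 3x3 colour matrix to every pixel, clipping to the uint8 range."""
    toned = frame_bgr.astype(np.float32) @ SEPIA_BGR.T
    return np.clip(toned, 0, 255).astype(np.uint8)

def pixelate(frame, block=10):
    """Keep every block-th pixel, then repeat each sample back up to full size."""
    small = frame[::block, ::block]              # 640x480 -> 64x48 when block=10
    big = small.repeat(block, axis=0).repeat(block, axis=1)
    return big[:frame.shape[0], :frame.shape[1]]

def mirror(frame):
    """Horizontal flip (selfie convention)."""
    return frame[:, ::-1]

# Random noise frame as a stand-in for a captured 640x480 image.
frame = np.random.default_rng(0).integers(0, 256, (480, 640, 3), dtype=np.uint8)
for effect in (sepia, pixelate, mirror):
    result = effect(frame)
    print(effect.__name__, result.shape, result.dtype)
```

Each function maps a frame to a frame of the same shape and dtype, so any of them can be dropped into the process step of the capture loop.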

CREATE III.

Build a motion detector

Implement the frame-differencing motion detector from scratch using the starter template. The current and previous frames are converted to grayscale, blurred, differenced, and thresholded.

webcam_starter.py — frame differencing skeleton
python · exercise3_starter.py
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
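ok, prev = cap.read()
if not ok:
    raise SystemExit('Could not read from the webcam')
prev_gray = cv2.GaussianBlur(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY), (21, 21), 0)

while True:
    ok, cur = cap.read()
    if not ok:        # camera disconnected or stream ended
        break
    cur_gray = cv2.GaussianBlur(cv2.cvtColor(cur, cv2.COLOR_BGR2GRAY), (21, 21), 0)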

    # TODO 1: per-pixel absolute difference between current and previous gray frames

    # TODO 2: threshold the diff to a binary motion mask at value 25

    # TODO 3: copy the current frame, paint green wherever the mask is set

    prev_gray = cur_gray.copy()
    cv2.imshow('Motion', out)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Make it your own

  • Motion trail. Blend each new motion mask into a running trail with trail = trail * 0.9 + motion * 0.1 (an exponential moving average with a memory of roughly 10 frames) and display trail. Motion leaves a fading ghost.
  • Audio trigger. Sum the motion mask each frame; if it exceeds a threshold, play a sound via pygame.mixer. The webcam becomes an “activity-aware” instrument.
  • Background subtraction. Replace consecutive-frame differencing with cv2.createBackgroundSubtractorMOG2(). Static cameras get vastly more robust motion segmentation.
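The first two ideas share the same ingredients: a running blend of masks and a per-frame activity score. The sketch below is a minimal NumPy illustration with hypothetical helper names (`update_trail`, `activity`) and synthetic masks; the sound playback and the trigger threshold are left out.

```python
import numpy as np

def update_trail(trail, motion, decay=0.9):
    """Exponential moving average: the old trail fades by `decay`, new motion adds the rest."""
    return trail * decay + motion.astype(np.float32) * (1.0 - decay)

def activity(motion):
    """Fraction of pixels flagged as moving, usable as a trigger level."""
    return float((motion > 0).mean())

trail = np.zeros((48, 64), dtype=np.float32)
burst = np.zeros((48, 64), dtype=np.uint8)
burst[20:28, 10:18] = 255            # one frame with an 8x8 patch of motion
still = np.zeros_like(burst)         # afterwards nothing moves

trail = update_trail(trail, burst)
peak = float(trail[20, 10])          # 10% of 255 after a single frame
for _ in range(10):
    trail = update_trail(trail, still)
faded = float(trail[20, 10])         # peak scaled by 0.9 ten times

print(f"peak {peak:.2f}, after 10 still frames {faded:.2f}")
print(f"activity of the burst frame: {activity(burst):.4f}")
```

Because the trail is floating point, it would need scaling and conversion to uint8 before display; comparing `activity(motion)` against a chosen level is one simple way to decide when to fire a sound.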

Downloads

  • webcam_capture.py: minimal capture loop
  • webcam_effects.py: real-time filters
  • background_subtraction.py: motion detection
  • synthetic_webcam.py: no-camera demo

Summary

Common pitfalls to avoid

  • Forgetting cap.release(). Locks the device until process exit.
  • Skipping cv2.waitKey. The window never updates; debugging is mysterious.
  • Ignoring the success flag (ok, often named ret). When read() fails, frame is None and the next OpenCV call on it raises an error.
  • Comparing colour frames directly. 3× the work for no benefit; convert to grayscale first.
