12.1.2 DCGAN Art · Pixels2GenAI

Overview

Deep Convolutional Generative Adversarial Networks (DCGANs) were introduced by Radford et al. in 2015 [1]. They bring convolutional neural network architectures to the GAN framework [2] and produce images with far more coherent spatial structure than GANs built only from fully-connected layers.

In this exercise you will generate abstract art with a pre-trained DCGAN, explore the learned latent space through interpolation, and (optionally) train the network from scratch on a real dataset of African fabric patterns. The narrative arc: in early modules you wrote algorithms to generate patterns by hand; here, you teach a neural network to learn and generate similar patterns autonomously.

Learning objectives

Understand why convolutional architectures improve image generation over fully-connected networks.
Analyse the Generator and Discriminator components of a DCGAN.
Generate novel abstract art by sampling from the learned latent space.
Explore latent-space interpolation to create smooth transitions between generated images.

Quick start — see it in action

Load the pre-trained generator and produce four random art pieces.

python · quick-start.py

import torch
from dcgan_model import Generator, LATENT_DIM
import matplotlib.pyplot as plt

# Load pre-trained generator
generator = Generator()
generator.load_state_dict(torch.load('exercise3_generator.pth', map_location='cpu'))
generator.eval()

# Generate 4 random art pieces
z = torch.randn(4, LATENT_DIM, 1, 1)
with torch.no_grad():
    images = generator(z)

# Display the generated art
images = (images + 1) / 2  # Convert from [-1, 1] to [0, 1]
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for i, ax in enumerate(axes):
    ax.imshow(images[i].permute(1, 2, 0).numpy())
    ax.axis('off')
plt.savefig('quick_start_output.png', dpi=150)

Four DCGAN-generated abstract art patterns — Fig. 2 Four abstract art patterns generated from random 100-dim latent vectors.

Downloads: quick-start.py, exercise3_generator.pth (pre-trained weights, 14 MB)

The generator transforms 100-dimensional random noise vectors into 64×64 pixel RGB images. The patterns exhibit smooth gradients, geometric shapes, and colour harmonies learned from training data.

Core concepts

Concept 1 — From fully-connected to convolutional

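Traditional GANs built from fully-connected layers struggle to generate coherent images because a dense layer has no built-in notion of spatial structure: every output value gets its own separate set of weights, so nothing tells the network that neighbouring pixels are related. Consider a 64×64 RGB image: that is 64 × 64 × 3 = 12,288 values the network must generate without any prior about spatial relationships.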

Convolutional Neural Networks (CNNs) solve this by exploiting the spatial structure of images. Three key properties make CNNs effective for image generation:

Local connectivity — each neuron connects to a small region (receptive field), learning local patterns.
Weight sharing — the same filter is applied across the entire image, so a pattern learned at one location is recognised or produced at every location (translation equivariance).
Hierarchical features — lower layers detect edges and textures; higher layers combine these into complex patterns.

DCGANs apply these principles to both the generator (which uses transposed convolutions to upsample) and the discriminator (which uses standard convolutions to downsample).
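
To put rough numbers on weight sharing, the sketch below compares the parameter count of a single fully-connected layer that maps a 100-dimensional vector straight to all 12,288 output values with that of a transposed convolution shaped like the generator's final layer. The helper name `count_params` is introduced here for illustration only.

```python
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# Fully-connected: each of the 64*64*3 outputs has its own weight
# for every one of the 100 inputs, plus one bias per output.
dense = nn.Linear(100, 64 * 64 * 3)

# Convolutional: one 4x4 kernel per (input channel, output channel) pair,
# reused at every spatial position, plus one bias per output channel.
conv = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1)

dense_params = count_params(dense)  # 100 * 12288 + 12288 = 1,241,088
conv_params = count_params(conv)    # 64 * 3 * 4 * 4 + 3 = 3,075
print(dense_params, conv_params)
```

The two layers do different jobs (the convolution expects a 32×32 feature map, not a flat vector), so treat the comparison as indicative rather than exact.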

Concept 2 — DCGAN architecture

A DCGAN consists of two competing networks: a Generator that creates images from noise, and a Discriminator that distinguishes real images from generated ones.

python · dcgan_model.py — Generator

import torch.nn as nn

LATENT_DIM = 100  # latent vector size; quick-start.py imports this name


class Generator(nn.Module):
    def __init__(self, latent_dim=100, img_channels=3, feature_maps=64):
        super().__init__()
        self.network = nn.Sequential(
            # Input: 100 × 1 × 1 latent vector
            # Layer 1: 100 → 4 × 4 × 512
            nn.ConvTranspose2d(latent_dim, feature_maps * 8, 4, 1, 0),
            nn.BatchNorm2d(feature_maps * 8),
            nn.ReLU(True),

            # Layer 2: 4 × 4 × 512 → 8 × 8 × 256
            nn.ConvTranspose2d(feature_maps * 8, feature_maps * 4, 4, 2, 1),
            nn.BatchNorm2d(feature_maps * 4),
            nn.ReLU(True),

            # Layer 3: 8 × 8 × 256 → 16 × 16 × 128
            nn.ConvTranspose2d(feature_maps * 4, feature_maps * 2, 4, 2, 1),
            nn.BatchNorm2d(feature_maps * 2),
            nn.ReLU(True),

            # Layer 4: 16 × 16 × 128 → 32 × 32 × 64
            nn.ConvTranspose2d(feature_maps * 2, feature_maps, 4, 2, 1),
            nn.BatchNorm2d(feature_maps),
            nn.ReLU(True),

            # Layer 5: 32 × 32 × 64 → 64 × 64 × 3
            nn.ConvTranspose2d(feature_maps, img_channels, 4, 2, 1),
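            nn.Tanh()  # Output in [-1, 1]
        )

    def forward(self, z):
        # z: (batch, latent_dim, 1, 1) -> image: (batch, img_channels, 64, 64)
        return self.network(z)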

The first layer expands the latent vector to a 4×4 spatial grid with 512 channels.
Each subsequent layer doubles the spatial resolution; layers 2 to 4 also halve the channel count.
The final layer produces a 3-channel RGB image with Tanh activation.
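
The spatial sizes in the code comments can be checked with the standard output-size formula for a transposed convolution (dilation 1, no output padding): out = (in − 1) × stride − 2 × padding + kernel. The short sketch below applies it to the five generator layers; the function name `conv_transpose_out` is an illustrative choice.

```python
def conv_transpose_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a transposed convolution
    (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) for the five generator layers
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the latent vector is treated as a 1x1 spatial map
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [1, 4, 8, 16, 32, 64]
```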

The discriminator mirrors the generator, using strided convolutions to downsample from 64×64 to a single scalar.

Side-by-side diagram showing Generator upsampling and Discriminator downsampling — Fig. 3 DCGAN architecture. Generator (left) upsamples a latent vector to a 64×64 image; Discriminator (right) downsamples an image to a single real/fake scalar.

Architectural guidelines from the DCGAN paper:

Replace pooling with strided convolutions (discriminator) and transposed convolutions (generator).
Use Batch Normalisation in both networks (except discriminator input and generator output).
Use ReLU in the generator (except the output, which uses Tanh).
Use LeakyReLU in the discriminator to prevent sparse gradients.
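
The discriminator code is not shown in this section, so the following is only a sketch of what a 64×64 discriminator following these guidelines might look like. The layer widths mirror the generator, and the LeakyReLU slope of 0.2 follows the DCGAN paper [1]; the actual `dcgan_model.py` may differ.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Sketch of a 64x64 DCGAN discriminator (layer sizes are assumptions)."""

    def __init__(self, img_channels=3, feature_maps=64):
        super().__init__()
        self.network = nn.Sequential(
            # 64 x 64 x 3 -> 32 x 32 x 64 (no batch norm on the input layer)
            nn.Conv2d(img_channels, feature_maps, 4, 2, 1),
            nn.LeakyReLU(0.2, inplace=True),

            # 32 x 32 x 64 -> 16 x 16 x 128
            nn.Conv2d(feature_maps, feature_maps * 2, 4, 2, 1),
            nn.BatchNorm2d(feature_maps * 2),
            nn.LeakyReLU(0.2, inplace=True),

            # 16 x 16 x 128 -> 8 x 8 x 256
            nn.Conv2d(feature_maps * 2, feature_maps * 4, 4, 2, 1),
            nn.BatchNorm2d(feature_maps * 4),
            nn.LeakyReLU(0.2, inplace=True),

            # 8 x 8 x 256 -> 4 x 4 x 512
            nn.Conv2d(feature_maps * 4, feature_maps * 8, 4, 2, 1),
            nn.BatchNorm2d(feature_maps * 8),
            nn.LeakyReLU(0.2, inplace=True),

            # 4 x 4 x 512 -> 1 x 1 x 1: probability that the input is real
            nn.Conv2d(feature_maps * 8, 1, 4, 1, 0),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.network(x)


disc = Discriminator()
scores = disc(torch.randn(2, 3, 64, 64))
print(scores.shape)  # torch.Size([2, 1, 1, 1])
```

The Sigmoid output pairs with a binary cross-entropy loss; each strided convolution halves the resolution, which is the reverse of the generator's upsampling path.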

Concept 3 — Latent space and art generation

The latent space is the 100-dimensional space from which the generator samples input vectors. Each point in this space maps to one generated image, and nearby points tend to produce visually similar images.

This property enables creative applications such as interpolation: moving smoothly from one latent vector to another morphs one image into another.

python

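import torch

def interpolate(z1, z2, steps=10):
    """Generate images along a straight path between two latent points.

    Uses the `generator` loaded earlier; expects steps >= 2.
    """
    images = []
    with torch.no_grad():  # inference only, so skip gradient tracking
        for t in range(steps):
            alpha = t / (steps - 1)  # runs from 0.0 to 1.0 inclusive
            z = (1 - alpha) * z1 + alpha * z2  # Linear interpolation
            z = z.view(1, -1, 1, 1)  # (batch, latent_dim, 1, 1)
            images.append(generator(z))
    return images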

Eight images showing gradual transition from one pattern to another — Fig. 4 Linear interpolation path through latent space. The patterns smoothly transform — evidence that the generator has learned a structured representation.
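
As a possible variation, all interpolation points can be built as a single batch so that the generator runs once instead of once per step. This is a sketch under the assumption that the latent vectors are flat 100-element tensors; `interpolation_batch` is a name made up for this example.

```python
import torch


def interpolation_batch(z1, z2, steps=8):
    """Return a (steps, latent_dim, 1, 1) batch of latent vectors
    spaced evenly on the straight line from z1 to z2."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1, 1)
    z1 = z1.reshape(1, -1, 1, 1)
    z2 = z2.reshape(1, -1, 1, 1)
    return (1 - alphas) * z1 + alphas * z2  # broadcasts over the batch axis


z1, z2 = torch.randn(100), torch.randn(100)
z_path = interpolation_batch(z1, z2, steps=8)
print(z_path.shape)  # torch.Size([8, 100, 1, 1])
```

Passing `z_path` to the generator inside `torch.no_grad()` would then yield all frames in one forward pass.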

Exercises

EXECUTE I.

Observe DCGAN generation

Run the pre-trained generator to see how DCGANs create abstract art from random noise vectors. The script produces a 4×4 grid of unique patterns.

exercise1_observe.py

bash

python exercise1_observe.py

4x4 grid of DCGAN-generated abstract art — Fig. 5 Generated abstract-art grid (4×4). Your output will differ unless the same random seed is used.
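
How `exercise1_observe.py` builds its grid is not shown here. The sketch below is one way to fix the random seed and tile a batch of 16 images into a single 4×4 picture using plain tensor reshaping. The `tile_grid` helper is hypothetical, and random tensors stand in for the generator output so the example runs without the weights file.

```python
import torch


def tile_grid(images, rows, cols):
    """Arrange a (rows*cols, C, H, W) batch into one (C, rows*H, cols*W) image."""
    n, c, h, w = images.shape
    assert n == rows * cols, "batch size must equal rows * cols"
    grid = images.reshape(rows, cols, c, h, w)
    grid = grid.permute(2, 0, 3, 1, 4)  # -> (C, rows, H, cols, W)
    return grid.reshape(c, rows * h, cols * w)


torch.manual_seed(42)            # a fixed seed makes the latent batch reproducible
z = torch.randn(16, 100, 1, 1)   # 16 latent vectors for a 4x4 grid

# Stand-in for generator(z): random values in [-1, 1] with the generator's output shape
fake_images = torch.rand(16, 3, 64, 64) * 2 - 1
scaled = (fake_images + 1) / 2   # map [-1, 1] to [0, 1] for display
grid = tile_grid(scaled, rows=4, cols=4)
print(grid.shape)  # torch.Size([3, 256, 256])
```

The resulting tensor can be shown with `plt.imshow(grid.permute(1, 2, 0).numpy())`, as in the quick-start script.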

Reflection

How does the latent vector size (100 dimensions) affect the diversity of generated art?
What common visual patterns do you observe across samples?
Why does the generator use Tanh activation in the output layer?

MODIFY II.

Explore parameters and interpolation

Experiment with generation parameters: larger grids, more interpolation steps, and different random seeds.

exercise2_explore.py

The script demonstrates two explorations:

Larger grid — generates a 6×6 grid (36 samples) to see more variation.
Latent interpolation — creates smooth transitions between two random latent vectors.

6x6 grid of generated abstract art patterns — Fig. 6 6×6 grid showing greater variety in generated patterns.

Eight-step interpolation between two abstract patterns — Fig. 7 Latent-space interpolation: an 8-step smooth transition between two patterns.

Try these modifications

Change the random seed to generate different sample sets.
Modify grid size (try 8×8 or 10×10).
Increase interpolation steps to 16 or 32 for smoother transitions.
Compare linear interpolation at different points in latent space.

TRAIN III.

Train on African fabric patterns

Train a DCGAN from scratch on African fabric patterns from Kaggle. This exercise walks through the complete training process, from dataset preparation through 100 epochs of adversarial training.

Time commitment

Dataset setup: 5–10 min (one-time)
Training: 20–30 min on GPU, 60–90 min on CPU
Observation: 10 min

Training configuration & execution

Hyperparameters:

Epochs: 100
Batch size: 64
Learning rate: 0.0002 (both networks)
Optimiser: Adam, β₁=0.5, β₂=0.999
Loss: Binary Cross-Entropy (BCE)
Dataset: 1,059 fabric patterns (64×64 RGB)
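
To show how these hyperparameters fit together, here is a minimal sketch of one adversarial update step. It is not the contents of `exercise3_train.py`: the function name `train_step` and the tiny stand-in networks are assumptions chosen so the example runs quickly on a CPU, and the update order (discriminator first, then generator) follows the common pattern from the PyTorch DCGAN tutorial [8].

```python
import torch
import torch.nn as nn


def train_step(gen, disc, opt_g, opt_d, real, latent_dim=100):
    """One adversarial update: discriminator first, then generator."""
    criterion = nn.BCELoss()
    batch = real.size(0)
    ones = torch.ones(batch)    # target label for "real"
    zeros = torch.zeros(batch)  # target label for "fake"

    z = torch.randn(batch, latent_dim, 1, 1)
    fake = gen(z)

    # Discriminator: push D(real) towards 1 and D(G(z)) towards 0.
    # detach() stops this loss from updating the generator.
    opt_d.zero_grad()
    loss_d = (criterion(disc(real).view(-1), ones)
              + criterion(disc(fake.detach()).view(-1), zeros))
    loss_d.backward()
    opt_d.step()

    # Generator: push D(G(z)) towards 1 (non-saturating loss).
    opt_g.zero_grad()
    loss_g = criterion(disc(fake).view(-1), ones)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()


# Tiny stand-in networks; swap in the real Generator and Discriminator.
gen = nn.Sequential(nn.ConvTranspose2d(100, 3, 64, 1, 0), nn.Tanh())
disc = nn.Sequential(nn.Conv2d(3, 1, 64, 1, 0), nn.Sigmoid())

# Hyperparameters from the list above
opt_g = torch.optim.Adam(gen.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(disc.parameters(), lr=0.0002, betas=(0.5, 0.999))

real = torch.rand(64, 3, 64, 64) * 2 - 1  # stand-in batch scaled to [-1, 1]
loss_d, loss_g = train_step(gen, disc, opt_g, opt_d, real)
print(f"loss_d={loss_d:.3f} loss_g={loss_g:.3f}")
```

A full training loop would call a step like this for every batch in every epoch and move the tensors to a GPU when one is available.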

exercise3_train.py

bash

python exercise3_train.py

The script will:

Load the dataset from african_fabric_processed/.
Initialise the Generator and Discriminator.
Train for 100 epochs with per-epoch progress updates.
Save checkpoints at epochs 10, 30, 50, 70, and 100.
Save the final trained model as exercise3_generator.pth.

What to observe during training

Loss oscillation — both G and D losses fluctuate significantly (this is normal).
Occasional spikes — sudden jumps are expected in GAN training.
No clear convergence — unlike supervised learning, GAN losses don’t smoothly decrease to zero.
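
A small calculation helps explain why the losses do not head towards zero. Assuming the discriminator loss is the sum of a real term and a fake term under BCE, and the generator is trained with "real" labels (an assumption about the script, which is not shown), the idealised equilibrium from the GAN paper [2] is a discriminator that outputs 0.5 for every input:

```python
import math


def bce(prediction: float, target: float) -> float:
    """Binary cross-entropy for a single prediction in (0, 1)."""
    return -(target * math.log(prediction)
             + (1 - target) * math.log(1 - prediction))


# At the idealised equilibrium the discriminator outputs 0.5 for everything.
d_out = 0.5
equilibrium_loss_d = bce(d_out, 1.0) + bce(d_out, 0.0)  # real term + fake term
equilibrium_loss_g = bce(d_out, 1.0)                    # generator targets "real"

print(round(equilibrium_loss_d, 3), round(equilibrium_loss_g, 3))  # 1.386 0.693
```

These values are 2 ln 2 and ln 2. Real training runs rarely sit exactly there, but losses hovering around this range are more plausible than losses near zero.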

Training results — 100 epochs of learning

Training loss curves over 100 epochs — Fig. 8 Generator (blue) vs Discriminator (red) losses over 100 epochs. The oscillating pattern is healthy adversarial competition.

Watch the generator’s visual progression across checkpoints:

Generated samples at epoch 10 — Fig. 9 Epoch 10 — noisy attempts with basic colours and rough shapes.

Generated samples at epoch 30 — Fig. 10 Epoch 30 — geometric elements emerge: stripes, bands, blocky shapes reminiscent of Kente cloth.

Generated samples at epoch 70 — Fig. 11 Epoch 70 — intricate designs with sharp colour boundaries and complex motif arrangements.

Generated samples at epoch 100 — Fig. 12 Epoch 100 — diverse, fabric-like patterns with rich geometric detail.

16-sample grid from the trained generator — Fig. 13 16 unique patterns from the trained generator — each from a different random noise vector.

Summary

Common pitfalls

Mode collapse — generator produces limited variety. Adjust learning rates or use minibatch discrimination.
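Checkerboard artifacts — transposed convolutions can overlap unevenly and leave a grid-like texture in the output. A common remedy is resize-convolution: upsample first, then apply an ordinary convolution [7].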
Training instability — monitor both losses; if one dominates, adjust learning rates.
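Wrong output range — the generator's Tanh outputs lie in [-1, 1], so real training images must be normalised to the same range. A mismatch gives the discriminator an easy cue to separate real from fake and destabilises training.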
Dimension mismatches — latent vectors must have shape (batch, latent_dim, 1, 1), not (batch, latent_dim).
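
For the checkerboard pitfall, the sketch below shows one way a resize-convolution block could replace a stride-2 transposed convolution in the generator, following the idea described by Odena et al. [7]. The function name and the 3×3 kernel are illustrative choices, not part of the exercise code.

```python
import torch
import torch.nn as nn


def resize_conv_block(in_channels, out_channels):
    """Upsample 2x with nearest-neighbour resizing, then apply a 3x3 convolution."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(True),
    )


# Same shape change as generator layer 4: 16 x 16 x 128 -> 32 x 32 x 64
block = resize_conv_block(128, 64)
out = block(torch.randn(2, 128, 16, 16))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Because the layer shapes change, weights trained with transposed convolutions (such as exercise3_generator.pth) would not load into a generator built from these blocks; it would need retraining.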

References

[1] Radford, A., Metz, L. & Chintala, S. (2015). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv preprint. arXiv:1511.06434
[2] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. & Bengio, Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems, 27.
[3] Goodfellow, I., Bengio, Y. & Courville, A. (2016). Deep Learning. MIT Press. ISBN 978-0-262-03561-3.
[4] LeCun, Y., Bengio, Y. & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
[5] Ioffe, S. & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv preprint. arXiv:1502.03167
[6] Kingma, D. P. & Welling, M. (2014). Auto-Encoding Variational Bayes. arXiv preprint. arXiv:1312.6114
[7] Odena, A., Dumoulin, V. & Olah, C. (2016). Deconvolution and Checkerboard Artifacts. Distill. distill.pub/2016/deconv-checkerboard/
[8] PyTorch Contributors. (2024). DCGAN Tutorial. pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html