What is Deep Learning?
Your phone can unlock with your face. Google Translate understands sentences. Spotify knows what song you want to hear next. All of these features were built using the same idea: deep learning.
What is Deep Learning?
Deep learning is a type of machine learning that uses neural networks with many layers, not just one or two. The word “deep” refers to the depth of layers, not any philosophical meaning.
Each layer transforms the data it receives and passes the result to the next layer. By the time data has passed through many layers, the network has built up a rich, multi-level understanding of the input.
New word: representation is just the way data is described at a given layer. Raw pixels are one representation. A list of detected edges is another. A description of shapes is another. Each layer converts one representation into a better one.
A simple way to think about it
Imagine trying to identify a face in a crowd. You do not process the whole photo at once.
Your brain first notices light and dark patches (very basic). Then edges and lines. Then features like eyes and a nose. Then how those features fit together into a face. Then who the face belongs to.
A deep neural network does exactly the same thing, layer by layer. You never told it to look for edges. It learned that looking for edges is useful, because edges help identify shapes, which help identify faces. This automatic discovery of useful steps is called hierarchical representation learning, and it is the core insight behind deep learning.
No human writes the rules. The network invents them by learning from millions of examples.
How it works, step by step
- Raw data enters the first layer: pixels, sound waves, words, or numbers.
- The first layer finds very simple patterns: edges in an image, sounds in audio, word patterns in text.
- Each deeper layer combines the patterns from the layer below into more complex patterns.
- The final layers make the actual decision: “cat”, “spam”, “buy”, “translate to French”.
- The network is trained by showing it millions of labelled examples and adjusting every connection each time it makes a mistake.
See it visually
graph TD
Raw["Raw Pixels"] --> L1["Layer 1\nEdges & Colours"]
L1 --> L2["Layer 2\nTextures & Shapes"]
L2 --> L3["Layer 3\nObject Parts"]
L3 --> L4["Layer 4\nFull Objects"]
L4 --> Out["Classification: 🐱 Cat"]
Each box in the diagram is a layer. Data flows from top to bottom, becoming more abstract and meaningful at each step. The bottom layers see pixels. The top layers see concepts.
The maths (do not panic)
Here is what the full chain of layers looks like as a formula:
\[\mathbf{a}^{(L)} = f_L(\mathbf{W}^{(L)} f_{L-1}(\cdots f_1(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}) \cdots) + \mathbf{b}^{(L)})\]In plain English: The output of the whole network is just a long chain of “multiply by weights, add a bias, apply an activation function,” repeated once for every layer. It looks complicated, but each individual step is simple. The depth of the chain, how many times it repeats, is what gives the network its power.
Show more detail
One challenge that arises with very deep networks is called the vanishing gradient problem. When the network is adjusting its weights after making a mistake, it has to send an error signal backwards through every layer. If the activation function used to be sigmoid (which squashes all values to between 0 and 1), the error signal got a little smaller at every layer. By the time it reached the early layers, it was so tiny that the weights there barely changed. The modern solution is to use ReLU as the activation function. ReLU simply passes positive values through unchanged, so the error signal stays full size as it travels back. Techniques like batch normalisation (which keeps values in a useful range at each layer) and residual connections (which add a shortcut that lets the error signal skip layers entirely) allow networks to be trained even when they are hundreds of layers deep.Run the code yourself
This code builds and trains a small deep network using PyTorch, a popular deep learning library. It learns to classify handwritten digit images. You will need PyTorch installed: in Colab, run !pip install torch torchvision in a cell first.
Step 1: Open Google Colab and create a new notebook.
Step 2: Copy this code into a cell:
import torch # PyTorch: the deep learning library
import torch.nn as nn # tools for building neural networks
import torch.optim as optim # tools for updating weights
from torchvision import datasets, transforms # image datasets and rescaling tools
from torch.utils.data import DataLoader # groups examples into batches
# Define a small deep network for MNIST (28x28 pixel images of digits 0-9)
class DeepNet(nn.Module):
def __init__(self):
super().__init__()
self.net = nn.Sequential(
nn.Flatten(), # convert 28x28 image into 784 numbers
nn.Linear(784, 256), nn.ReLU(), nn.BatchNorm1d(256), # layer 1: 784 inputs, 256 outputs
nn.Linear(256, 128), nn.ReLU(), nn.BatchNorm1d(128), # layer 2: 256 inputs, 128 outputs
nn.Linear(128, 64), nn.ReLU(), # layer 3: 128 inputs, 64 outputs
nn.Linear( 64, 10) # output layer: 64 inputs, 10 class scores
)
def forward(self, x):
return self.net(x) # pass input through all layers in sequence
# Load the MNIST dataset and rescale pixel values to a small range
transform = transforms.Compose([transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))])
train_loader = DataLoader(datasets.MNIST('.', train=True, download=True, transform=transform), batch_size=256, shuffle=True)
test_loader = DataLoader(datasets.MNIST('.', train=False, download=True, transform=transform), batch_size=1000)
model = DeepNet() # create the network
optimizer = optim.Adam(model.parameters(), lr=1e-3) # Adam: a smart way to adjust the learning speed
criterion = nn.CrossEntropyLoss() # measures how wrong the predictions are
# Train for 3 passes through the dataset (called epochs)
for epoch in range(1, 4):
model.train()
for X, y in train_loader:
optimizer.zero_grad() # clear old error signals
loss = criterion(model(X), y) # forward pass: how wrong are we?
loss.backward() # backward pass: figure out who is to blame
optimizer.step() # nudge every weight in the right direction
# Test on images the network has never seen
model.eval()
correct = 0
with torch.no_grad(): # no need to track errors during testing
for X, y in test_loader:
correct += (model(X).argmax(1) == y).sum().item()
print(f"Epoch {epoch} Test accuracy: {correct / 100:.2f}%")
Step 3: Press Shift + Enter to run it.
You should see:
Epoch 1 Test accuracy: 97.52%
Epoch 2 Test accuracy: 97.89%
Epoch 3 Test accuracy: 98.21%
What each line does:
nn.Sequential(...): chains all the layers together so input flows through them in ordernn.Flatten(): converts the 28x28 pixel grid into a flat list of 784 numbersnn.BatchNorm1d(256): keeps values at a healthy scale between layers so training stays stableoptimizer.zero_grad(): clears the error signals from the last batch so they do not pile uploss.backward(): calculates how much each weight contributed to the erroroptimizer.step(): nudges every weight by a small amount to reduce the error
What just happened?
The network improved with every pass through the training data. It started at 97.5% accuracy and reached 98.2% by the third pass. Each pass it saw every training image once and adjusted its weights a little. Three passes through 60,000 images is 180,000 small adjustments. That is deep learning in action.
Quick recap
- Deep learning uses neural networks with many layers, each one learning more complex patterns than the one before.
- Nobody writes the rules. The network discovers what patterns are useful by learning from millions of labelled examples.
- The word “deep” just means many layers, typically more than two or three.
- Deep learning works best when the input is complex and unstructured: images, audio, or text. For spreadsheet-style data, simpler methods often work just as well.
- Transfer learning means you can often start from a network that has already learned general patterns on a large dataset, then fine-tune it for your specific task.
← Multi-Layer Perceptron Next → Convolutional Neural Networks