Neural Networks: Perceptron to MLP
The Building Block of Deep Learning
Every deep learning model — from CNNs to Transformers — is built from neural networks. Understanding the math behind them is essential.
[Figure: Deep Learning Architecture Family]
1. The Perceptron (Single Neuron)
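Mathematical Model:
$$y = f(\mathbf{w}^\top \mathbf{x} + b) = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
Where:
- $\mathbf{x}$ = input vector
- $\mathbf{w}$ = weight vector (learned parameters)
- $b$ = bias term
- $f$ = activation function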
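A single neuron can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the choice of sigmoid as the activation, and the example values are all assumptions made for the demonstration.

```python
import math

def sigmoid(z):
    """Logistic activation, used here as an example activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, b, activation=sigmoid):
    """Return activation(w . x + b) for one input vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return activation(z)

# Hypothetical example values, chosen only for illustration.
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.0

y = perceptron(x, w, b)  # z = 0.5 - 0.5 + 0.0 = 0.0, so y = sigmoid(0) = 0.5
```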
2. Activation Functions
Why Activation Functions?
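Without activation functions, stacked layers collapse into a single linear (affine) map, so the network is no more expressive than linear regression:
$$W_2 (W_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2 = (W_2 W_1)\,\mathbf{x} + (W_2 \mathbf{b}_1 + \mathbf{b}_2) = W' \mathbf{x} + \mathbf{b}'$$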
Activation functions introduce non-linearity, allowing the network to learn complex patterns.
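The collapse of stacked linear layers can be checked numerically. The sketch below uses small hand-picked matrices (hypothetical values) and helper names invented for this example; it compares two linear layers applied in sequence against one merged layer.

```python
def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product for matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def affine(W, b, x):
    """Compute W x + b."""
    return [z + bi for z, bi in zip(matvec(W, x), b)]

# Hypothetical parameters: layer 1 maps 2 -> 3, layer 2 maps 3 -> 1.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b1 = [0.5, 0.0, -1.0]
W2 = [[1.0, -1.0, 2.0]]
b2 = [0.25]
x = [2.0, -3.0]

# Two linear layers with no activation in between.
two_layers = affine(W2, b2, affine(W1, b1, x))

# The same map written as a single layer.
W_merged = matmul(W2, W1)      # W2 W1
b_merged = affine(W2, b2, b1)  # W2 b1 + b2
one_layer = affine(W_merged, b_merged, x)
```

Both `two_layers` and `one_layer` evaluate to the same vector, which is why depth adds nothing until a non-linearity sits between the layers.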
[Figure: Sigmoid, Tanh, ReLU Comparison]
Activation Function Formulas
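Sigmoid
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Output: (0, 1), good for probabilities
Tanh
$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
Output: (-1, 1), zero-centered
ReLU
$$\text{ReLU}(z) = \max(0, z)$$
Output: [0, ∞), most popular in hidden layers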
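These three activations can be written directly from their standard definitions. The sketch below is deliberately naive (for instance, this `sigmoid` overflows for very large negative inputs) and is meant only to show the output ranges; the sample inputs are arbitrary.

```python
import math

def sigmoid(z):
    """1 / (1 + e^-z), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """max(0, z), output in [0, inf)."""
    return max(0.0, z)

# Evaluate each activation at a few arbitrary sample points.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.4f}  tanh={tanh(z):+.4f}  relu={relu(z):.1f}")
```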
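Leaky ReLU and GELU (Modern Alternatives)
Leaky ReLU
$$\text{LeakyReLU}(z) = \max(\alpha z, z), \quad 0 < \alpha < 1$$
A small slope $\alpha$ (commonly 0.01) keeps a non-zero gradient for negative inputs.
GELU
$$\text{GELU}(z) = z \, \Phi(z)$$
Here $\Phi$ is the cumulative distribution function of the standard normal distribution.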
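As a rough sketch, both alternatives fit in a few lines of Python. The default slope `alpha=0.01` is a common convention rather than anything fixed by this lesson, and the GELU here is the exact form based on the error function, not the tanh approximation some libraries offer.

```python
import math

def leaky_relu(z, alpha=0.01):
    """Pass positive inputs through; scale negative inputs by a small slope."""
    return z if z > 0 else alpha * z

def gelu(z):
    """Exact GELU: z * Phi(z), with Phi the standard normal CDF."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

# Unlike ReLU, both functions return a non-zero value for negative inputs.
negative_outputs = (leaky_relu(-2.0), gelu(-2.0))
```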
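3. Multi-Layer Perceptron (MLP)
An MLP stacks perceptrons into layers: an input layer, one or more hidden layers, and an output layer, with every unit in one layer connected to every unit in the next.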
4. Forward Propagation (Mathematical)
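Layer-by-Layer Computation:
Layer 1:
$$\mathbf{z}^{(1)} = W^{(1)} \mathbf{x} + \mathbf{b}^{(1)}, \qquad \mathbf{a}^{(1)} = f(\mathbf{z}^{(1)})$$
Layer 2:
$$\mathbf{z}^{(2)} = W^{(2)} \mathbf{a}^{(1)} + \mathbf{b}^{(2)}, \qquad \mathbf{a}^{(2)} = f(\mathbf{z}^{(2)})$$
General Layer $l$ (with $\mathbf{a}^{(0)} = \mathbf{x}$):
$$\mathbf{z}^{(l)} = W^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}, \qquad \mathbf{a}^{(l)} = f(\mathbf{z}^{(l)})$$
Output (for a network with $L$ layers):
$$\hat{\mathbf{y}} = \mathbf{a}^{(L)}$$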
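The layer-by-layer computation amounts to a loop over layers. The following is a minimal sketch in plain Python; the `dense` and `forward` helpers, the layer sizes, and the weights are all hypothetical and chosen so the result is easy to verify by hand.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(W, b, a_prev, activation):
    """One layer: a = activation(W a_prev + b)."""
    z = [sum(w * a for w, a in zip(row, a_prev)) + bi for row, bi in zip(W, b)]
    return [activation(zi) for zi in z]

def forward(x, layers):
    """Apply each (W, b, activation) tuple in order, starting from a = x."""
    a = x
    for W, b, activation in layers:
        a = dense(W, b, a, activation)
    return a

# Hypothetical 2 -> 2 -> 1 network.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),  # hidden layer
    ([[1.0, -2.0]], [0.0], sigmoid),                 # output layer
]

# Hidden: z = [1.0, 0.5], a = [1.0, 0.5]; output: z = 1.0 - 1.0 = 0.0.
y_hat = forward([2.0, 1.0], layers)  # [0.5]
```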
5. Backpropagation (The Chain Rule)
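Loss Function (Binary Cross-Entropy), for a single example with label $y$ and prediction $\hat{y} = a^{(L)}$:
$$\mathcal{L} = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]$$
Gradient for Output Layer (sigmoid output):
$$\delta^{(L)} = \frac{\partial \mathcal{L}}{\partial \mathbf{z}^{(L)}} = \mathbf{a}^{(L)} - \mathbf{y}$$
Gradient for Hidden Layer $l$:
$$\delta^{(l)} = \left( (W^{(l+1)})^\top \delta^{(l+1)} \right) \odot f'(\mathbf{z}^{(l)})$$
$$\frac{\partial \mathcal{L}}{\partial W^{(l)}} = \delta^{(l)} (\mathbf{a}^{(l-1)})^\top, \qquad \frac{\partial \mathcal{L}}{\partial \mathbf{b}^{(l)}} = \delta^{(l)}$$
Weight Update (learning rate $\eta$):
$$W^{(l)} \leftarrow W^{(l)} - \eta \frac{\partial \mathcal{L}}{\partial W^{(l)}}, \qquad \mathbf{b}^{(l)} \leftarrow \mathbf{b}^{(l)} - \eta \frac{\partial \mathcal{L}}{\partial \mathbf{b}^{(l)}}$$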
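To make the chain rule concrete, here is a sketch of backpropagation for a tiny network with one tanh hidden layer and a sigmoid output trained with binary cross-entropy on a single example. The architecture, parameter values, and function names are assumptions made for this illustration. The analytic gradient is compared against a finite-difference estimate, followed by one gradient-descent step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Return hidden activations a1 and the prediction y_hat."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    a1 = [math.tanh(z) for z in z1]
    z2 = sum(w * a for w, a in zip(W2, a1)) + b2
    return a1, sigmoid(z2)

def bce(y, y_hat):
    """Binary cross-entropy for one example."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def backward(x, y, W1, b1, W2, b2):
    """Gradients of the loss with respect to every parameter."""
    a1, y_hat = forward(x, W1, b1, W2, b2)
    delta2 = y_hat - y                 # output layer: dL/dz2
    dW2 = [delta2 * a for a in a1]
    db2 = delta2
    # Hidden layer: (W2^T delta2) * tanh'(z1), where tanh'(z) = 1 - tanh(z)^2.
    delta1 = [w * delta2 * (1 - a * a) for w, a in zip(W2, a1)]
    dW1 = [[d * xi for xi in x] for d in delta1]
    db1 = delta1
    return dW1, db1, dW2, db2

def loss(x, y, W1, b1, W2, b2):
    return bce(y, forward(x, W1, b1, W2, b2)[1])

# Hypothetical example and parameters.
x, y = [0.5, -1.0], 1.0
W1 = [[0.1, -0.2], [0.4, 0.3]]
b1 = [0.0, 0.1]
W2 = [0.7, -0.5]
b2 = 0.2

dW1, db1, dW2, db2 = backward(x, y, W1, b1, W2, b2)

# Central finite-difference check on the single weight W1[0][1].
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_minus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus[0][1] -= eps
numeric = (loss(x, y, W1_plus, b1, W2, b2) - loss(x, y, W1_minus, b1, W2, b2)) / (2 * eps)

# One gradient-descent step: parameter <- parameter - lr * gradient.
lr = 0.1
loss_before = loss(x, y, W1, b1, W2, b2)
W1_new = [[w - lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W1, dW1)]
b1_new = [b - lr * g for b, g in zip(b1, db1)]
W2_new = [w - lr * g for w, g in zip(W2, dW2)]
b2_new = b2 - lr * db2
loss_after = loss(x, y, W1_new, b1_new, W2_new, b2_new)
```

If the analytic gradient is correct, `numeric` and `dW1[0][1]` agree closely, and `loss_after` is smaller than `loss_before` for a small enough learning rate.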
6. Loss Functions
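Regression Loss (MSE), averaged over $N$ examples:
$$\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
Classification Loss (Cross-Entropy):
Binary:
$$\mathcal{L}_{\text{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
Multi-class (with $C$ classes and one-hot labels $y_{i,c}$):
$$\mathcal{L}_{\text{CE}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}$$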
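The three losses can be sketched in plain Python as follows. The function names are illustrative, and the clipping constant `eps` is an assumption added here to avoid taking `log(0)`; it is not part of the mathematical definitions.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over a batch of scalar targets."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy; y_pred holds probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy; y_true rows are one-hot, y_pred rows are probabilities."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total += sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return -total / len(y_true)

# Small hand-checkable examples.
mse_value = mse([1.0, 2.0], [1.0, 4.0])                    # (0 + 4) / 2 = 2.0
bce_value = binary_cross_entropy([1, 0], [0.5, 0.5])       # log 2
ce_value = categorical_cross_entropy([[0, 1, 0]], [[0.25, 0.5, 0.25]])  # log 2
```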
7. Weight Initialization
Why Not Initialize All Weights to Zero?
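If every weight starts at zero ($W = 0$), all neurons in a layer compute the same function and receive the same gradient update, so the symmetry between them is never broken and the network can't learn distinct features.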
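The symmetry problem can be observed directly. The sketch below assumes a tiny network (tanh hidden layer, sigmoid output, binary cross-entropy, biases omitted for brevity) with hypothetical inputs, and computes the hidden-layer weight gradients for two identical-weight initializations.

```python
import math

def hidden_gradients(x, y, W1, W2):
    """dL/dW1 for a tanh hidden layer and a sigmoid output with BCE loss."""
    a1 = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y_hat = 1.0 / (1.0 + math.exp(-sum(w * a for w, a in zip(W2, a1))))
    delta2 = y_hat - y
    delta1 = [w * delta2 * (1 - a * a) for w, a in zip(W2, a1)]
    return [[d * xi for xi in x] for d in delta1]

x, y = [1.0, -2.0], 1.0
n_hidden = 3

# All-zero initialization: no gradient reaches the hidden weights at all.
zero_grads = hidden_gradients(
    x, y, [[0.0, 0.0] for _ in range(n_hidden)], [0.0] * n_hidden
)

# Any constant initialization: every hidden unit gets the same gradient row,
# so the units stay identical after every update.
const_grads = hidden_gradients(
    x, y, [[0.5, 0.5] for _ in range(n_hidden)], [0.5] * n_hidden
)
```

Random initialization avoids both outcomes by giving each hidden unit a different starting point.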
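Xavier/Glorot Initialization (Sigmoid/Tanh), for a layer with $n_{\text{in}}$ inputs and $n_{\text{out}}$ outputs:
$$W \sim \mathcal{N}\left(0, \frac{2}{n_{\text{in}} + n_{\text{out}}}\right)$$
He Initialization (ReLU):
$$W \sim \mathcal{N}\left(0, \frac{2}{n_{\text{in}}}\right)$$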
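A minimal sketch of both schemes, using the normal-distribution variants (variance 2 / (n_in + n_out) for Xavier/Glorot and 2 / n_in for He). Uniform variants also exist; the function names, layer sizes, and seed here are assumptions for illustration.

```python
import math
import random

def xavier_normal(n_in, n_out, rng):
    """Xavier/Glorot normal init: std = sqrt(2 / (n_in + n_out))."""
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

def he_normal(n_in, n_out, rng):
    """He normal init: std = sqrt(2 / n_in)."""
    std = math.sqrt(2.0 / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

def sample_std(W):
    """Empirical standard deviation of all entries in a weight matrix."""
    values = [w for row in W for w in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_in, n_out = 256, 128  # hypothetical layer sizes

W_xavier = xavier_normal(n_in, n_out, rng)
W_he = he_normal(n_in, n_out, rng)

# The empirical spread should sit close to the target standard deviations.
xavier_target = math.sqrt(2.0 / (n_in + n_out))
he_target = math.sqrt(2.0 / n_in)
```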
8. Optimizers
Adam Optimizer (Most Popular)
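Adam Update Rules (gradient $g_t$, parameters $\theta$, learning rate $\eta$):
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
$$\theta_t = \theta_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
Defaults: $\eta = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$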
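One Adam step can be sketched as a pure function over lists of parameters. The function name `adam_step`, the toy objective f(θ) = Σ θ², and the larger learning rate in the loop are assumptions for this demonstration; the keyword defaults follow the commonly used values.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]      # first moment
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]  # second moment
    m_hat = [mi / (1 - beta1 ** t) for mi in m]                       # bias correction
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [p - lr * mh / (math.sqrt(vh) + eps)
             for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

# Toy problem: minimize f(theta) = sum(theta_i^2), whose gradient is 2 * theta.
theta = [1.0, -2.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
initial_loss = sum(p * p for p in theta)

for t in range(1, 501):
    grad = [2.0 * p for p in theta]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)

final_loss = sum(p * p for p in theta)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient's scale.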
Key Takeaways
- Perceptron = weighted sum + activation — the basic unit
- MLP = multiple perceptrons in layers — universal approximator
- Activation functions introduce non-linearity — ReLU is default
- Backpropagation = chain rule applied recursively — computes gradients
- Adam optimizer is the default choice for most problems
- Weight initialization matters — use He init for ReLU
Next: PyTorch Fundamentals
Implement neural networks in PyTorch with autograd, tensors, and GPU support.