Neural Networks Fundamentals

Neural networks learn complex patterns by stacking simple computational units (neurons) in layers.

The Perceptron

Single Neuron:

Inputs: x₁, x₂, ..., xₙ
Weights: w₁, w₂, ..., wₙ
Bias: b

Output = activation(Σ(wᵢxᵢ) + b)

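This is just LINEAR REGRESSION + an activation function: Σ(wᵢxᵢ) + b is a linear model, and the activation adds the nonlinearity (with a sigmoid activation, a single neuron is exactly logistic regression).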
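A minimal NumPy sketch of the single-neuron formula above. The inputs, weights, and bias are arbitrary illustrative numbers, and `neuron` is a hypothetical helper name. Passing the identity function as the activation recovers the plain linear model.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    """Single neuron: weighted sum of inputs plus bias, then an activation."""
    z = np.dot(w, x) + b   # linear part, same form as linear regression
    return activation(z)   # nonlinearity applied to the weighted sum

# Hypothetical example values (not from the text)
x = np.array([1.0, 2.0, 4.0])
w = np.array([0.5, -0.25, 0.125])
b = 0.25

print(np.dot(w, x) + b)                          # 0.75
print(neuron(x, w, b))                           # sigmoid(0.75), about 0.679
print(neuron(x, w, b, activation=lambda z: z))   # identity activation: plain linear model
```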

Activation Functions

Sigmoid: σ(x) = 1/(1+e⁻ˣ)
├─ Output: (0, 1)
├─ Good for: Binary classification output
└─ Problem: Vanishing gradients (derivative is at most 0.25 and near 0 for large |x|)

Tanh: tanh(x) = (eˣ - e⁻ˣ)/(eˣ + e⁻ˣ)
├─ Output: (-1, 1)
├─ Zero-centered output, so it usually trains better than sigmoid in hidden layers
└─ Still saturates for large |x| (vanishing gradients)

ReLU: f(x) = max(0, x)
├─ Output: [0, ∞)
├─ Fast computation
├─ Default choice for hidden layers
└─ Problem: Dead neurons (a unit whose input stays negative always outputs 0 and gets zero gradient, so it stops learning)

Leaky ReLU: f(x) = max(0.01x, x)
├─ Mitigates the dead neuron problem
└─ Slight negative slope for x < 0

GELU: f(x) = x × Φ(x), where Φ is the standard normal CDF
├─ Used in Transformers (e.g., BERT, GPT)
└─ Smooth approximation of ReLU
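The activation functions above fit in a few lines of NumPy. This sketch assumes the exact GELU definition via the error function (frameworks also offer a tanh-based approximation) and the 0.01 slope used in the Leaky ReLU formula; the last line shows why sigmoid suffers from vanishing gradients.

```python
import math
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # max(slope*x, x) is x for x >= 0 and slope*x for x < 0 (when 0 < slope < 1)
    return np.maximum(slope * x, x)

_erf = np.vectorize(math.erf)  # element-wise error function

def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    phi = 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))
    return x * phi

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

xs = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu),
                ("leaky_relu", leaky_relu), ("gelu", gelu)]:
    print(f"{name:>10}: {np.round(f(xs), 3)}")

# Vanishing gradients: the sigmoid derivative shrinks toward 0 as |x| grows
print(sigmoid_grad(np.array([0.0, 5.0, 10.0])))
```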

Multi-Layer Network

Input Layer    Hidden Layer 1    Hidden Layer 2    Output Layer
   (x₁)            (h₁¹)             (h₁²)             (y₁)
   (x₂)     →      (h₂¹)      →      (h₂²)      →      (y₂)
   (x₃)            (h₃¹)             (h₃²)

Each layer (x is the previous layer's output, or the network input for the first layer):
h = activation(Wx + b)

Width: Number of neurons per layer
Depth: Number of hidden layers
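To make h = activation(Wx + b) concrete, here is a sketch of a forward pass through a network shaped like the diagram (3 inputs, two hidden layers of width 3, 2 outputs). The random weights, zero biases, ReLU hidden layers, and linear output layer are illustrative assumptions, not a prescribed design.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Layer sizes matching the diagram: 3 inputs, two hidden layers of width 3, 2 outputs
sizes = [3, 3, 3, 2]

# One (W, b) pair per layer; W has shape (n_out, n_in) so that W @ h works
params = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    h = x
    for i, (W, b) in enumerate(params):
        z = W @ h + b
        # ReLU on hidden layers; the output layer is left linear (an assumption)
        h = relu(z) if i < len(params) - 1 else z
    return h

x = np.array([1.0, -0.5, 2.0])
y = forward(x, params)
print(y.shape)                                  # (2,)
print("depth (hidden layers):", len(params) - 1)  # 2
print("width per hidden layer:", sizes[1:-1])     # [3, 3]
```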

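Universal Approximation Theorem:
A network with 1 hidden layer and a nonlinear activation (e.g., sigmoid or ReLU) can approximate ANY continuous function on a bounded input domain to arbitrary accuracy
(but it may need exponentially many neurons, and the theorem does not guarantee that training will find those weights)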
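The theorem is an existence result, but its flavor can be seen numerically. This sketch does not train the network by gradient descent: it uses a shortcut usually called random features, fixing random input weights for one hidden layer of ReLU units and solving only for the output weights by least squares. The target sin(x) on [-π, π] and the widths tried are arbitrary choices; because each wider layer contains the narrower ones, the fit error can only shrink as width grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on a bounded interval
x = np.linspace(-np.pi, np.pi, 400)
y = np.sin(x)

# One hidden layer of ReLU units with random, fixed input weights and biases
max_width = 200
a = rng.normal(size=max_width)
c = rng.uniform(-np.pi, np.pi, size=max_width)
H = np.maximum(0.0, np.outer(x, a) + c)  # hidden activations, shape (400, max_width)

def fit_error(width):
    """Fit only the output layer (weights + bias) by least squares; return the MSE."""
    features = np.column_stack([H[:, :width], np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return np.mean((features @ coef - y) ** 2)

for width in [2, 5, 20, 200]:
    print(f"width={width:>3}: MSE={fit_error(width):.2e}")
```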

Backpropagation

How neural networks learn:

1. Forward Pass
   Input → Compute outputs layer by layer

2. Compute Loss
   Loss = L(predicted, actual)

3. Backward Pass (Backprop)
   Compute gradient of loss w.r.t. EACH weight
   Using the chain rule of calculus, with z = Σ(wᵢxᵢ) + b and a = activation(z):
   ∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w

4. Update Weights
   w = w - α × ∂L/∂w
   (Gradient Descent; α is the learning rate)
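The four steps can be traced by hand for the simplest case: one sigmoid neuron trained with binary cross-entropy loss. This NumPy sketch computes each chain-rule factor explicitly; the toy dataset, the learning rate α = 0.5, and the number of steps are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: label is 1 when x1 + x2 > 0
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
N = len(y)

w = np.zeros(2)
b = 0.0
alpha = 0.5  # learning rate (the α in the update rule)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for step in range(200):
    # 1. Forward pass
    z = X @ w + b                               # z = w·x + b
    a = np.clip(sigmoid(z), 1e-12, 1 - 1e-12)   # a = activation(z), clipped for safe logs

    # 2. Compute loss (binary cross-entropy, averaged over the batch)
    loss = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
    losses.append(loss)

    # 3. Backward pass: multiply the chain-rule factors
    dL_da = (a - y) / (a * (1 - a)) / N   # ∂L/∂a for mean BCE
    da_dz = a * (1 - a)                   # ∂a/∂z for sigmoid
    dL_dz = dL_da * da_dz                 # product simplifies to (a - y) / N
    dL_dw = X.T @ dL_dz                   # ∂z/∂w = x, summed over examples
    dL_db = dL_dz.sum()                   # ∂z/∂b = 1

    # 4. Update weights (gradient descent)
    w -= alpha * dL_dw
    b -= alpha * dL_db

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(f"first loss={losses[0]:.4f}, last loss={losses[-1]:.4f}, accuracy={accuracy:.2f}")
```

Because ∂L/∂a × ∂a/∂z collapses to (a - y)/N, sigmoid outputs are usually paired with cross-entropy loss, and practical code computes that simplified form directly.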

PyTorch Implementation

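import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical training data: 1,000 samples with 10 features and binary labels.
# BCELoss expects float targets with the same shape as the output, (N, 1).
X_train = torch.randn(1000, 10)
y_train = (X_train.sum(dim=1, keepdim=True) > 0).float()

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 10 inputs -> 64 -> 32 -> 1 output probability
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid()  # squashes the output into (0, 1) for BCELoss
        )

    def forward(self, x):
        return self.layers(x)

# Initialize
model = NeuralNet()
# Note: nn.BCEWithLogitsLoss (with the final Sigmoid removed) is more numerically stable
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop (full batch: each epoch uses all of X_train at once)
for epoch in range(100):
    # Forward pass
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)

    # Backward pass
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()        # backpropagation: compute ∂loss/∂w for every weight
    optimizer.step()       # gradient-based weight update (Adam)

    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss={loss.item():.4f}")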
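To check that `loss.backward()` really performs the chain rule from the Backpropagation section, this sketch compares PyTorch's autograd gradients with the hand-derived product ∂L/∂a × ∂a/∂z × ∂z/∂w for a single sigmoid neuron. The input, label, and random weights are hypothetical.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# A single sigmoid neuron with BCE loss on one hypothetical example
x = torch.tensor([1.0, 2.0, -1.0])
y = torch.tensor([1.0])
w = torch.randn(3, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

z = w @ x + b                         # z = w·x + b, shape (1,)
a = torch.sigmoid(z)                  # a = activation(z)
loss = F.binary_cross_entropy(a, y)
loss.backward()                       # autograd applies the chain rule

# Manual chain rule: ∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w
with torch.no_grad():
    dL_da = (a - y) / (a * (1 - a))   # derivative of BCE w.r.t. a
    da_dz = a * (1 - a)               # derivative of sigmoid
    manual_grad_w = dL_da * da_dz * x  # ∂z/∂w = x
    manual_grad_b = dL_da * da_dz      # ∂z/∂b = 1

print(w.grad, manual_grad_w)
print(torch.allclose(w.grad, manual_grad_w, atol=1e-6))
print(torch.allclose(b.grad, manual_grad_b, atol=1e-6))
```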

Key Takeaways

Neural networks are universal function approximators
ReLU is the default activation for hidden layers
Backpropagation computes the gradients for all weights in a single backward pass
Learning rate is often the most important hyperparameter to tune
Deeper networks can learn more complex patterns
Overfitting is controlled by dropout, regularization, early stopping
GPU acceleration is essential for training large networks
PyTorch and TensorFlow are the main frameworks
