Neural Networks Fundamentals — Perceptrons to Deep Learning

Neural Networks Fundamentals

Neural networks learn complex patterns by stacking simple computational units (neurons) in layers.


The Perceptron

Single Neuron:

Inputs: x₁, x₂, ..., xₙ
Weights: w₁, w₂, ..., wₙ
Bias: b

Output = activation(Σ(wᵢxᵢ) + b)

This is a linear model (the same weighted sum as in linear regression) passed through an activation function.
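
The single-neuron formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the input, weight, and bias values are hypothetical, and sigmoid is chosen as the activation only as an example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Compute activation(sum(w_i * x_i) + b) for a single neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical example values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]
b = 0.2

# Weighted sum before the activation: 0.5 - 0.5 + 0.3 + 0.2 = 0.5
z = sum(wi * xi for wi, xi in zip(w, x)) + b
output = neuron(x, w, b)  # sigmoid(0.5), roughly 0.62
```

Passing a different function as `activation` (for example a step function) changes the neuron's behavior without touching the weighted sum.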

Activation Functions

Sigmoid: σ(x) = 1/(1+e⁻ˣ)
├─ Output: (0, 1)
├─ Good for: Binary classification output
└─ Problem: Vanishing gradients (derivative is at most 0.25 and approaches 0 for large |x|)

Tanh: tanh(x) = (eˣ - e⁻ˣ)/(eˣ + e⁻ˣ)
├─ Output: (-1, 1)
├─ Better than sigmoid (centered at 0)
└─ Still vanishing gradients

ReLU: f(x) = max(0, x)
├─ Output: [0, ∞)
├─ Fast computation
├─ Default choice for hidden layers
└─ Problem: Dead neurons (gradient is 0 for x < 0, so a neuron can get stuck outputting 0)

Leaky ReLU: f(x) = max(0.01x, x)
├─ Mitigates the dead neuron problem (gradient stays nonzero for x < 0)
└─ Slight negative slope (0.01) for x < 0

GELU: f(x) = x × Φ(x), where Φ is the standard normal CDF
├─ Used in Transformers
└─ Smooth, ReLU-like curve (slightly negative for small negative x)
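
The activation functions listed above can be written directly from their formulas. The sketch below uses only the Python standard library and assumes Φ in GELU is the standard normal CDF, computed via `math.erf`. The helper `sigmoid_grad` is added here for illustration, to show numerically why sigmoid gradients vanish for large inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # For x < 0, slope * x is larger than x, so max() picks the small negative slope.
    return max(slope * x, x)

def gelu(x):
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s), which peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

grad_at_zero = sigmoid_grad(0.0)   # the largest the sigmoid gradient can be
grad_at_ten = sigmoid_grad(10.0)   # far from zero the gradient is tiny
```

Because backpropagation multiplies such derivatives across layers, several saturated sigmoid layers in a row can shrink the gradient toward zero.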

Multi-Layer Network

Input Layer    Hidden Layer 1    Hidden Layer 2    Output Layer
(x₁)           (h₁¹)             (h₁²)             (y₁)
(x₂)    →      (h₂¹)      →      (h₂²)      →      (y₂)
(x₃)           (h₃¹)             (h₃²)

Each layer:
h = activation(Wx + b)

Width: Number of neurons per layer
Depth: Number of hidden layers
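
As a rough sketch of the layer rule h = activation(Wx + b), the snippet below chains layers in plain Python. The weights are hypothetical, the network is deliberately tiny (3 inputs, one hidden layer of width 2, 1 output), and the helper names `dense`, `forward`, and `identity` are made up for this example.

```python
def relu(v):
    """Apply ReLU to every element of a vector."""
    return [max(0.0, x) for x in v]

def identity(v):
    """No activation: return the vector unchanged."""
    return v

def dense(W, x, b):
    """Compute Wx + b, where W is a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def forward(x, layers):
    """Apply h = activation(Wx + b) for each (W, b, activation) in order."""
    h = x
    for W, b, activation in layers:
        h = activation(dense(W, h, b))
    return h

# Hypothetical weights, chosen only for illustration.
layers = [
    ([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, -1.0], relu),  # hidden layer, width 2
    ([[2.0, 1.0]], [0.5], identity),                           # output layer, width 1
]

# Hidden pre-activations are [-2.0, 2.0]; ReLU gives [0.0, 2.0]; output is 2*0 + 1*2 + 0.5.
y = forward([1.0, 2.0, 3.0], layers)
```

Adding more `(W, b, activation)` tuples to `layers` increases depth; adding more rows to a `W` increases that layer's width.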

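Universal Approximation Theorem:
A network with 1 hidden layer and a non-polynomial activation can approximate
any continuous function on a bounded input range to any desired accuracy
(may need exponentially many neurons; the theorem does not say how to find the weights)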
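
One way to get a feel for the theorem is to build a one-hidden-layer ReLU network by hand rather than by training. The sketch below is an illustration under stated assumptions, not a proof: the target f(x) = x² on [0, 1] is an arbitrary choice, and `build_relu_interpolant` is a hypothetical helper that places one ReLU neuron at each knot so the network linearly interpolates the target.

```python
def relu(z):
    return max(0.0, z)

def build_relu_interpolant(f, n):
    """Return g(x) = f(0) + sum_i c_i * relu(x - k_i): a one-hidden-layer ReLU
    network with n hidden neurons that linearly interpolates f at the knots
    k_i = i / n on [0, 1]."""
    knots = [i / n for i in range(n + 1)]
    # Slope of the straight line on each segment [k_i, k_{i+1}].
    slopes = [(f(knots[i + 1]) - f(knots[i])) * n for i in range(n)]
    # Output weight of neuron i is the change in slope at knot i.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n)]
    bias = f(0.0)

    def g(x):
        # zip stops after n terms, so the last knot (x = 1) gets no neuron.
        return bias + sum(c * relu(x - k) for c, k in zip(coeffs, knots))
    return g

def max_error(f, g, samples=1001):
    """Largest |f(x) - g(x)| over an evenly spaced grid on [0, 1]."""
    xs = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - g(x)) for x in xs)

def target(x):
    return x * x

err_4 = max_error(target, build_relu_interpolant(target, 4))
err_16 = max_error(target, build_relu_interpolant(target, 16))
```

For this target the worst-case interpolation error is 1/(4n²), so going from 4 to 16 hidden neurons shrinks the error by about 16×. Wider layers buy accuracy, which is the trade-off the theorem describes.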

Backpropagation

How neural networks learn:

1. Forward Pass
   Input → Compute outputs layer by layer

2. Compute Loss
   Loss = L(predicted, actual)

3. Backward Pass (Backprop)
   Compute gradient of loss w.r.t. EACH weight
   Using chain rule of calculus, with z = Σ(wᵢxᵢ) + b and a = activation(z):
   ∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w

4. Update Weights
   w = w - α × ∂L/∂w
   (Gradient Descent; α is the learning rate, i.e. the step size)
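
The four steps above can be traced by hand for a single sigmoid neuron. The sketch below makes several assumptions for illustration: a squared-error loss L = ½(a − y)², hypothetical values for w, b, x, y, and a learning rate of 0.1. It also compares the chain-rule gradient against a finite-difference estimate as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_fn(w, b, x, y):
    """Steps 1 and 2: forward pass, then squared-error loss."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

def grad_w(w, b, x, y):
    """Step 3: chain rule dL/dw = dL/da * da/dz * dz/dw."""
    z = w * x + b
    a = sigmoid(z)
    dL_da = a - y            # derivative of 0.5 * (a - y)^2 w.r.t. a
    da_dz = a * (1.0 - a)    # derivative of sigmoid
    dz_dw = x                # derivative of w * x + b w.r.t. w
    return dL_da * da_dz * dz_dw

# Hypothetical values, chosen only for illustration.
w, b, x, y = 0.5, 0.0, 2.0, 1.0
alpha = 0.1  # learning rate

analytic = grad_w(w, b, x, y)

# Finite-difference estimate of the same gradient, used as a check.
eps = 1e-6
numeric = (loss_fn(w + eps, b, x, y) - loss_fn(w - eps, b, x, y)) / (2 * eps)

# Step 4: one gradient descent update.
loss_before = loss_fn(w, b, x, y)
w_new = w - alpha * analytic
loss_after = loss_fn(w_new, b, x, y)
```

Here the prediction is below the target, so the gradient is negative and the update increases w, which lowers the loss. Frameworks such as PyTorch automate the `grad_w` step for every weight in the network.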

PyTorch Implementation

import torch
import torch.nn as nn
import torch.optim as optim

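class NeuralNet(nn.Module):
    """Binary classifier: 10 input features -> 64 -> 32 -> 1 probability."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),   # input -> first hidden layer
            nn.ReLU(),
            nn.Linear(64, 32),   # first -> second hidden layer
            nn.ReLU(),
            nn.Linear(32, 1),    # second hidden layer -> single output
            nn.Sigmoid()         # squash the output into (0, 1)
        )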

    def forward(self, x):
        return self.layers(x)

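# Placeholder data so the script runs end to end (assumption: replace with a real dataset)
# X_train: (N, 10) float features; y_train: (N, 1) float labels in {0, 1}
X_train = torch.randn(256, 10)
y_train = (X_train.sum(dim=1, keepdim=True) > 0).float()

# Initialize
model = NeuralNet()
criterion = nn.BCELoss()  # expects probabilities in (0, 1), hence the final Sigmoid
optimizer = optim.Adam(model.parameters(), lr=0.001)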

# Training loop
for epoch in range(100):
    # Forward pass
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss={loss.item():.4f}")

Key Takeaways

  1. Neural networks are universal function approximators
  2. ReLU is the default activation for hidden layers
  3. Backpropagation computes gradients efficiently
  4. Learning rate is often the most important hyperparameter to tune
  5. Deeper networks can learn more complex patterns
  6. Overfitting can be reduced with dropout, regularization, and early stopping
  7. GPU acceleration is essential for training large networks
  8. PyTorch and TensorFlow are the main frameworks
