Graph Neural Networks — Deep Dive

Specialized ArchitecturesGNNsFree Lesson

Advertisement

Graph Neural Networks — Deep Dive

GNNs learn on graph-structured data by aggregating information from node neighborhoods. They generalize neural networks to irregular, non-Euclidean data structures.


Graph Data

DfGraph

A graph G=(V,E)G = (V, E) consists of:

  • Nodes VV: {v1,,vN}\{v_1, \ldots, v_N\} — entities (people, molecules, papers)
  • Edges EE: {(vi,vj)}\{(v_i, v_j)\} — relationships (friendships, bonds, citations)
  • Node features XRN×dX \in \mathbb{R}^{N \times d}: Feature matrix
  • Adjacency matrix A{0,1}N×NA \in \{0, 1\}^{N \times N}: Connectivity

Common graph types: citation networks, molecular graphs, social networks, knowledge graphs, point clouds.


Message Passing Framework

hv(k+1)=UPDATE(hv(k),AGGREGATE({muv:uN(v)}))h_v^{(k+1)} = \text{UPDATE}\left(h_v^{(k)}, \text{AGGREGATE}\left(\{m_{u \to v} : u \in \mathcal{N}(v)\}\right)\right)

Message Function

muv(k)=MSG(k)(hu(k),hv(k),euv)m_{u \to v}^{(k)} = \text{MSG}^{(k)}(h_u^{(k)}, h_v^{(k)}, e_{uv})

Here,

  • hu(k)h_u^{(k)}=Node u's representation at layer k
  • hv(k)h_v^{(k)}=Node v's representation at layer k
  • euve_{uv}=Edge features (if any)
  • N(v)\mathcal{N}(v)=Neighbors of node v

Aggregation Function

hˉv(k)=AGG(k)({muv(k):uN(v)})\bar{h}_v^{(k)} = \text{AGG}^{(k)}(\{m_{u \to v}^{(k)} : u \in \mathcal{N}(v)\})

Here,

  • AGG\text{AGG}=Aggregation function (mean, max, sum, attention)
  • hˉv(k)\bar{h}_v^{(k)}=Aggregated neighborhood message

Update Function

hv(k+1)=UPDATE(k)(hv(k),hˉv(k))h_v^{(k+1)} = \text{UPDATE}^{(k)}(h_v^{(k)}, \bar{h}_v^{(k)})

Here,

  • UPDATE\text{UPDATE}=Update function (e.g., GRU, MLP, residual connection)

Graph Convolutional Network (GCN)

DfGCN Layer

GCN (Kipf & Welling, 2017) performs spectral-inspired convolution on graphs:

hv(k+1)=σ(uN(v){v}1d^ud^vW(k)hu(k))h_v^{(k+1)} = \sigma\left(\sum_{u \in \mathcal{N}(v) \cup \{v\}} \frac{1}{\sqrt{\hat{d}_u \hat{d}_v}} W^{(k)} h_u^{(k)}\right)

where d^v=N(v)+1\hat{d}_v = |\mathcal{N}(v)| + 1 is the degree plus self-loop.

GCN Layer (Matrix Form)

H(k+1)=σ(D~1/2A~D~1/2H(k)W(k))H^{(k+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(k)} W^{(k)}\right)

Here,

  • A~\tilde{A}=A + I (adjacency with self-loops)
  • D~\tilde{D}=Degree matrix of \tilde{A}
  • W(k)W^{(k)}=Learnable weight matrix at layer k
  • H(k)H^{(k)}=Node representations at layer k

GraphSAGE

DfGraphSAGE

GraphSAGE (Hamilton et al., 2017) enables inductive learning by sampling and aggregating from neighborhoods:

  1. Sample KK neighbors from N(v)\mathcal{N}(v)
  2. Aggregate using mean, LSTM, or pooling
  3. Concatenate with node's own features
  4. Transform with MLP

Key advantage: Can generate embeddings for unseen nodes (inductive).

GraphSAGE Update

hv(k+1)=σ(W(k)CONCAT(hv(k),AGG({hu(k):uSN(v)})))h_v^{(k+1)} = \sigma\left(W^{(k)} \cdot \text{CONCAT}(h_v^{(k)}, \text{AGG}(\{h_u^{(k)} : u \in S_{\mathcal{N}(v)}\}))\right)

Here,

  • SN(v)S_{\mathcal{N}(v)}=Sampled subset of neighbors
  • AGG\text{AGG}=Aggregator (mean, LSTM, pooling)
  • CONCAT\text{CONCAT}=Concatenation of self and neighborhood features

Graph Attention Network (GAT)

DfGAT

GAT (Veličković et al., 2018) uses attention to weight neighbor contributions:

αuv=exp(LeakyReLU(aT[WhuWhv]))kN(v)exp(LeakyReLU(aT[WhkWhv]))\alpha_{uv} = \frac{\exp(\text{LeakyReLU}(a^T [W h_u \| W h_v]))}{\sum_{k \in \mathcal{N}(v)} \exp(\text{LeakyReLU}(a^T [W h_k \| W h_v]))}
hv(k+1)=σ(uN(v)αuvWhu(k))h_v^{(k+1)} = \sigma\left(\sum_{u \in \mathcal{N}(v)} \alpha_{uv} W h_u^{(k)}\right)

GAT Attention Coefficients

αuv=exp(LeakyReLU(aT[WhuWhv]))kN(v)exp(LeakyReLU(aT[WhkWhv]))\alpha_{uv} = \frac{\exp(\text{LeakyReLU}(a^T [W h_u \| W h_v]))}{\sum_{k \in \mathcal{N}(v)} \exp(\text{LeakyReLU}(a^T [W h_k \| W h_v]))}

Here,

  • αuv\alpha_{uv}=Attention weight from u to v
  • aa=Attention parameter vector
  • WW=Shared linear transformation
  • \|=Concatenation

Over-Smoothing

ThOver-Smoothing in GNNs

As the number of GNN layers increases, node representations converge to the same vector regardless of initial features:

limktoinftyH(k)=1vT\lim_{k \\to \\infty} H^{(k)} = \mathbf{1} v^T

for some vector vv. This means deep GNNs lose the ability to distinguish nodes, limiting practical depth to 2-3 layers.

Advertisement

Need Expert Deep Learning Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement