What Are Graph Neural Networks?
Why Graphs Need Special Architecture
Many important datasets are naturally graphs. A molecule is a graph where atoms are nodes and chemical bonds are edges. A social network is a graph where people are nodes and friendships are edges. A citation network is a graph where papers are nodes and citations are edges. A transportation system is a graph where locations are nodes and routes are edges.
Standard neural networks cannot process graphs directly. CNNs expect data on a regular grid (images have pixels arranged in rows and columns). RNNs and transformers expect sequences (ordered lists of tokens). Graphs have irregular structure: each node can have a different number of neighbors, there is no inherent ordering, and the connectivity pattern varies across the dataset. GNNs are designed to handle this irregularity.
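To make the irregularity concrete, here is a minimal sketch of a small graph stored as an edge list and converted to an adjacency map. The graph and the helper name build_adjacency are made up for illustration; the point is only that neighbor counts differ from node to node and that the node ids carry no meaningful order.

```python
# A small undirected graph stored as an edge list.
# The node ids are arbitrary labels, not positions in a grid or sequence.
edges = [(0, 1), (0, 2), (0, 3), (2, 3)]


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


adjacency = build_adjacency(edges)

# Unlike pixels in an image, nodes do not share a fixed neighborhood size.
degrees = {node: len(neighbors) for node, neighbors in sorted(adjacency.items())}
print(degrees)
```

Node 0 has three neighbors while node 1 has one, so a fixed-size convolution kernel or a fixed token order has nothing natural to attach to.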
Message Passing: The Core Mechanism
The fundamental operation in a GNN is message passing. In each layer, every node collects messages from its neighbors, aggregates them, and uses the aggregate to update its own representation.
A single message-passing step works as follows. For each node v, gather the feature vectors of all neighboring nodes. Apply a learned transformation to each neighbor's features (the "message" function). Aggregate all messages using a permutation-invariant function (sum, mean, or max). Combine the aggregated message with the node's own features through another learned transformation (the "update" function). The result is a new feature vector for node v that incorporates information from its local neighborhood.
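The steps above can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions, not a reference implementation: the message function is a single linear map, aggregation is a sum, the update function concatenates the node's own features with the aggregate and applies a linear map plus ReLU, and all names (message_passing_step, w_message, w_update) are hypothetical.

```python
import numpy as np


def message_passing_step(features, adjacency, w_message, w_update):
    """One message-passing step with sum aggregation.

    features:  (num_nodes, d_in) node feature matrix
    adjacency: {node: list of neighbor ids}
    w_message: (d_in, d_msg) weights of the message function
    w_update:  (d_in + d_msg, d_out) weights of the update function
    """
    num_nodes = features.shape[0]
    d_msg = w_message.shape[1]
    aggregated = np.zeros((num_nodes, d_msg))
    for v in range(num_nodes):
        for u in adjacency.get(v, []):
            # Message function: a linear map of the neighbor's features.
            # Summing makes the result independent of neighbor order.
            aggregated[v] += features[u] @ w_message
    # Update function: combine own features with the aggregate, then ReLU.
    combined = np.concatenate([features, aggregated], axis=1)
    return np.maximum(combined @ w_update, 0.0)


rng = np.random.default_rng(0)
adjacency = {0: [1, 2], 1: [0], 2: [0]}
features = rng.normal(size=(3, 4))
w_message = rng.normal(size=(4, 5))   # random stand-ins for learned weights
w_update = rng.normal(size=(9, 2))
new_features = message_passing_step(features, adjacency, w_message, w_update)
print(new_features.shape)
```

Because the aggregate is a sum, listing node 0's neighbors as [2, 1] instead of [1, 2] gives the same output, which is the permutation invariance the text requires.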
Stacking multiple message-passing layers extends each node's receptive field. After one layer, a node knows about its immediate neighbors. After two layers, it knows about neighbors of neighbors. After k layers, each node has information from all nodes within k hops. This is analogous to how stacking convolutional layers in a CNN expands the receptive field from local patches to larger image regions.
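The growth of the receptive field can be traced with a breadth-first expansion. The sketch below ignores features entirely and only tracks which nodes could influence a given node after a number of layers; the function name receptive_field and the path graph are illustrative assumptions.

```python
def receptive_field(adjacency, node, num_layers):
    """Nodes whose input features can influence `node` after `num_layers` layers."""
    reached = {node}
    frontier = {node}
    for _ in range(num_layers):
        # One more layer pulls in the neighbors of everything reached so far.
        frontier = {u for v in frontier for u in adjacency[v]} - reached
        reached |= frontier
    return reached


# Path graph: 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

for k in range(4):
    print(k, sorted(receptive_field(path, 0, k)))
```

On this path graph, node 0 sees one additional node per layer, so reaching node 4 from node 0 would take four layers.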
Major GNN Architectures
Graph Convolutional Networks (GCNs), introduced by Kipf and Welling in 2017, are among the simplest and most widely used GNNs. Each layer computes: H_new = activation(A_norm * H * W), where A_norm is the normalized adjacency matrix with self-loops added (encoding which nodes connect to which), H is the current node feature matrix, W is a learned weight matrix, and * denotes matrix multiplication. GCNs do not learn per-neighbor weights: each neighbor's contribution is fixed by the graph structure and scaled by the inverse of the square root of the product of the two nodes' degrees.
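A minimal NumPy sketch of the layer formula follows. It assumes the symmetric normalization from the Kipf and Welling formulation, in which self-loops are added before normalizing, and it uses ReLU as the activation; the function names and the random weights are placeholders for illustration.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Entry (i, j) is scaled by 1 / sqrt(deg_i * deg_j).
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """H_new = ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)


# Star graph: node 0 is connected to nodes 1 and 2.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
a_norm = normalized_adjacency(adj)

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))   # 3 nodes, 4 input features
w = rng.normal(size=(4, 2))   # random stand-in for a learned weight matrix
h_new = gcn_layer(a_norm, h, w)
print(h_new.shape)
```

With self-loops, node 0 has degree 3 and node 1 has degree 2, so the weight on the edge between them is 1 / sqrt(3 * 2). Nothing about that weight is learned; only W is.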
Graph Attention Networks (GATs) add attention to the message-passing process. Instead of using fixed weights determined by the graph structure, GATs learn attention weights that determine how much each neighbor contributes to the node's update. This lets the model learn that some relationships are more informative than others, similar to how transformer attention learns which positions are most relevant. GATs can outperform GCNs on graphs where the importance of individual neighbors varies widely.
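The attention computation can be sketched as follows. This is a simplified single-head version under stated assumptions: scores are computed as LeakyReLU(a . [z_i, z_j]) on linearly transformed features z = h @ w, each node also attends to itself, and the output nonlinearity and multi-head averaging used in practice are omitted. The names gat_layer, w, and a are illustrative.

```python
import numpy as np


def gat_layer(h, adjacency, w, a, negative_slope=0.2):
    """Single-head graph attention layer (no output nonlinearity).

    h: (num_nodes, d_in) features; w: (d_in, d_out); a: (2 * d_out,)
    adjacency: {node: list of neighbor ids}
    Returns the new features and, per node, its attention weights.
    """
    z = h @ w
    out = np.zeros_like(z)
    attention = {}
    for i in range(h.shape[0]):
        neighborhood = [i] + list(adjacency.get(i, []))  # include a self-loop
        scores = np.array([a @ np.concatenate([z[i], z[j]]) for j in neighborhood])
        scores = np.where(scores > 0, scores, negative_slope * scores)  # LeakyReLU
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ z[neighborhood]
        attention[i] = dict(zip(neighborhood, weights))
    return out, attention


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
w = rng.normal(size=(3, 2))   # random stand-ins for learned parameters
a = rng.normal(size=4)
adjacency = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
out, attention = gat_layer(h, adjacency, w, a)
print(attention[0])
```

The weights for each node sum to one over its neighborhood, but unlike the degree-based scaling in a GCN they depend on the features of both endpoints and on the learned parameters.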
Message Passing Neural Networks (MPNNs) are a general framework that encompasses most GNN variants. The MPNN framework defines a message function (how to compute messages from neighbors), an aggregation function (how to combine messages), and an update function (how to update node features). Different choices for these three functions produce different GNN architectures.
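One way to picture the framework is a layer that takes the three functions as arguments. The sketch below is a toy illustration, not the formulation from any particular paper: the functions have no learned parameters, every node is assumed to have at least one neighbor, and all names are hypothetical.

```python
import numpy as np


def mpnn_layer(features, adjacency, message_fn, aggregate_fn, update_fn):
    """Generic message-passing layer parameterized by its three functions."""
    new_features = []
    for v in range(len(features)):
        messages = [message_fn(features[v], features[u]) for u in adjacency[v]]
        aggregated = aggregate_fn(messages)
        new_features.append(update_fn(features[v], aggregated))
    return np.stack(new_features)


def copy_neighbor(h_v, h_u):
    """Message function: pass the neighbor's features through unchanged."""
    return h_u


def add_to_self(h_v, aggregated):
    """Update function: add the aggregate to the node's own features."""
    return h_v + aggregated


def sum_messages(messages):
    return np.sum(messages, axis=0)


def mean_messages(messages):
    return np.mean(messages, axis=0)


features = np.array([[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]])
adjacency = {0: [1, 2], 1: [0], 2: [0]}

# Swapping only the aggregation function yields a different variant.
sum_variant = mpnn_layer(features, adjacency, copy_neighbor, sum_messages, add_to_self)
mean_variant = mpnn_layer(features, adjacency, copy_neighbor, mean_messages, add_to_self)
print(sum_variant[0], mean_variant[0])
```

Node 0 has two neighbors, so the sum and mean variants disagree there, while nodes 1 and 2 have a single neighbor each and get the same result from both.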
GraphSAGE (Graph Sample and Aggregate) addresses scalability by sampling a fixed number of neighbors for each node rather than using all neighbors. This makes the computational cost predictable and manageable for very large graphs (millions of nodes) where full-neighborhood aggregation is too expensive.
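The sampling idea can be sketched with the standard library alone. This is a simplified illustration rather than the GraphSAGE reference implementation: it samples without replacement and keeps all neighbors when a node has fewer than the requested number, and the helper name sample_neighbors and the hub graph are made up for the example.

```python
import random


def sample_neighbors(adjacency, node, num_samples, rng):
    """Return at most `num_samples` neighbors of `node`, chosen uniformly."""
    neighbors = adjacency[node]
    if len(neighbors) <= num_samples:
        return list(neighbors)
    return rng.sample(neighbors, num_samples)


# A hub node connected to 1000 leaves.
hub_graph = {0: list(range(1, 1001))}
hub_graph.update({leaf: [0] for leaf in range(1, 1001)})

rng = random.Random(0)
sampled = sample_neighbors(hub_graph, 0, 10, rng)
print(len(hub_graph[0]), "->", len(sampled))
```

Aggregating over the 10 sampled neighbors instead of all 1000 caps the per-node cost regardless of degree, at the price of a noisier estimate of the full-neighborhood aggregate.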
Applications in Chemistry and Drug Discovery
Chemistry is one of the fields where GNNs have had the greatest practical impact, because molecules are inherently graphs. Atoms are nodes with features (atomic number, charge, hybridization), and bonds are edges with features (bond type, aromaticity).
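As a concrete illustration, formaldehyde (H2C=O) can be encoded by hand as a small graph. The Atom and Bond classes and the choice of features below are assumptions made for this sketch; real pipelines typically derive richer features with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    symbol: str
    atomic_number: int
    formal_charge: int = 0


@dataclass(frozen=True)
class Bond:
    begin: int          # index of the first atom
    end: int            # index of the second atom
    order: int          # 1 = single, 2 = double
    aromatic: bool = False


# Formaldehyde: a carbon double-bonded to oxygen and single-bonded to two hydrogens.
atoms = [Atom("C", 6), Atom("O", 8), Atom("H", 1), Atom("H", 1)]
bonds = [Bond(0, 1, order=2), Bond(0, 2, order=1), Bond(0, 3, order=1)]

# Node features: one row per atom.
node_features = [[atom.atomic_number, atom.formal_charge] for atom in atoms]

# Bonds are undirected, so each one becomes two directed edges for message passing.
edge_index = []
edge_features = []
for bond in bonds:
    for src, dst in ((bond.begin, bond.end), (bond.end, bond.begin)):
        edge_index.append((src, dst))
        edge_features.append([bond.order, int(bond.aromatic)])

print(node_features)
print(edge_index)
```

The node and edge feature lists are exactly the inputs a message-passing layer consumes: features per node, plus features attached to each directed edge.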
GNNs can predict molecular properties (toxicity, solubility, binding affinity) directly from the molecular graph, replacing expensive laboratory experiments or quantum chemistry simulations with fast neural network inference. SchNet and DimeNet predict molecular energies with accuracy approaching density functional theory (DFT) calculations at a fraction of the computational cost.
AlphaFold, whose developers Demis Hassabis and John Jumper were awarded a share of the 2024 Nobel Prize in Chemistry for protein structure prediction, is closely related to graph neural networks: amino acid residues act as nodes, and their spatial and sequence relationships act as edges. The Evoformer module of AlphaFold 2 combines attention with message-passing-style updates over pairs of residues to help predict how proteins fold into their three-dimensional structures.
In drug discovery, GNNs screen candidate molecules for desired properties, predict drug-target interactions, and generate novel molecular structures with specific characteristics. This accelerates the drug development pipeline by identifying promising candidates before expensive synthesis and testing.
Other Applications
Social networks. Predicting user behavior, detecting communities, identifying influential nodes, and recommending connections. Pinterest uses a GNN (PinSage) to recommend content based on a graph that connects pins to the boards users have saved them to.
Recommendation systems. Modeling the interactions between users and items as a bipartite graph. GNNs capture collaborative filtering signals (users who liked the same items have similar preferences) more effectively than traditional matrix factorization methods.
Traffic and transportation. Predicting traffic flow, optimizing routing, and modeling urban mobility patterns. Road networks are natural graphs, and GNNs can incorporate both spatial connectivity and temporal dynamics.
Physics simulation. Modeling particle interactions, fluid dynamics, and multi-body systems. Each particle or element is a node, and interactions between nearby elements are edges. Once trained, GNN simulators can in some settings run orders of magnitude faster than traditional numerical solvers, usually at some cost in accuracy.
Limitations and Challenges
Over-smoothing. As GNN depth increases, node representations converge toward one another, losing the local information that distinguishes individual nodes. Each message-passing layer acts as a smoothing step, so after too many layers information has diffused across the graph and connected nodes end up with nearly identical representations. In practice this limits most GNNs to roughly 2 to 6 layers, far shallower than typical CNNs or transformers.
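The effect is easy to reproduce in a stripped-down setting. The sketch below repeatedly applies mean aggregation (with self-loops) on a six-node ring and tracks how far the node features are from their average. It deliberately omits learned weights and nonlinearities, so it illustrates the diffusion mechanism rather than the behavior of any trained model; the names spread and mean_aggregate are illustrative.

```python
import numpy as np

num_nodes = 6

# Ring graph: node i is connected to node i + 1 (wrapping around).
adj = np.zeros((num_nodes, num_nodes))
for i in range(num_nodes):
    adj[i, (i + 1) % num_nodes] = 1.0
    adj[(i + 1) % num_nodes, i] = 1.0

# Mean aggregation over each node and its neighbors (row-normalized A + I).
a_hat = adj + np.eye(num_nodes)
mean_aggregate = a_hat / a_hat.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
h = rng.normal(size=(num_nodes, 3))


def spread(h):
    """Distance of the node features from their average over all nodes."""
    return float(np.linalg.norm(h - h.mean(axis=0)))


spreads = [spread(h)]
for _ in range(30):
    h = mean_aggregate @ h   # one weight-free "layer"
    spreads.append(spread(h))

for depth in (0, 2, 6, 30):
    print(depth, spreads[depth])
```

The spread shrinks with every layer, and after 30 layers the node features are numerically almost identical, which is the collapse described above.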
Scalability. Full-batch training on large graphs requires storing the entire graph and all node features in memory. For graphs with millions of nodes and billions of edges, this exceeds GPU memory. Mini-batch training on graphs is more complex than on independent examples because nodes share neighbors, creating computational dependencies. Sampling-based methods (GraphSAGE, Cluster-GCN) address this but introduce approximation errors.
Expressiveness. Standard message-passing GNNs cannot distinguish certain pairs of non-isomorphic graphs: formally, their discriminative power is bounded by the one-dimensional Weisfeiler-Leman (1-WL) graph isomorphism test. Higher-order GNNs that operate on tuples of nodes or subgraph patterns rather than individual nodes can exceed this limit, but at significantly higher computational cost.
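A classic example of this limit is a six-node ring versus two disjoint triangles: the graphs are not isomorphic, yet every node in both has exactly two neighbors, so color refinement cannot tell them apart. The sketch below is a small, unoptimized version of 1-WL color refinement that uses nested tuples as colors; the function and variable names are illustrative.

```python
from collections import Counter


def wl_color_histogram(adjacency, num_rounds=3):
    """Run 1-WL color refinement and return the multiset of final node colors."""
    colors = {v: 0 for v in adjacency}   # all nodes start with the same color
    for _ in range(num_rounds):
        # New color = own color plus the sorted multiset of neighbor colors.
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adjacency[v])))
            for v in adjacency
        }
    return Counter(colors.values())


def cycle(n, offset=0):
    """Adjacency map of an n-node ring with node ids starting at `offset`."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n] for i in range(n)}


hexagon = cycle(6)
two_triangles = {**cycle(3), **cycle(3, offset=3)}
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = cycle(3)

# Non-isomorphic, but indistinguishable by 1-WL.
print(wl_color_histogram(hexagon) == wl_color_histogram(two_triangles))
# Different degree patterns, so 1-WL separates them.
print(wl_color_histogram(path3) == wl_color_histogram(triangle))
```

A standard message-passing GNN with identical initial node features would likewise produce the same graph-level representation for the ring and the two triangles.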
Graph neural networks process graph-structured data by passing messages between connected nodes, building representations that capture both node features and graph topology. They have become essential in chemistry and drug discovery (molecular property prediction, protein structure), social network analysis, and recommendation systems. Key challenges include over-smoothing in deep GNNs, scalability to large graphs, and limited expressiveness of the standard message-passing framework.