[[lecture-data]] Summary
- expressivity with respect to the WL test
Today
1. Graph Signals and Graph Signal Processing
PyTorch Things
First: how dataloaders work
Sampling-based GNN training
(1) Divide the training nodes into random batches
- the number of batches depends on the batch size, a hyperparameter you set

(2) For each of the $n$ batches, run a forward pass for each element in the batch
- note: only the embeddings of the nodes in the batch are updated (not all of them)

Here $\mathcal{V}_{\text{train}}$ is the training set. If $\mathcal{V}_{\text{train}} \subsetneq \mathcal{V}$, then the task is semi-supervised and we create a mask over our nodes during training (see Lecture 6 for defining/creating the mask).

The loss and weight updates are only computed for the nodes in the batch, as in the sketch below.
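A minimal sketch of this loop (my own illustration, not the lecture's code), assuming a generic node-level GNN `model` taking `(x, edge_index)` and a `data` object carrying node features, labels, and a boolean `train_mask`:

```python
import torch
import torch.nn.functional as F

# Illustrative setup: `model` is any node-level GNN taking (x, edge_index),
# and `data` holds node features, labels, and a boolean train_mask (Lecture 6).
def train_epoch(model, data, optimizer, batch_size=128):
    # nodes we are allowed to supervise on (semi-supervised setting)
    train_idx = data.train_mask.nonzero(as_tuple=True)[0]
    # (1) divide the training nodes into random batches
    perm = train_idx[torch.randperm(train_idx.numel())]
    for batch_idx in perm.split(batch_size):
        optimizer.zero_grad()
        # (2) forward pass ...
        out = model(data.x, data.edge_index)
        # ... but the loss (and hence the gradients) only uses the batch nodes
        loss = F.cross_entropy(out[batch_idx], data.y[batch_idx])
        loss.backward()
        optimizer.step()
```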
Last time: intro to PyTorch - see the Lecture 11 Colab notebook
- torch_geometric vs hand-coding: representing graphs as edge lists
- convolutions more like sparse matrix multiplications
- many benchmarks available including Cora, paper citation network dataset
`NeighborLoader` is a dataloader for graph nodes - it requires a training mask, typically included in canonical datasets
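A sketch of how this typically looks in PyTorch Geometric on Cora (the fan-out and batch size below are illustrative choices, not values from the lecture):

```python
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]  # single graph: x, edge_index, y, train/val/test masks

# Sample 2-hop neighborhoods (up to 10 neighbors per hop) around training nodes.
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],       # fan-out per GNN layer
    batch_size=32,
    input_nodes=data.train_mask,  # seed nodes come from the training mask
    shuffle=True,
)

for batch in loader:
    # The first `batch.batch_size` nodes of the sampled subgraph are the seeds;
    # the loss should be computed only on these.
    print(batch.num_nodes, batch.edge_index.shape, batch.y[:batch.batch_size].shape)
    break
```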
Stability of GNNs to graph perturbations
GNN outputs should be stable. Graphs might have small perturbations, but the outputs should vary as little as possible.
Example
a controller trained offline should still perform well on online trajectories
Example
Recall that permutation equivariance functions as implicit data augmentation, ie, there are symmetries/structures automatically encoded into the data that we can utilize to have better models.
If we observe Node 1's label $y_1$ at training time, but not Node 11's, the GNN still learns to output $y_{11} = y_1$. The GNN learns the same output for Node 11 as for Node 1. Why? The network learns an automorphism of the graph.
In the example above, the automorphism maps Node 1 to Node 11. And this means that we don't need to know $y_{11}$ during training time, we only need $y_1$!
graph automorphism
Let $G = (\mathcal{V}, \mathcal{E})$ be a graph. A graph automorphism is a graph homomorphism from $G$ to $G$. That is, a bijective map $\phi : \mathcal{V} \to \mathcal{V}$ such that $(u, v) \in \mathcal{E}$ if and only if $(\phi(u), \phi(v)) \in \mathcal{E}$.
If there exists an automorphism for $G$, then there exists a corresponding permutation matrix $P$ such that $P^\top S P = S$.
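As a small concrete check (the 4-cycle and its rotation below are my own example, not from the lecture), an automorphism corresponds exactly to a permutation matrix $P$ with $P^\top S P = S$:

```python
import numpy as np

# Shift operator (adjacency) of a 4-cycle: 0-1-2-3-0
S = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])

# Rotating the cycle by one node, phi(i) = (i + 1) mod 4, is an automorphism.
P = np.zeros((4, 4))
for i in range(4):
    P[i, (i + 1) % 4] = 1.0

print(np.allclose(P.T @ S @ P, S))  # True: the relabeled graph is the same graph
```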
Note
In a learning task (in a GNN), this means that the shift operator and node features satisfy $P^\top S P = S$ and $P^\top x = x$ for the permutation matrix $P$.
We typically want our GNNs to be invariant to automorphisms
Example
In practice, most graphs don’t have true automorphisms/symmetries, but quasi-symmetries.
Stability to perturbations ensures “data augmentation behavior” or permutation equivariance under not-quite-symmetries.
operator distance modulo permutations
The operator distance modulo permutations between operators $S$ and $\hat{S}$ is given by $$d(S, \hat{S}) = \min_{P \in \mathcal{P}} \| P^\top \hat{S} P - S \|$$ This can be defined for any matrix norm, but typically $\| \cdot \|$ is the operator norm induced by the 2-norm.
NOTE
For graphs, we can consider $S$ and $\hat{S}$ to be the shift operators of two graphs on the same nodes. This looks for the permutation (node relabeling) that makes these graphs closest to one another, then computes a distance.
see operator distance modulo permutations
When $d(S, \hat{S})$ is small but nonzero, then we have a quasi-symmetry. (see quasi-symmetry)
Stable Graph Filter
Let $S$ be a graph shift operator and $\hat{S} = S + E$ some perturbation with $\| E \| \leq \epsilon$ for some $\epsilon > 0$. Let $H$ be a graph filter. We say $H$ is stable to the perturbation if $$\| H(\hat{S}) - H(S) \| \leq C \epsilon$$ for a constant $C$ that does not depend on the particular perturbation.
graph filters are equivariant to permutations
Recall the definition for a convolutional graph filter:
Linear Shift-Invariant / Convolutional Graph Filters
We define our convolutional graph filter (or linear shift-invariant filter) as follows. Let $S$ be a graph shift operator and $x$ some graph signal. Then the filter output is $$y = H(S)x = \sum_{k=0}^{K-1}h_{k} S^k x, \;\;\; h_{0}, h_{1}, \dots, h_{K-1} \in \mathbb{R}$$ Note that this means $H(S)$ is a polynomial of degree $K-1$ in $S$ with coefficients $h_{i}$.
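A direct NumPy sketch of this definition (the graph, signal, and taps below are arbitrary illustrative choices):

```python
import numpy as np

def graph_filter(S, x, h):
    """y = sum_k h[k] * S^k x, computed by repeated shifting (no explicit powers)."""
    y = np.zeros_like(x)
    z = x.copy()          # z = S^0 x
    for h_k in h:
        y += h_k * z
        z = S @ z         # advance to the next power of S applied to x
    return y

# Arbitrary symmetric shift operator and signal
rng = np.random.default_rng(0)
A = rng.random((5, 5))
S = np.triu(A, 1) + np.triu(A, 1).T   # symmetric, zero diagonal
x = rng.standard_normal(5)
h = [1.0, 0.5, 0.25]                  # K = 3 filter taps

y = graph_filter(S, x, h)
```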
invariant to permutations
Let $S$ be a graph shift operator. A graph filter $H(S)$ is permutation equivariant if $H(P^\top S P)\, P^\top x = P^\top H(S)\, x$ for every permutation matrix $P$.
If $S$ is permutation invariant under $P$ (i.e., $P^\top S P = S$, an automorphism), then since each power $S^k$ is also permutation invariant, we have $H(P^\top S P) = H(S)$. For GNNs, we equivalently have $\Phi(P^\top S P;\, P^\top x) = P^\top \Phi(S;\, x)$, since a GNN is a composition of graph convolutions (and pointwise nonlinearities).
see filter permutation invariance
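A quick numerical check of the equivariance relation, using the same polynomial-filter form as above on an arbitrary random graph:

```python
import numpy as np

def graph_filter(S, x, h):
    y, z = np.zeros_like(x), x.copy()
    for h_k in h:
        y += h_k * z
        z = S @ z
    return y

rng = np.random.default_rng(1)
n = 6
A = rng.random((n, n))
S = np.triu(A, 1) + np.triu(A, 1).T
x = rng.standard_normal(n)
h = [0.3, 0.5, -0.2]

perm = rng.permutation(n)
P = np.eye(n)[perm]          # random permutation matrix

# Filtering the relabeled graph/signal equals relabeling the filtered output
lhs = graph_filter(P.T @ S @ P, P.T @ x, h)
rhs = P.T @ graph_filter(S, x, h)
print(np.allclose(lhs, rhs))  # True
```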
graph filter stability definitions
lipschitz stable / continuous
A function $f$ is Lipschitz continuous (or Lipschitz stable) if $\| f(x_1) - f(x_2) \| \leq C \| x_1 - x_2 \|$ for all $x_1, x_2$, or, for differentiable $f$, if $\| f'(x) \| \leq C$.
(see Lipschitz continuous)
The convolutional graph filter $y = H(S)x = \sum_{k} h_k S^k x$ is linear in both $x$ and $h$.
- (1) if $\hat{x} = x + \epsilon_x$, then $\hat{y} = y + H(S)\,\epsilon_x$
- (2) if $\hat{h}_k = h_k + \epsilon_k$, then $\hat{y} = y + \sum_{k} \epsilon_k S^k x$
Because continuous linear functions are Lipschitz continuous, graph convolutions are naturally Lipschitz and stable to perturbations of both $x$ and $h$, as spelled out below.
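Spelling out the resulting bounds (standard consequences of linearity, in the notation above):
$$\| H(S)(x + \epsilon_x) - H(S)x \| = \| H(S)\,\epsilon_x \| \leq \| H(S) \|\, \| \epsilon_x \|$$
$$\Big\| \sum_{k=0}^{K-1} (h_k + \epsilon_k) S^k x - \sum_{k=0}^{K-1} h_k S^k x \Big\| = \Big\| \sum_{k=0}^{K-1} \epsilon_k S^k x \Big\| \leq \Big( \sum_{k=0}^{K-1} |\epsilon_k|\, \| S \|^k \Big) \| x \|$$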
Takeaway
graph convolutions are Lipschitz stable to perturbations in $x$ and $h$.
This is because the convolutional graph filter is linear in both $x$ and $h$, and continuous linear functions are Lipschitz continuous.
(see graph convolutions are stable to perturbations in the data and coefficients)
However, the graph convolution is nonlinear in $S$… so it is more challenging to determine stability with respect to perturbations of the graph itself.
Suppose we have a graph shift operator $S$ and some perturbed version $\hat{S}$. To have stability, we want $\| H(\hat{S}) - H(S) \| \leq C\, d(S, \hat{S})$, i.e., we want this map to be Lipschitz with a small constant $C$.
Recall the spectral representation of a convolutional graph filter:
Spectral Representation of a Convolutional Graph Filter
in the spectral/frequency domain, we have $$\hat{y} = \sum_{k=0}^{K-1}h_{k} \Lambda^k \hat{x}$$ this representation is *completely defined* by the [[Concept Wiki/the spectral graph filter operates on a signal pointwise\|polynomial]] $$h(\lambda) = \sum_{k=0}^{K-1}h_{k} \lambda^k$$
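A short check that the spectral and vertex-domain forms agree, assuming a symmetric shift operator so that $S = V \Lambda V^\top$ with orthonormal $V$ (example graph and taps are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.random((n, n))
S = np.triu(A, 1) + np.triu(A, 1).T     # symmetric shift operator
x = rng.standard_normal(n)
h = [0.3, 0.5, -0.2]

# Vertex domain: y = sum_k h_k S^k x
y = sum(h_k * np.linalg.matrix_power(S, k) @ x for k, h_k in enumerate(h))

# Spectral domain: y_hat = h(Lambda) x_hat with x_hat = V^T x, S = V Lambda V^T
lam, V = np.linalg.eigh(S)
x_hat = V.T @ x
y_hat = sum(h_k * lam**k for k, h_k in enumerate(h)) * x_hat

print(np.allclose(V.T @ y, y_hat))  # True: the filter acts pointwise on frequencies
```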
lipschitz filter
Let $h(\lambda)$ be the spectral representation of a convolutional graph filter. $h$ is Lipschitz on an interval $[\lambda_{\min}, \lambda_{\max}]$ means there exists some $C > 0$ such that $$| h(\lambda_2) - h(\lambda_1) | \leq C\, | \lambda_2 - \lambda_1 | \quad \text{for all } \lambda_1, \lambda_2 \in [\lambda_{\min}, \lambda_{\max}]$$ This uses the idea that once the filter is fixed, we can think of the representation as sampling along a “spectrum polynomial”. Typically, the interval is chosen to contain the spectrum of $S$.
Example
NOTE
$C$ can be as large or small as you want. A larger $C$ means the filter is more discriminative, i.e. better able to tell the difference between different eigenvalues.
Discriminability
The discriminability of a graph filter describes the ability of the filter to tell the difference between different eigenvalues of the shift operator spectrum.
Recall that we can think of the spectrum as a polynomial in the eigenvalues of the shift operator.
The discriminability thus corresponds with $| h'(\lambda) |$, the change in spectral response for a small change in eigenvalue.
see discriminability of a graph filter
Stability-Discriminability tradeoff for Lipschitz filters
Note that for a Lipschitz filter the discriminability bound is the same at all frequencies. This means there is a tradeoff between having a large vs. a small Lipschitz constant $C$.
- with a small $C$, there is better stability under small perturbations of the eigenvalues
- However, if there are large perturbations on the graph, it is good to be more discriminating to notice these changes.
Example Illustration
Here, green has a smaller $C$ and pink has a larger $C$.
$\hat{\lambda}_2$ is a perturbation of the second eigenvalue $\lambda_2$.
- The green filter is stable: its responses at $\lambda_2$ and $\hat{\lambda}_2$ are very close
- whereas the pink filter has a larger difference in response at the two sampled locations, but has better discriminability
Having a larger Lipschitz constant results in better discrimination between the responses, but less stability.
see stability-discriminability tradeoff for Lipschitz filters
A relaxed version of a lipschitz graph filter is an integral Lipschitz filter.
integral lipschitz filter
Let $h(\lambda)$ be the spectral representation of a convolutional graph filter. $h$ is integral Lipschitz on an interval means there exists some $C > 0$ such that $$| h(\lambda_2) - h(\lambda_1) | \leq C\, \frac{| \lambda_2 - \lambda_1 |}{| \lambda_1 + \lambda_2 | / 2}$$ i.e., $h$ is Lipschitz with a constant inversely proportional to the interval's midpoint.
Letting $\lambda_2 \to \lambda_1 = \lambda$, we get $| \lambda\, h'(\lambda) | \leq C$, and so $h'(\lambda) \to 0$ as $| \lambda | \to \infty$.
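As a concrete illustration (my own example, not from the lecture), the response $h(\lambda) = \lambda / (1 + \lambda^2)$ is integral Lipschitz: $\lambda\, h'(\lambda)$ stays bounded and the response flattens out at large $\lambda$:

```python
import numpy as np

lam = np.linspace(0.01, 100, 10_000)

# Filter response h(lambda) = lambda / (1 + lambda^2) and its derivative
h  = lam / (1 + lam**2)
dh = (1 - lam**2) / (1 + lam**2)**2

print(h.max())                  # the response itself stays bounded
print(np.abs(dh).max())         # ordinary Lipschitz constant over this range
print(np.abs(lam * dh).max())   # |lambda h'(lambda)| is bounded: integral Lipschitz
# lam * dh -> 0 for large lambda, so the response flattens at high frequencies.
```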
We consider 3 perturbation types (first for graph convolutions, then for GNNs)
operator dilation
A dilation is the perturbation $\hat{S} = (1 + \epsilon) S$. We can interpret this as every edge weight being scaled by $(1 + \epsilon)$.
notes
- [p] This is a reasonable perturbation model because the edges change in proportion to their values.
- At the same time, it is unrealistic because the proportion is the same for all edges
- [c] Cannot model edge additions or deletions
Theorem
Let $\hat{S} = (1 + \epsilon) S$ be a dilation, and consider the graph convolution $H(S)$. If $H$ is an integral Lipschitz filter with constant $C$, then $$\| H(\hat{S}) - H(S) \| \leq C \epsilon$$ That is, integral Lipschitz filters are stable to dilations.
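A rough numerical illustration of this theorem (graph and filters are my own illustrative choices), comparing the integral Lipschitz response $h(\lambda) = \lambda/(1+\lambda^2)$ against a quadratic response $h(\lambda) = \lambda^2$, which is not integral Lipschitz:

```python
import numpy as np

def filter_matrix(S, h):
    """H(S) = V diag(h(lambda)) V^T for symmetric S (spectral construction)."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(h(lam)) @ V.T

rng = np.random.default_rng(3)
n = 20
A = rng.random((n, n))
S = np.triu(A, 1) + np.triu(A, 1).T

h_il   = lambda lam: lam / (1 + lam**2)  # integral Lipschitz response
h_quad = lambda lam: lam**2              # NOT integral Lipschitz: |lambda h'| unbounded

for eps in [0.01, 0.05, 0.1]:
    S_hat = (1 + eps) * S                # operator dilation
    d_il   = np.linalg.norm(filter_matrix(S_hat, h_il)   - filter_matrix(S, h_il),   2)
    d_quad = np.linalg.norm(filter_matrix(S_hat, h_quad) - filter_matrix(S, h_quad), 2)
    print(f"eps={eps}:  integral-Lipschitz diff={d_il:.4f}   quadratic diff={d_quad:.4f}")
```

The stable filter's output changes by roughly $C\epsilon$, while the non-integral-Lipschitz filter's change grows with the size of the spectrum.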
