2025-03-05 graphs lecture 12

[[lecture-data]]
Summary

1. Graph Signals and Graph Signal Processing

PyTorch Things

First: how dataloaders work

Sampling-based GNN training

(1) divide $V$ into random batches $V_{B_1}, V_{B_2}, \dots$

for $j \in V_{B_i}$ (each node in the batch)
for $\ell = 1, \dots, L$ (forward pass)

$$(x_\ell)_j = \sigma\!\left( \sum_{k=0}^{K-1} \sum_{p \in \mathcal{N}(j)} [S^k]_{jp}\, (x_{\ell-1})_p\, H_{\ell k} \right)$$

note

$$H_{\ell k}^{(i)} = H_{\ell k}^{(i-1)} - \eta\, \nabla_{H_{\ell k}} \sum_{j \in V_{B_i} \cap J} \mathcal{L}\!\left(y_j,\, (x_L)_j\right)$$

where $J \subseteq V$ is the training set. If $J \subsetneq V$, then the task is semi-supervised and we create a mask over our nodes during training (see Lecture 6 for defining/creating a mask).

Because the sum runs only over $j \in V_{B_i}$, the loss and weight updates are computed only for the nodes in the batch.
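A minimal sketch of this procedure in PyTorch, using a single graph-filter layer for brevity; the data, shapes, and variable names here are illustrative, not the lecture's notebook. A DataLoader shuffles node indices into batches $V_{B_i}$, the forward pass runs on the full graph (aggregation needs all neighbors), and the loss only touches batch nodes that also lie in the training set $J$ via a mask.

```python
import torch
from torch.utils.data import DataLoader

# Toy data (illustrative): n nodes, K filter taps, d_in -> d_out features
n, d_in, d_out, K = 100, 8, 4, 3
S = (torch.rand(n, n) < 0.1).float()          # random shift operator (adjacency)
x = torch.randn(n, d_in)                      # node features
y = torch.randint(0, d_out, (n,))             # node labels
train_mask = torch.rand(n) < 0.5              # mask for the training set J ⊆ V

# One parameter matrix H_k per power of S (single graph-filter layer + nonlinearity)
H = torch.nn.ParameterList(
    [torch.nn.Parameter(0.1 * torch.randn(d_in, d_out)) for _ in range(K)]
)
opt = torch.optim.SGD(H.parameters(), lr=1e-2)

# The DataLoader just shuffles node indices into batches V_{B_1}, V_{B_2}, ...
loader = DataLoader(torch.arange(n), batch_size=16, shuffle=True)

for batch in loader:                          # batch = V_{B_i}
    # Forward pass on the whole graph: z = sigma( sum_k S^k x H_k )
    z, Skx = torch.zeros(n, d_out), x
    for k in range(K):
        z = z + Skx @ H[k]
        Skx = S @ Skx
    z = torch.relu(z)

    keep = train_mask[batch]                  # nodes in V_{B_i} ∩ J
    if keep.any():
        loss = torch.nn.functional.cross_entropy(z[batch][keep], y[batch][keep])
        opt.zero_grad()
        loss.backward()
        opt.step()
```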

Last time intro to PyTorch - see Lecture 11 colab notebook

Stability of GNNs to graph perturbations

GNN outputs should be stable: the graph might undergo small perturbations, but the outputs should vary as little as possible.

Example

An offline-trained controller should still perform well on online trajectories.
20250310-2025-03-05-graph.png

Example

Recall that permutation equivariance functions as implicit data augmentation, i.e., there are symmetries/structures automatically encoded in the data that we can exploit to build better models.
20250310-2025-03-05-graph-1.png
If we observe Node 1's label $y_1$ at training time, but not Node 11's, the GNN still learns to output $\tilde{y}_{11}$: it produces the same output for Node 11 as for Node 1. Why? Because the two nodes are related by an automorphism of the graph, which the network exploits.

20250310-2025-03-05-graph-1.png|200 maps to 20250310-2025-03-05-graph-2.png|200

In the example above, the automorphism is

$$H(P S P^T)\, P x = P\, H(S)\, x = P y$$

And this means that we don't need to know $y_{11}$ at training time; we only need $y_1$!
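A quick numerical sanity check of this permutation-equivariance identity, with a random symmetric shift operator and random filter taps (all names are illustrative):

```python
import torch

n, K = 6, 3
S = torch.rand(n, n); S = (S + S.T) / 2          # symmetric random shift operator
x = torch.randn(n)
h = torch.randn(K)                               # filter taps h_k

def H(S, x):                                     # y = sum_k h_k S^k x
    y, Skx = torch.zeros_like(x), x.clone()
    for k in range(K):
        y = y + h[k] * Skx
        Skx = S @ Skx
    return y

P = torch.eye(n)[torch.randperm(n)]              # random permutation matrix
lhs = H(P @ S @ P.T, P @ x)                      # filter on the relabeled graph
rhs = P @ H(S, x)                                # relabel the original output
print(torch.allclose(lhs, rhs, atol=1e-5))       # True: H(P S Pᵀ) P x = P H(S) x
```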

graph automorphism

Let $G = (V, E)$ be a graph. A graph automorphism is an isomorphism from $G$ to itself, i.e., a bijective map $\gamma: V \to V$ such that

$$(i, j) \in E \iff (\gamma(i), \gamma(j)) \in E$$

If there exists an automorphism of $G$, then there exists a corresponding permutation matrix $P$ that maps $V$ onto itself while preserving the edges.

Note

In a learning task (in a GNN), this means that the shift operator satisfies $S = P S P^T$ and the node features satisfy $x = P x$ for the permutation matrix $P$.

see graph automorphism
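A concrete toy example (my own, not from the lecture): on a 4-node cycle, the rotation $i \mapsto i+1 \pmod 4$ is an automorphism, and its permutation matrix satisfies $S = PSP^T$; for constant node features we also get $x = Px$.

```python
import numpy as np

# Adjacency (shift operator) of the 4-cycle 0-1-2-3-0
S = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# Rotation i -> i+1 (mod 4) written as a permutation matrix
P = np.roll(np.eye(4), 1, axis=0)

x = np.ones(4)                        # constant node features
print(np.allclose(P @ S @ P.T, S))    # True: the rotation is an automorphism
print(np.allclose(P @ x, x))          # True: the features are invariant under P
```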

We typically want our GNNs to be invariant to automorphisms

Example

In practice, most graphs don't have true automorphisms/symmetries, but quasi-symmetries.
20250310-2025-03-05-graph-3.png

Stability to perturbations ensures "data augmentation behavior" or permutation equivariance under not-quite-symmetries.

operator distance modulo permutations

The operator distance modulo permutations between two operators $\Psi, \hat\Psi: x \mapsto y$ is given by

$$\|\Psi - \hat\Psi\|_{\mathcal{P}} = \min_{P \in \mathcal{P}} \; \sup_{x : \|x\| = 1} \left\| P^T \Psi(x) - \hat\Psi(P^T x) \right\|$$

where $\mathcal{P}$ is the set of permutation matrices. This can be defined for any choice of norm on the right-hand side, but typically it is the operator norm induced by the $\ell_2$ norm.

Note

For graphs, we can consider $\Psi$ and $\hat\Psi$ to be two graphs (their shift operators acting on node signals $x$). The definition looks for the permutation that makes these graphs closest to one another, then computes a distance.

see operator distance modulo permutations
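For tiny graphs this distance can be evaluated by brute force over all permutations. A sketch for the linear case $\Psi(x) = Sx$, where the sup over unit-norm $x$ reduces to a spectral norm (function and variable names are illustrative):

```python
import itertools
import numpy as np

def dist_mod_perm(S, S_hat):
    """min over permutations P of || Pᵀ S - Ŝ Pᵀ ||_2  (brute force, tiny n only)."""
    n = S.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(n)):
        P = np.eye(n)[list(perm)]                  # permutation matrix
        diff = P.T @ S - S_hat @ P.T
        best = min(best, np.linalg.norm(diff, 2))  # spectral norm = sup over ||x|| = 1
    return best

# Example: S_hat is just a relabeled copy of S, so the distance is 0
S = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
P0 = np.eye(3)[[2, 0, 1]]
S_hat = P0 @ S @ P0.T
print(dist_mod_perm(S, S_hat))                     # ~0.0
```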

When $\|S - \hat S\|_{\mathcal{P}} \le \epsilon$, we have a quasi-symmetry. (see quasi-symmetry)

Stable Graph Filter

Let $S$ be a graph shift operator and $\hat S = f_\epsilon(S)$ some perturbation of it, for some $\epsilon > 0$. Let $H$ be a graph filter. We say $H$ is stable to the perturbation $f$ if

$$\|H(S) - H(\hat S)\|_{\mathcal{P}} \to 0 \quad \text{as } \epsilon \to 0$$

see stable graph filter

graph filters are equivariant to permutations

Recall the definition for a convolutional graph filter:

Linear Shift-Invariant /Convolutional Graph Filters

We define our convolutional graph filter (or linear shift-invariant filter) as follows. Let $S$ be a graph shift operator and $x$ some graph signal. Then a filter $H$ has
$$y = H(S)\,x = \sum_{k=0}^{K-1} h_k S^k x, \qquad h_0, h_1, \dots, h_{K-1} \in \mathbb{R}$$
Note that this means $H(S)$ is a $(K-1)$-th degree polynomial in $S$ with (filter) coefficients $h_k$.
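A direct implementation of this polynomial-in-$S$ filter, sketched with illustrative names:

```python
import numpy as np

def graph_filter(S, x, h):
    """y = sum_{k=0}^{K-1} h_k S^k x  -- convolutional (shift-invariant) graph filter."""
    y = np.zeros_like(x, dtype=float)
    Skx = x.astype(float)                 # S^0 x = x
    for hk in h:
        y += hk * Skx                     # accumulate h_k S^k x
        Skx = S @ Skx                     # next power of S applied to x
    return y

# Small example: path graph on 4 nodes, K = 3 taps
S = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])
print(graph_filter(S, x, h=[0.5, 0.3, 0.2]))
```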

invariant to permutations

Let $S$ be a graph shift operator. If $S$ is invariant to a permutation $P$ (i.e., $S = P S P^T$), then the graph filter $H(S)$ is also invariant to that permutation.

If $S$ is permutation invariant, then since $H$ is also permutation invariant, we have

$$\|S - \hat S\|_{\mathcal{P}} = 0 \implies \|H(S) - H(\hat S)\|_{\mathcal{P}} = 0$$

For GNNs, we equivalently have $\|S - \hat S\|_{\mathcal{P}} = 0 \implies \|\Phi(S) - \Phi(\hat S)\|_{\mathcal{P}} = 0$, since a GNN is a composition of graph convolutions (and pointwise nonlinearities).

see filter permutation invariance

graph filter stability definitions

lipschitz stable / continuous

A function $f: x \mapsto y$ is $c$-Lipschitz stable if

$$\|f(x) - f(x')\| \le c\,\|x - x'\| \quad \forall\, x, x'$$

or, equivalently for linear $f$, if $\|f(x)\| \le c\,\|x\|$.

(see Lipschitz continuous)

The convolutional graph filter $y = \sum_{k=0}^{K-1} h_k S^k x$ is linear in both $h_k$ and $x$.

Because continuous linear functions are Lipschitz continuous, graph convolutions are naturally Lipschitz and stable to perturbations of both $x$ and $h_k$.

Takeaway

Graph convolutions are Lipschitz stable to perturbations in $x$ and $h_k$.

This is because the convolutional graph filter $y = \sum_{k=0}^{K-1} h_k S^k x$ is linear in both $h_k$ and $x$, and continuous linear functions are Lipschitz continuous.

(see graph convolutions are stable to perturbations in the data and coefficients)

However, the graph convolution is nonlinear in $S$, so it is more challenging to establish stability with respect to $S$.

Suppose we have a graph shift operator $S$ and some perturbed version $\hat S$. To have stability, we want $\|\Phi(S) - \Phi(\hat S)\|_{\mathcal{P}} \le c\,\epsilon$, i.e., we want this map to be Lipschitz with a small constant $c$.

Recall the spectral representation of a convolutional graph filter:

Spectral Representation of a Convolutional Graph Filter

In the spectral/frequency domain, we have
$$\hat y = \sum_{k=0}^{K-1} h_k \Lambda^k \hat x$$
This representation is completely defined by the polynomial $h(\lambda) = \sum_{k=0}^{K-1} h_k \lambda^k$.
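A sketch checking this equivalence numerically: diagonalize $S = V \Lambda V^T$, filter in the spectral domain, and compare against the node-domain output (graph and taps are illustrative):

```python
import numpy as np

S = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)              # symmetric shift operator
h = np.array([0.5, 0.3, 0.2])                       # filter taps
x = np.random.randn(3)

lam, V = np.linalg.eigh(S)                          # S = V diag(lam) Vᵀ
x_hat = V.T @ x                                     # graph Fourier transform of x

h_lam = sum(hk * lam**k for k, hk in enumerate(h))  # h(λ) evaluated at the eigenvalues
y_hat = h_lam * x_hat                               # ŷ = h(Λ) x̂ in the spectral domain

y_node = sum(hk * np.linalg.matrix_power(S, k) @ x for k, hk in enumerate(h))
print(np.allclose(V @ y_hat, y_node))               # True: same output in the node domain
```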

lipschitz filter

Let $h(\lambda)$ be the spectral representation of a convolutional graph filter. $h(\lambda)$ is Lipschitz on an interval $T \subseteq \mathbb{R}$ if there exists some $c \in \mathbb{R}$ such that

$$|h(\lambda) - h(\lambda')| \le c\,|\lambda - \lambda'| \quad \forall\, \lambda, \lambda' \in T$$

This uses the idea that once the filter is fixed, we can think of the representation as sampling along a "spectrum polynomial". Typically, $T = [\lambda_{\min}, \lambda_{\max}]$.

see lipschitz graph filter

Example

20250310-2025-03-05-graph-4.png

Note

$c$ can be as large or small as you want. A larger $c$ means the filter is more discriminative, i.e., better able to tell the difference between different eigenvalues.

Discriminability

The discriminability of a graph filter describes the ability of the filter to tell the difference between different eigenvalues of the shift operator spectrum.

Recall that we can think of the filter's spectral response as a polynomial evaluated at the eigenvalues of the shift operator.

The discriminability thus corresponds to $|h(\lambda) - h(\lambda + \epsilon)|$, i.e., the change in spectral response for a small change in eigenvalue.

see discriminability of a graph filter

Stability-Discriminability tradeoff for Lipschitz filters

Note that for a Lipschitz filter, the discriminability bound is the same at all frequencies. This means there is a tradeoff between the benefits of a large versus a small Lipschitz constant.

  • with a small c, there is better stability for small perturbations on the eigenvalues
  • However, if there are large perturbations on the graph, it is good to be more discriminating to notice these changes.
Example Illustration

20250310-2025-03-05-graph-5.png
Here, green has a smaller c and pink has a larger c.

$\hat\lambda_2$ is a perturbation of the second eigenvalue $\lambda_2$.

  • The green filter is stable: it gives responses that are very close to each other at both $\lambda_2$ and $\hat\lambda_2$,
  • whereas the pink filter has a larger difference at the two sampled locations, but better discriminability.

Having a larger Lipschitz constant results in better discrimination between the responses, but less stability.

see stability-discriminability tradeoff for Lipschitz filters
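A tiny numeric illustration of the tradeoff, comparing a flat filter (small $c$) with a steep one (large $c$) at an eigenvalue and its perturbed version (all values are my own illustrative choices):

```python
lam2, lam2_hat = 1.0, 1.2          # an eigenvalue and its perturbed version (illustrative)

h_flat  = lambda lam: 0.1 * lam    # small Lipschitz constant c = 0.1: stable, weakly discriminative
h_steep = lambda lam: 2.0 * lam    # large Lipschitz constant c = 2.0: discriminative, less stable

for name, h in [("flat", h_flat), ("steep", h_steep)]:
    print(name, abs(h(lam2) - h(lam2_hat)))
# flat  ~0.02 -> response barely changes under the perturbation (stable)
# steep ~0.40 -> response changes a lot (discriminative but unstable)
```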

A relaxed version of a Lipschitz graph filter is the integral Lipschitz filter.

integral lipschitz filter

Let $h(\lambda)$ be the spectral representation of a convolutional graph filter. $h(\lambda)$ is integral Lipschitz on an interval $T \subseteq \mathbb{R}$ if there exists some $C \in \mathbb{R}$ such that

$$|h(\lambda) - h(\lambda')| \le C\, \frac{|\lambda - \lambda'|}{\tfrac{1}{2}|\lambda + \lambda'|} \quad \forall\, \lambda, \lambda' \in T$$

i.e., $h(\lambda)$ is Lipschitz with a constant inversely proportional to the midpoint of the interval $[\lambda, \lambda']$.

Letting $\lambda' \to \lambda$, we get $|\lambda\, h'(\lambda)| \le C$, so $|h'(\lambda)| \le C/|\lambda| \to 0$ as $\lambda \to \infty$.

see integral Lipschitz filter
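A sketch contrasting a filter that is only Lipschitz with one that is integral Lipschitz, by checking whether $|\lambda\, h'(\lambda)|$ stays bounded over a wide range of $\lambda$ (the two example filters are my own choices, not from the lecture):

```python
import numpy as np

lam = np.linspace(0.1, 100, 1000)

h_lip = lam                      # h(λ) = λ: Lipschitz (c = 1) but NOT integral Lipschitz
h_int = np.log(1 + lam)          # h(λ) = log(1+λ): integral Lipschitz, since λ h'(λ) = λ/(1+λ) ≤ 1

dh_lip = np.gradient(h_lip, lam)            # numerical derivative h'(λ)
dh_int = np.gradient(h_int, lam)

print(np.max(np.abs(lam * dh_lip)))         # grows with λ (≈ 100 here): unbounded
print(np.max(np.abs(lam * dh_int)))         # stays below 1: flat at high frequencies
```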

We consider 3 perturbation types (first for graph convolutions, then for GNNs)

  1. dilations
  2. additive perturbations
  3. relative perturbations
operator dilation

$$\hat S = (1 + \epsilon) S \implies \|\hat S - S\| = \epsilon\,\|S\|$$

We can interpret this as the edges being scaled by $(1 + \epsilon)$.

see operator dilation

notes

  • [p] This is a reasonable perturbation model because the edges change in proportion to their values.
  • [c] Cannot model edge additions or deletions

Theorem

Let $\hat S = (1 + \epsilon) S$ be a dilation, and consider the graph convolution $H(S)$. If $H(S)$ is an integral Lipschitz filter with constant $C$, then

$$\|H(S) - H(\hat S)\| \le C\epsilon + O(\epsilon^2)$$

That is, integral Lipschitz filters are stable to dilations.
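A quick numeric check of this scaling: dilate $S$ by a few small values of $\epsilon$ and observe that the filter-output difference shrinks roughly linearly with $\epsilon$ (the graph and filter taps are illustrative choices, not a verified integral Lipschitz design):

```python
import numpy as np

S = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
h = [0.5, 0.3, 0.1]                                  # filter taps (illustrative)

def H(S):                                            # H(S) = sum_k h_k S^k as a matrix
    return sum(hk * np.linalg.matrix_power(S, k) for k, hk in enumerate(h))

for eps in [1e-1, 1e-2, 1e-3]:
    S_hat = (1 + eps) * S                            # dilation of the shift operator
    diff = np.linalg.norm(H(S_hat) - H(S), 2)        # spectral norm of the difference
    print(eps, diff, diff / eps)                     # diff/eps ≈ constant for small eps
```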