2025-02-19 graphs lecture 9

[[lecture-data]]

2025-02-19

 

Summary

Last Time

Today

  • G(y)ATs
  • Expressivity in graph-level tasks
  • Graph isomorphism problem

1. Graph Signals and Graph Signal Processing

Graph Attention (GAT)

$$(x_{\ell})_{i} = \sigma\left( \sum_{j \in N(i)} \alpha_{ij}(x_{\ell-1})_{j} H_{\ell} \right)$$ where the $\alpha_{ij}$ are the ==learned== graph attention coefficients, computed as

$$e_{ij} = \rho\big(\vec{a}\,(x_{i}H' \,\lVert\, x_{j} H')\big), \qquad \alpha_{ij} = \text{softmax}_{N(i)}(e_{ij}) = \frac{e_{ij}}{\sum_{j' \in N(i)} e_{ij'}}$$ Here, we are effectively learning the [[Concept Wiki/graph]] [[Concept Wiki/unweighted graph\|weight matrix]] $E = [e_{ij}]$ through $H'$ and $\vec{a}$.

  • $\lVert$ is the row-wise concatenation operation.
    • so $x_{i}H' \,\lVert\, x_{j}H'$ is a row vector with twice the transformed feature dimension
    • and each $\vec{a}$ is a learned vector of matching dimension
  • $\rho$ is a pointwise nonlinearity, typically leaky ReLU

The learnable parameters for this model are $H_{\ell}$, $H'_{\ell}$, and $\vec{a}_{\ell}$ for each layer $\ell = 1, \dots, L$.
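As a quick illustration of these equations, here is a minimal numpy sketch of a single-head GAT layer. It is only a sketch: the function names, the leaky ReLU slope, and the use of the standard exponential softmax for the neighborhood normalization are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z >= 0, z, slope * z)

def gat_layer(X, adj, H, H_prime, a, sigma=np.tanh):
    """Single-head GAT layer: (x_l)_i = sigma( sum_{j in N(i)} alpha_ij (x_{l-1})_j H ).

    X: (n, F_in) node features, adj: (n, n) 0/1 adjacency (assumed: no isolated nodes),
    H, H_prime: (F_in, F_out) learned matrices, a: (2 * F_out,) learned attention vector.
    """
    n = X.shape[0]
    Z = X @ H_prime                              # transformed features used to score edges
    E = np.full((n, n), -np.inf)                 # unnormalized scores e_ij, only on edges
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                E[i, j] = leaky_relu(np.concatenate([Z[i], Z[j]]) @ a)
    # normalize the scores over each neighborhood to get the attention coefficients alpha_ij
    alpha = np.exp(E - E.max(axis=1, keepdims=True))
    alpha = np.where(adj > 0, alpha, 0.0)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return sigma(alpha @ X @ H)                  # aggregate neighbors with learned weights
```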

Notes

  • This architecture is local because it still respects the graph sparsity pattern, learning the “similarities” of the transformed features at each layer of the network.
  • This architecture does not depend on the size of the graph, only on the dimension of the features.
  • It can still be expensive to compute, however, if the graph is dense: we need one coefficient $\alpha_{ij}$ per edge, which can be up to $O(n^2)$ coefficients in complete graphs.

This additional flexibility increases the capacity of the architecture (making it less likely to underfit the training data), and this has been observed empirically. But it comes at the cost of a more expensive forward pass.

This architecture is available and implemented in PyTorch Geometric.
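Concretely, a minimal usage sketch with PyTorch Geometric's GATConv layer (the toy graph and feature sizes below are made up for illustration):

```python
import torch
from torch_geometric.nn import GATConv

# Toy undirected path graph 0 - 1 - 2, with each edge listed in both directions
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.randn(3, 8)                        # 3 nodes, 8 input features

conv = GATConv(in_channels=8, out_channels=4, heads=1)
out = conv(x, edge_index)                    # (3, 4): new node features with learned attention
```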

Note

We can think of the GAT layer as a convolutional layer where the GSO, $S = [\alpha_{ij}]$, is learned

  • however, this architecture is not convolutional, strictly speaking, since $S$ changes at each layer.
  • But it can be useful to see how we can write/think about it as something that looks convolutional (if you squint)

Leaky ReLU

The leaky ReLU is a pointwise nonlinearity, given for a small slope $\alpha > 0$ as:

$$\rho(x) = \begin{cases} x, & x \geq 0 \\ \alpha x, & x < 0 \end{cases}$$ (see [[Concept Wiki/leaky ReLU]])

the “readout” in graph-level tasks

Idea

In graph-level problems, the output does not need to be a graph signal - it could be a scalar, a class label, or some vector in $\mathbb{R}^{m}$. That is, outputs $\hat{y} \in \mathbb{R}$, $\hat{y} \in \{1, \dots, C\}$, or $\hat{y} \in \mathbb{R}^{m}$ are all possible.

Question

How do we map GNN layers to such outputs?

Answer

We can use “readout” layers

Readout Layer

A readout layer is an additional layer added to a GNN to achieve the desired output type/dimension for graph-level tasks or other learning tasks on graphs that require an output that is not a graph signal.

(see readout layer)

Example

In node-level tasks (e.g. source localization, community detection, citation networks, etc), both the input and the output are graph signals $X \in \mathbb{R}^{n \times F}$. Thus, the map is composed strictly of GNN layers.

GNN layer equation

$$X_{\ell} = \sigma(U_{\ell}) = \sigma\left( \sum_{k=0}^{K-1} S^k X_{\ell-1} H_{\ell, k} \right)$$

When we learn our GNN layers, we fix $S$ and learn only the filter coefficients $H_{\ell, k}$, which do not depend on the graph size.
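A minimal numpy sketch of this layer equation; the function name, the toy cycle graph, and the random data are illustrative assumptions:

```python
import numpy as np

def gnn_layer(X, S, H_list, sigma=np.tanh):
    """X_l = sigma( sum_{k=0}^{K-1} S^k X_{l-1} H_{l,k} ).

    X: (n, F_in) input graph signal, S: (n, n) graph shift operator,
    H_list: list of K filter taps, each of shape (F_in, F_out).
    """
    U = np.zeros((X.shape[0], H_list[0].shape[1]))
    SkX = X                                      # S^0 X
    for H_k in H_list:
        U += SkX @ H_k                           # add S^k X H_k
        SkX = S @ SkX                            # next power of S applied to X
    return sigma(U)

# Toy usage: a 5-node cycle graph as the GSO, 3 input and 2 output features, K = 2 taps
n, F_in, F_out, K = 5, 3, 2, 2
S = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
X0 = np.random.randn(n, F_in)
H = [np.random.randn(F_in, F_out) for _ in range(K)]
X1 = gnn_layer(X0, S, H)                         # (5, 2): still a graph signal
```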

Option 1

fully connected readout layer

In a fully connected readout layer we define $$\hat{y} = \rho\big(A\,\mathrm{vec}(X_{L})\big)$$ where $A \in \mathbb{R}^{m \times nF_{L}}$ is a learned readout matrix and

  • $\mathrm{vec}(\cdot)$ in general vectorizes matrices into vectors, so $\mathrm{vec}(X_{L}) \in \mathbb{R}^{nF_{L}}$
  • $\rho$ can be the identity or some other pointwise nonlinearity (ReLU, softmax, etc)

(see fully connected readout layer)
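A minimal sketch of this readout, assuming the learned readout is a single matrix applied to the vectorized features (names and shapes are illustrative):

```python
import numpy as np

def fc_readout(X_L, A, rho=lambda z: z):
    """Fully connected readout: y_hat = rho( A vec(X_L) ).

    X_L: (n, F) output of the last GNN layer, A: (m, n * F) learned readout matrix.
    Note that the size of A grows with the number of nodes n.
    """
    return rho(A @ X_L.reshape(-1))              # reshape(-1) stacks the rows of X_L

n, F, m = 6, 4, 3
X_L = np.random.randn(n, F)
A = np.random.randn(m, n * F)
y_hat = fc_readout(X_L, A)                       # vector in R^m, no longer a graph signal
```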

Note

There are some downsides of a fully connected readout layer

  • The number of parameters depends on $n$ - the readout matrix $A$ has $m \times nF_{L}$ learnable entries, which grows with the graph size. This is not amenable to large graphs.

  • No longer permutation invariant, because the $\mathrm{vec}(\cdot)$ operation fixes a node ordering

Verify that fully connected readout layers are no longer permutation invariant.

  • No longer transferrable across graphs.
    • unlike the filters $H_{\ell,k}$ in GNN layers, $A$ depends on $n$. So if the number of nodes changes, we have to relearn $A$

These drawbacks make this a not-so-attractive option, so we usually use an aggregation readout layer instead.

Aggregation Readout Layer

The aggregation readout layer is a readout layer given by

$$\hat{y} = \rho\big(\mathrm{agg}(X_{L})\big)$$

where $\rho$ is a pointwise nonlinearity (possibly the identity) and $\mathrm{agg}(\cdot)$ is a node-level aggregation mapping $X_{L} \in \mathbb{R}^{n \times F_{L}}$ to a vector in $\mathbb{R}^{F_{L}}$. Typical aggregation functions are the mean, sum, max, min, median, etc

NOTE

  • this is now independent of $n$ (my aggregation functions don’t change based on the size of my graph)
  • permutation invariant since the aggregation functions are also permutation invariant
  • this remains transferrable across graphs, unlike the fully connected readout layer
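A minimal sketch of an aggregation readout, assuming a mean aggregation over the node dimension (the names and the optional nonlinearity are illustrative):

```python
import numpy as np

def aggregation_readout(X_L, agg=np.mean, rho=lambda z: z):
    """Aggregation readout: y_hat = rho( agg(X_L) ), aggregating over the node dimension.

    The output lives in R^F regardless of the number of nodes n, and permuting
    the rows of X_L (i.e. relabeling the nodes) does not change it.
    """
    return rho(agg(X_L, axis=0))

y_small = aggregation_readout(np.random.randn(7, 4))    # shape (4,)
y_large = aggregation_readout(np.random.randn(50, 4))   # same shape: transfers across sizes
```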

graph isomorphism problem

Graph Isomorphism

Let $G_{1} = (V_{1}, E_{1})$ and $G_{2} = (V_{2}, E_{2})$ be two graphs. A graph isomorphism between $G_{1}$ and $G_{2}$ is a bijection $\pi : V_{1} \to V_{2}$ such that $(i, j) \in E_{1}$ if and only if $(\pi(i), \pi(j)) \in E_{2}$ for all $i, j \in V_{1}$ (see graph isomorphism)

A graph isomorphism exists exactly when two graphs are equivalent.
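To make the definition concrete, a small sketch that checks whether a given bijection (written as a permutation of node indices) is a graph isomorphism between two adjacency matrices; the function name and the toy graphs are made up:

```python
import numpy as np

def is_isomorphism(A1, A2, perm):
    """Check that the bijection i -> perm[i] preserves edges:
    (i, j) is an edge of G1 iff (perm[i], perm[j]) is an edge of G2."""
    n = len(A1)
    return all(A1[i, j] == A2[perm[i], perm[j]]
               for i in range(n) for j in range(n))

A1 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])                      # 3-node path with center node 1
A2 = np.array([[0, 1, 1],
               [1, 0, 0],
               [1, 0, 0]])                      # same path, relabeled so the center is node 0
print(is_isomorphism(A1, A2, perm=[1, 0, 2]))   # True: this relabeling is an isomorphism
```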

Question

Can we train a GNN to detect if graphs are isomorphic?

  • ie, to produce identical outputs for graphs in the same equivalence class/graphs which are isomorphic, and different outputs for graphs that are not isomorphic?

Consider two graphs $G_{1}$ and $G_{2}$ with Laplacians $L_{1}$ and $L_{2}$. Assume they do not have node features (but we can impute them).

  • Consider a single-layer (linear) GNN, and suppose there is some eigenvalue of $L_{1}$ that is different from every eigenvalue of $L_{2}$.

We can verify whether graphs without node features and different Laplacian eigenvalues are not isomorphic:

Theorem

Consider two graphs $G_{1}$ and $G_{2}$ without node features, with Laplacians $L_{1}$ and $L_{2}$. Further suppose that there is some eigenvalue $\lambda_{i}$ of $L_{1}$ such that $\lambda_{i} \neq \mu_{j}$ for all eigenvalues $\mu_{j}$ of $L_{2}$ - ie, $L_{1}$ has at least one eigenvalue that is different from those of $L_{2}$.

Then there exists a graph convolution we can use to verify whether the two graphs are NOT isomorphic, ie to test that $G_{1} \not\cong G_{2}$.

Proof

Consider the following single-layer (linear) GNN, using the Laplacian $L = V \Lambda V^{T}$ as the GSO: $$y = h_{k} L x, \qquad \hat{y} = V^{T} y$$ Define the graph feature $x$ to be white, ie such that $\mathbb{E}(xx^{T}) = I$.

Set $h_{k} = 1$.

Then we have

$$\begin{aligned} \mathbb{E}(\hat{y}) &= \mathbb{E}(V^{T}y) \\ &= \mathbb{E}(V^{T}h_{k}Lx) \\ &= \mathbb{E}(V^{T}V\Lambda V^{T}x) \\ &= \mathbb{E}(\Lambda x) \\ \implies \mathbb{E}(\hat{y}) &= \begin{bmatrix} \lambda_{1} \\ \lambda_{2} \\ \vdots \\ \lambda_{n} \end{bmatrix} \end{aligned}$$ And if there is some eigenvalue $\lambda_{i}$ of $L_{1}$ such that $\lambda_{i} \neq \mu_{j}$ for all eigenvalues $\mu_{j}$ of $L_{2}$ - ie, $L_{1}$ has at least one eigenvalue that is different from those of $L_{2}$, as above - then it suffices to compare $\mathbb{E}(\hat{y}_{1})$ and $\mathbb{E}(\hat{y}_{2})$ (for example, their sums).

Since the Laplacian is positive semidefinite, the sums will be different if and only if at least one eigenvalue is different between them.

(see paper GNNs are more powerful than you think)
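In practice, the spectral check suggested by the theorem can be sketched as follows: compute and compare the sorted Laplacian eigenvalues of the two graphs (the function names and tolerance are illustrative). A positive answer certifies non-isomorphism; a negative answer is inconclusive.

```python
import numpy as np

def laplacian_spectrum(A):
    """Sorted eigenvalues of L = D - A for an undirected graph with adjacency A."""
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))

def certainly_not_isomorphic(A1, A2, tol=1e-8):
    """True only when the Laplacian spectra differ; a False answer is inconclusive,
    since non-isomorphic graphs can share the same eigenvalues (see below)."""
    s1, s2 = laplacian_spectrum(A1), laplacian_spectrum(A2)
    return s1.shape != s2.shape or not np.allclose(s1, s2, atol=tol)
```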

However, there are many non-isomorphic graphs that share the same eigenvalues

Example

These graphs are clearly non-isomorphic, but their laplacian eigenvalues are the same!

Thus, a strong alternative is the Weisfeiler-Leman test.

So our eigenvalue-based heuristic doesn’t work for some graphs, but a simple heuristic that does work is counting degrees. Then the problem amounts to comparing the multisets of node degree counts.

(Continuing the example from above: the two graphs have different node degree multisets, so comparing degrees distinguishes them even though their Laplacian eigenvalues agree.)
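The degree-counting heuristic is straightforward to sketch in code (the function name is illustrative; it expects adjacency matrices as numpy arrays):

```python
from collections import Counter

def same_degree_multiset(A1, A2):
    """Compare the multisets of node degrees of two graphs given by adjacency matrices;
    if the multisets differ, the graphs are certainly not isomorphic."""
    return Counter(A1.sum(axis=1).tolist()) == Counter(A2.sum(axis=1).tolist())
```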

The Weisfeiler-Leman graph isomorphism test consists of running the color refinement algorithm for two graphs.


The color refinement algorithm is an algorithm to assign colorings to nodes.

Given a graph $G$ with node features $x_{i}$:

let $c_{i}(0) = f(x_{i})$ for every node $i$ (assign colors based on node feature values); let $t = 0$

while True:

  1. let $c_{i}(t+1) = f\big(c_{i}(t), \{ c_{j}(t) : j \in N(i) \}\big)$ for every node $i$
  • assign colors based on the previously assigned colors of the node and its neighbors
  2. if $c_{i}(t+1) = c_{i}(t)$ for all $i$, break. else, set $t \leftarrow t + 1$ and repeat

Here, $f$ is a hash function that assigns colors, and $\{ c_{j}(t) : j \in N(i) \}$ is the multiset of neighbor colors.
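A minimal Python sketch of color refinement as described above, using canonical integer labels in place of the hash function $f$ (the names and the relabeling scheme are illustrative assumptions):

```python
def color_refinement(neighbors, features=None):
    """Color refinement on a graph given as {node: list of neighbor nodes}.

    Returns a stable node coloring. Running this on two graphs and comparing the
    multisets of final colors is the Weisfeiler-Leman isomorphism test.
    """
    nodes = list(neighbors)
    feats = features if features is not None else {v: 1 for v in nodes}
    # initial colors: hash the node feature values to integer labels
    palette = {}
    color = {v: palette.setdefault(feats[v], len(palette)) for v in nodes}
    while True:
        # new color = hash of (own color, multiset of neighbor colors)
        signature = {v: (color[v], tuple(sorted(color[u] for u in neighbors[v])))
                     for v in nodes}
        palette = {}
        new_color = {v: palette.setdefault(signature[v], len(palette)) for v in nodes}
        if new_color == color:          # coloring is stable: stop
            return color
        color = new_color
```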

Example

Color refinement for the 3-node path graph $1 - 2 - 3$:

No node features, so assume all features equal 1. Assign colors based on node features: $c_{1}(0) = c_{2}(0) = c_{3}(0) = f(1) = k_{1}$

step $t = 1$:

$$\begin{aligned} c_{1}(1) &= f(k_{1}, \{ k_{1} \}) = k_{1} \\ c_{3}(1) &= f(k_{1}, \{ k_{1} \}) = k_{1} \\ c_{2}(1) &= f(k_{1}, \{ k_{1}, k_{1} \}) = k_{2} \end{aligned}$$

Evaluate: did the colors change?

  • Yes: continue!

step $t = 2$:

$$\begin{aligned} c_{1}(2) &= f(k_{1}, \{ k_{2} \}) = k_{1} \\ c_{3}(2) &= f(k_{1}, \{ k_{2} \}) = k_{1} \\ c_{2}(2) &= f(k_{2}, \{ k_{1}, k_{1} \}) = k_{2} \end{aligned}$$

Evaluate: did the colors change?

  • No - the partition into color classes is the same as in the previous step, so we are done!
  • This is the final coloring: nodes 1 and 3 share one color, and node 2 has another.
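For reference, running the color_refinement sketch from earlier on this 3-node path graph reproduces this final coloring (endpoints share one color, the middle node gets another):

```python
path = {1: [2], 2: [1, 3], 3: [2]}     # the path graph 1 - 2 - 3
print(color_refinement(path))          # {1: 0, 2: 1, 3: 0}
```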