[[lecture-data]] 2025-02-19
Summary
1. Graph Signals and Graph Signal Processing
Graph Attention (GAT)
$$(x_{\ell})_{i} = \sigma\left( \sum_{j \in N(i)} \alpha_{ij}(x_{\ell-1})_{j} H_{\ell} \right)$$ where the $\alpha_{ij}$ are the ==learned== graph attention coefficients, computed as
$$\alpha_{ij} = \text{softmax}_{N(i)}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{j' \in N(i)} \exp(e_{ij'})}, \qquad e_{ij} = \rho\left(\vec{a}\,(x_{i}H' \lvert \rvert x_{j} H')\right)$$ Here, we are effectively learning the [[Concept Wiki/graph]] [[Concept Wiki/unweighted graph\|weight matrix]] entries $e_{ij}$ through $H'$ and $\vec{a}$.
- $\lvert \rvert$ is the row-wise concatenation operation
- so $x_{i}H' \lvert \rvert x_{j}H'$ has twice the transformed feature dimension
- and each $\vec{a}$ is a learnable vector of that dimension
- $\rho$ is a pointwise nonlinearity, typically the leaky ReLU
The learnable parameters for this model are $H_{\ell}$, $H'_{\ell}$, and $\vec{a}_{\ell}$ for $\ell = 1, \dots, L$.
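A minimal numpy sketch of one GAT layer under the equations above. The variable names (`H`, `H_prime`, `a`, `adj`) and the choice of $\sigma = \tanh$ are illustrative assumptions, not from the lecture.

```python
import numpy as np

def leaky_relu(z, alpha=0.2):
    return np.where(z >= 0, z, alpha * z)

def gat_layer(X, adj, H, H_prime, a, sigma=np.tanh):
    """One GAT layer. X: (n, F) node features, adj: (n, n) adjacency (0/1),
    H: (F, F_out) feature transform, H_prime: (F, F_att) attention transform,
    a: (2 * F_att,) attention vector."""
    n = X.shape[0]
    Z = X @ H_prime                           # transformed features used for attention
    X_out = np.zeros((n, H.shape[1]))
    for i in range(n):
        nbrs = np.where(adj[i] > 0)[0]        # N(i)
        # raw scores e_ij = rho(a^T [z_i || z_j]), rho = leaky ReLU
        e = np.array([leaky_relu(a @ np.concatenate([Z[i], Z[j]])) for j in nbrs])
        alpha = np.exp(e) / np.exp(e).sum()   # softmax over N(i)
        X_out[i] = sigma(sum(alpha[k] * (X[nbrs[k]] @ H) for k in range(len(nbrs))))
    return X_out
```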
Notes
- This architecture is local because it still respects the graph sparsity pattern, learning the “similarities” of the transformed features at each layer of the network.
- This architecture does not depend on the size of the graph, only the dimension of the features
- It can still be expensive to compute, however, if the graph is dense: we need one coefficient $\alpha_{ij}$ per edge, which can be up to $O(n^2)$ coefficients in complete graphs.
This additional flexibility increases the capacity of the architecture (making it less likely to underfit the training data), and this has been observed empirically. But it comes at the cost of a more expensive forward pass.
This architecture is available and implemented in
PyTorch Geometric
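A minimal usage sketch of PyTorch Geometric's `GATConv`; the toy graph, feature sizes, and number of heads below are made up for illustration.

```python
import torch
from torch_geometric.nn import GATConv

# toy graph: 3 nodes with 4 features each, edges 0-1 and 1-2 in both directions
x = torch.randn(3, 4)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)

conv = GATConv(in_channels=4, out_channels=8, heads=2)  # 2 attention heads
out = conv(x, edge_index)
print(out.shape)  # torch.Size([3, 16]): heads are concatenated by default
```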
Note
We can think of the GAT layer as a convolutional layer where the GSO $S$ is learned
- however, this architecture is not convolutional, since $S$ changes at each layer.
- But it can be useful to see how we can write/think about it as something that looks convolutional (if you squint)
Leaky ReLU
The leaky ReLU is a pointwise nonlinearity that is given as:
$$\rho(x) = \begin{cases} x, & x \geq 0 \\ \alpha x, & x < 0 \end{cases}$$ (see [[Concept Wiki/leaky ReLU]])
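PyTorch exposes this nonlinearity as `torch.nn.functional.leaky_relu`; a tiny sanity check (the slope $\alpha = 0.01$ is just PyTorch's default, not a value from the lecture):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(F.leaky_relu(x, negative_slope=0.01))
# tensor([-0.0200, -0.0050,  0.0000,  1.5000])
```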
the “readout” in graph-level tasks
Idea
In graph-level problems, the output does not need to be a graph signal - it could be a scalar, a class label, or some vector in $\mathbb{R}^m$. That is, $\hat{y} \in \mathbb{R}$, $\hat{y} \in \{1, \dots, C\}$, and $\hat{y} \in \mathbb{R}^m$ are all possible.
Question
How do we map GNN layers to such outputs?
Answer
We can use “readout” layers
Readout Layer
A readout layer is an additional layer added to a GNN to achieve the desired output type/dimension for graph-level tasks or other learning tasks on graphs that require an output that is not a graph signal.
(see readout layer)
Example
In node-level tasks (ex source localization, community detection, citation networks, etc), both the input and the output are graph signals. Thus, the map from input to output is composed strictly of GNN layers.
Benefit: convolutional graph filters are local. This ensures a parameterization that is independent of graph size
GNN layer equation
$$X_{\ell} = \sigma(U_{\ell}) = \sigma\left( \sum_{k=0}^{K-1} S^k X_{\ell-1} H_{\ell, k} \right)$$
When we learn our GNN layers, we fix $S$ and learn only the filter coefficients $H_{\ell, k}$, which do not depend on the graph size.
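A minimal numpy sketch of this layer equation; the random GSO $S$, features $X$, and filter taps below are placeholders for illustration.

```python
import numpy as np

def gnn_layer(S, X, H_taps, sigma=np.tanh):
    # sigma( sum_k S^k X H_{l,k} ); S: (n, n), X: (n, F_in), H_taps: K matrices (F_in, F_out)
    U = sum(np.linalg.matrix_power(S, k) @ X @ H_k for k, H_k in enumerate(H_taps))
    return sigma(U)

n, F_in, F_out, K = 5, 3, 4, 2
S = np.random.rand(n, n); S = (S + S.T) / 2                # placeholder symmetric GSO
X = np.random.rand(n, F_in)
H_taps = [np.random.rand(F_in, F_out) for _ in range(K)]   # only these are learned
print(gnn_layer(S, X, H_taps).shape)                       # (5, 4)
```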
Option 1
fully connected readout layer
In a fully connected readout layer we define $$\hat{y} = \rho\left( A\,\text{vec}(X_{L}) \right)$$ where
- $\text{vec}(\cdot)$ in general vectorizes matrices into vectors
- $\rho$ can be the identity or some other pointwise nonlinearity (ReLU, softmax, etc)
- $A$ is a learnable matrix that maps the vectorized features to the desired output dimension
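A sketch of the fully connected readout under this notation; the output dimension $m$ and the random placeholders are assumptions for illustration.

```python
import numpy as np

def fc_readout(X_L, A, rho=lambda z: z):
    # rho(A vec(X_L)); X_L: (n, F) final GNN features, A: (m, n * F) learnable matrix
    return rho(A @ X_L.reshape(-1))

n, F, m = 6, 3, 2
X_L = np.random.rand(n, F)
A = np.random.rand(m, n * F)        # note: the parameter count grows with n
print(fc_readout(X_L, A).shape)     # (2,)
```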
Note
There are some downsides of a fully connected readout layer
The number of parameters depends on $n$: vectorizing $X_{L}$ means $A$ has a number of learnable parameters that grows with the graph size. This is not amenable to large graphs.
No longer permutation invariant, because of the $\text{vec}(\cdot)$ operation
- (Exercise) Verify that fully connected readout layers are no longer permutation invariant
- No longer transferrable across graphs: unlike the filter coefficients $H_{\ell, k}$ in GNN layers, $A$ depends on $n$. So if the number of nodes changes, we have to relearn $A$.
These drawbacks make the fully connected readout a not-so-attractive option, so we usually use an aggregation readout layer.
Aggregation Readout Layer
The aggregation readout layer is a readout layer given by $$\hat{y} = \rho\left( \text{agg}(X_{L}) \right)$$ where $\rho$ is a pointwise nonlinearity and $\text{agg}$ is a node-level aggregation applied across the node dimension.
Typical aggregation functions are the mean, sum, max, min, median, etc
NOTE
- this is now independent of $n$ (my aggregation functions don’t change based on the size of my graph)
- permutation invariant since the aggregation functions are also permutation invariant
- this remains transferrable across graphs, unlike the fully connected readout layer
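A sketch of the aggregation readout illustrating the points above: the output dimension does not depend on $n$, and permuting the nodes leaves the output unchanged. Mean aggregation and the feature sizes are arbitrary choices here.

```python
import numpy as np

def agg_readout(X_L, agg=np.mean, rho=lambda z: z):
    # rho(agg(X_L)) aggregated across the node dimension
    return rho(agg(X_L, axis=0))

X_small = np.random.rand(5, 3)     # 5-node graph, 3 features
X_large = np.random.rand(500, 3)   # 500-node graph, same feature dimension
print(agg_readout(X_small).shape, agg_readout(X_large).shape)  # (3,) (3,)

# permutation invariance: shuffling the node order does not change the output
perm = np.random.permutation(5)
print(np.allclose(agg_readout(X_small), agg_readout(X_small[perm])))  # True
```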
graph isomorphism problem
Graph Isomorphism
Let $G_{1} = (V_{1}, E_{1})$ and $G_{2} = (V_{2}, E_{2})$ be two graphs. A graph isomorphism between $G_{1}$ and $G_{2}$ is a bijection $\phi: V_{1} \to V_{2}$ such that $(i, j) \in E_{1}$ if and only if $(\phi(i), \phi(j)) \in E_{2}$ for all $i, j \in V_{1}$ (see graph isomorphism)
A graph isomorphism exists exactly when two graphs are equivalent.
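For intuition, a brute-force check of this definition for tiny graphs: try every bijection $\phi$ and test whether it maps edges to edges (exponential in $n$, purely illustrative).

```python
from itertools import permutations

def are_isomorphic(n, edges1, edges2):
    # graphs given as lists of undirected edges over nodes 0..n-1
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    for phi in permutations(range(n)):
        if {frozenset((phi[i], phi[j])) for (i, j) in e1} == e2:
            return True
    return False

# a 3-node path is isomorphic to a relabeled path, but not to a triangle
print(are_isomorphic(3, [(0, 1), (1, 2)], [(1, 0), (0, 2)]))          # True
print(are_isomorphic(3, [(0, 1), (1, 2)], [(0, 1), (1, 2), (0, 2)]))  # False
```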
Question
Can we train a GNN to detect if graphs are isomorphic?
- ie, to produce identical outputs for graphs in the same equivalence class/graphs which are isomorphic, and different outputs for graphs that are not isomorphic?
Consider two graphs $G_{1}$ and $G_{2}$ with laplacians $L_{1}$ and $L_{2}$. Assume they do not have node features (but we can impute them).
- Consider a single-layer (linear) GNN (made precise in the theorem and proof below). Suppose there is some eigenvalue $\lambda_{k}$ of $L_{1}$ such that $\lambda_{k} \neq \mu_{j}$ for all eigenvalues $\mu_{j}$ of $L_{2}$ - ie, $L_{1}$ has at least one eigenvalue that is different from every eigenvalue of $L_{2}$.
Theorem
Consider two graphs $G_{1}$ and $G_{2}$ without node features, with laplacians $L_{1}$ and $L_{2}$ whose eigenvalues are $\{\lambda_{i}\}$ and $\{\mu_{i}\}$ respectively. Further suppose that there is some $k$ such that $\lambda_{k} \neq \mu_{j}$ for all $j$ - ie, $L_{1}$ has at least one eigenvalue that is different from every eigenvalue of $L_{2}$.
Then there exists a graph convolution we can use to verify that the two graphs are NOT isomorphic, ie to test that $G_{1} \not\cong G_{2}$.
Proof
Consider the following single-layer (linear) GNN: $y = h_{k} L x$, where $L = V \Lambda V^{T}$ is the eigendecomposition of the laplacian. Define the graph feature $x$ to be white.
Set $\hat{y} = V^{T} y$ and take $h_{k} = 1$.
Then we have
$$\begin{aligned} \mathbb{E}(\hat{y}) &= \mathbb{E}(V^{T}y) \\ &= \mathbb{E}(V^{T}h_{k}Lx) \\ &= \mathbb{E}(V^{T}V\Lambda V^{T}x) \\ &= \mathbb{E}(\Lambda x) \\ \implies \mathbb{E}(\hat{y}) &= \begin{bmatrix} \lambda_{1} \\ \lambda_{2} \\ \vdots \\ \lambda_{n} \end{bmatrix} \end{aligned}$$ And if there is some $k$ such that $\lambda_{k} \neq \mu_{j}$ for all $j$ - ie, $L_{1}$ has at least one eigenvalue that is different from every eigenvalue of $L_{2}$, as above - then it suffices to compare the entry sums $\sum_{i}\lambda_{i}$ and $\sum_{i}\mu_{i}$.
Since the laplacian is positive semidefinite, the sums will be different if and only if we have at least one eigenvalue that is different between them.
(see paper GNNs are more powerful than you think)
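A sketch of the spectral heuristic this result justifies: compare the sorted laplacian eigenvalues of the two graphs directly, rather than through the GNN of the proof. The helper names and the example graphs are illustrative.

```python
import numpy as np

def laplacian(adj):
    return np.diag(adj.sum(axis=1)) - adj

def spectra_differ(adj1, adj2, tol=1e-8):
    # True certifies non-isomorphism; False is inconclusive (cospectral graphs exist)
    lam = np.sort(np.linalg.eigvalsh(laplacian(adj1)))
    mu = np.sort(np.linalg.eigvalsh(laplacian(adj2)))
    return lam.shape != mu.shape or not np.allclose(lam, mu, atol=tol)

# 3-node path vs triangle: spectra {0, 1, 3} vs {0, 3, 3}, so not isomorphic
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
tri = np.ones((3, 3)) - np.eye(3)
print(spectra_differ(path, tri))  # True
```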
However, there are many non-isomorphic graphs that share the same eigenvalues
Example
These graphs are clearly non-isomorphic, but their laplacian eigenvalues are the same!
Thus, a strong alternative is the Weisfeiler-Leman test.
So our eigenvalue-based heuristic doesn’t work for some graphs, but a simple heuristic that does work is counting degrees. Then the problem amounts to comparing the multisets of node degree counts.
(continuing the example from above) $G_{1}$ has degrees $\{\ldots\}$ and $G_{2}$ has degrees $\{\ldots\}$, and we compare these two multisets.
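A sketch of the degree-counting heuristic using Python's `collections.Counter`; the edge-list representation is an assumption, and isolated nodes are ignored for brevity.

```python
from collections import Counter

def degree_multiset(edges):
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return Counter(deg.values())   # multiset of degree values, node labels dropped

# 3-node path (degrees 1, 2, 1) vs triangle (degrees 2, 2, 2)
print(degree_multiset([(0, 1), (1, 2)]) != degree_multiset([(0, 1), (1, 2), (0, 2)]))  # True
```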
The Weisfeiler-Leman graph isomorphism test consists of running the color refinement algorithm for two graphs.
The color refinement algorithm is an algorithm to assign colorings to nodes.
Given a graph $G$ with node features $x_{i}$:
- let $c_{i}(0) = f(x_{i})$ (assign colors based on node feature values)
- let $t = 0$
- while True:
	- let $c_{i}(t+1) = f\left( c_{i}(t), \{ c_{j}(t) : j \in N(i) \} \right)$
		- assign colors based on the previously assigned colors of each node and its neighbors
	- if $c(t+1) = c(t)$, break. Else, $t \leftarrow t + 1$.
Here, $f$ is a hash function that assigns colors.
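A sketch of the color refinement loop above, where the hash $f$ is simulated by assigning fresh integer ids to unseen (own color, neighbor-color multiset) pairs; the adjacency-dictionary input format is an assumption.

```python
def color_refinement(neighbors, features=None):
    # neighbors: dict node -> list of neighbor nodes; features: optional dict node -> value
    nodes = list(neighbors)
    if features is None:
        features = {v: 1 for v in nodes}          # no node features: assume all 1
    palette = {}
    color = {v: palette.setdefault(features[v], len(palette)) for v in nodes}
    while True:
        palette, new_color = {}, {}
        for v in nodes:
            # hash of (own color, multiset of neighbor colors)
            sig = (color[v], tuple(sorted(color[u] for u in neighbors[v])))
            new_color[v] = palette.setdefault(sig, len(palette))
        # the partition only refines, so an unchanged class count means it is stable
        if len(set(new_color.values())) == len(set(color.values())):
            return new_color
        color = new_color

# path graph 1 - 2 - 3 from the example below
print(color_refinement({1: [2], 2: [1, 3], 3: [2]}))  # {1: 0, 2: 1, 3: 0}
```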
Example
Color refinement for a path graph on three nodes ($1 - 2 - 3$)
No node features, so assume they are all 1. Assign colors based on node features: $c_{i}(0) = k_{1}$ for every node $i$.
Step 1:
$$\begin{aligned} c_{1}(1) &= f(k_{1}, \{ k_{1} \}) = k_{1} \\ c_{3}(1) &= f(k_{1}, \{ k_{1} \}) = k_{1} \\ c_{2}(1) &= f(k_{1}, \{ k_{1}, k_{1} \}) = k_{2} \end{aligned}$$
Evaluate: did the colors change?
- Yes: continue!
Step 2:
$$\begin{aligned} c_{1}(2) &= f(k_{1}, \{ k_{1} \}) = k_{1} \\ c_{3}(2) &= f(k_{1}, \{ k_{1} \}) = k_{1} \\ c_{2}(2) &= f(k_{2}, \{ k_{1}, k_{1} \}) = k_{2} \end{aligned}$$
Evaluate: did the colors change?
- No - so we are done!
- This is the final coloring.