graph attention model

[[concept]]
Graph Attention (GAT)

$$(x_\ell)_i = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij}\,(x_{\ell-1})_j\,H_\ell\Big)$$

where the $\alpha_{ij}$ are the learned graph attention coefficients, computed as

$$\alpha_{ij} = \operatorname{softmax}_{N(i)}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{j' \in N(i)} \exp(e_{ij'})}, \qquad e_{ij} = \rho\big(a\,(x_i H \,\|\, x_j H)\big)$$

Here, we are learning the matrix of edge weights $E = [e_{ij}]$ by learning $a$ and $H$. We can think of $\alpha_{ij}$ as the "relative similarity" between the transformed features of $i$ and $j$, compared against the similarities across all neighbors of $i$.

  • $(\|)$ is the row-wise concatenation operation.
    • so $a \in \mathbb{R}^{1 \times 2d}$
    • and each $x_i H \in \mathbb{R}^d$
  • $\rho$ is a pointwise nonlinearity, typically a leaky ReLU

The learnable parameters for this model are $a_\ell, H_\ell$ for $\ell = 1, \dots, L$.
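
To make the equations concrete, here is a minimal single-head sketch in plain PyTorch (dense adjacency matrix, $\sigma$ taken to be a sigmoid and $\rho$ a leaky ReLU; the name `GATLayer` and the `adj` input are illustrative choices, not from the lecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head, dense-adjacency sketch of one GAT layer:
    (x_l)_i = sigma( sum_{j in N(i)} alpha_ij (x_{l-1})_j H_l )."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.H = nn.Parameter(torch.empty(d_in, d_out))   # feature transform H_l
        self.a = nn.Parameter(torch.empty(1, 2 * d_out))  # attention vector a_l in R^{1 x 2d}
        nn.init.xavier_uniform_(self.H)
        nn.init.xavier_uniform_(self.a)

    def forward(self, x, adj):
        # x: (n, d_in) node features; adj: (n, n) 0/1 adjacency (the graph sparsity pattern).
        # Assumes every node has at least one neighbor, otherwise its softmax row is all -inf.
        xH = x @ self.H                        # x_i H for every node, shape (n, d_out)
        d = xH.size(1)
        # e_ij = rho(a (x_i H || x_j H)); split a = [a_src, a_dst] so the pairwise scores
        # come from broadcasting instead of materializing every concatenation
        src = xH @ self.a[:, :d].T             # (n, 1): a_src . (x_i H)
        dst = xH @ self.a[:, d:].T             # (n, 1): a_dst . (x_j H)
        e = F.leaky_relu(src + dst.T)          # (n, n) raw scores e_ij, rho = leaky ReLU
        # softmax restricted to the neighborhood N(i): mask non-edges before normalizing
        alpha = torch.softmax(e.masked_fill(adj == 0, float("-inf")), dim=1)
        return torch.sigmoid(alpha @ xH)       # sigma( sum_j alpha_ij x_j H )
```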

Notes

  • This architecture is local: it still respects the graph sparsity pattern, while learning the "similarities" of the transformed features at each layer of the network.
  • The number of parameters does not depend on the size of the graph, only on the feature dimensions (see the count below this list).
  • It can still be expensive to compute, however, if the graph is dense: we need to compute $|E|$ coefficients $\alpha_{ij}$, which is up to $n^2$ for a complete graph.
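
To make the size-independence point concrete, assume for simplicity that every layer maps $\mathbb{R}^d \to \mathbb{R}^d$; then the per-layer parameter count is

$$\underbrace{2d}_{a_\ell \,\in\, \mathbb{R}^{1 \times 2d}} + \underbrace{d^2}_{H_\ell \,\in\, \mathbb{R}^{d \times d}} = d^2 + 2d,$$

which involves only the feature dimension $d$, not the number of nodes $n$; only the forward pass (the $|E|$ coefficients) scales with the graph.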

This additional flexibility increases the capacity of the architecture (it is less likely to underfit the training data), and this has been observed empirically. But it comes at the cost of a more expensive forward pass.

This architecture is implemented in PyTorch Geometric as `GATConv`.
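
For reference, a minimal usage sketch of PyG's `GATConv` on a toy graph (the tensors below are made up for illustration):

```python
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GATConv

# Toy graph: 3 nodes with 8-dim features, edges stored as a 2 x |E| index tensor
x = torch.randn(3, 8)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])  # undirected edges as two directed pairs
data = Data(x=x, edge_index=edge_index)

# One GAT layer: 8 -> 16 features per head, 4 attention heads (outputs are concatenated)
conv = GATConv(in_channels=8, out_channels=16, heads=4)
out = conv(data.x, data.edge_index)        # shape (3, 16 * 4)
print(out.shape)
```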

Note

We can think of the GAT layer $(x_\ell)_i = \sigma\big(\sum_{j \in N(i)} \alpha_{ij} (x_{\ell-1})_j H_\ell\big)$ as a convolutional layer where the GSO $S$ is learned

  • however, this architecture is not convolutional, since $S$ changes at each layer (it depends on the current features).
  • But it can be useful to see how we can write/think about it as something that looks convolutional (if you squint), as in the matrix form below.
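
Collecting the coefficients into a matrix (a notational convenience, not a claim that $S$ is a fixed operator), one layer can be written as

$$x_\ell = \sigma\big(S_\ell\, x_{\ell-1} H_\ell\big), \qquad [S_\ell]_{ij} = \begin{cases} \alpha_{ij} & j \in N(i) \\ 0 & \text{otherwise,} \end{cases}$$

so $S_\ell$ has the same sparsity pattern as the adjacency matrix, but its entries are recomputed from $x_{\ell-1}$ at every layer.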

Mentions

File
2025-02-19 graphs lecture 9
2025-04-16 lecture 21
2025-02-25 equivariant lecture 4