graph attention model
[[concept]]
Graph Attention (GAT)
$$
x_i^{(\ell+1)} = \rho\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij} \, W x_j^{(\ell)} \right)
$$

where the attention coefficients are

$$
\alpha_{ij} = \frac{\exp\left( \beta\left( a^\top \left[\, W x_i^{(\ell)} \,\|\, W x_j^{(\ell)} \,\right] \right) \right)}{\sum_{k \in \mathcal{N}(i)} \exp\left( \beta\left( a^\top \left[\, W x_i^{(\ell)} \,\|\, W x_k^{(\ell)} \,\right] \right) \right)}
$$

Here, we are learning the graph weight matrix $A = (\alpha_{ij})$ from the features, rather than fixing it in advance from the graph structure. $[\,\cdot \,\|\, \cdot\,]$ is the row-wise concatenation operation.
- so $\left[\, W x_i^{(\ell)} \,\|\, W x_j^{(\ell)} \,\right] \in \mathbb{R}^{2d'}$
- and each $a \in \mathbb{R}^{2d'}$
- so $a^\top \left[\, W x_i^{(\ell)} \,\|\, W x_j^{(\ell)} \,\right] \in \mathbb{R}$ is a score for the edge $(i, j)$

$\beta$ is a pointwise nonlinearity, typically leaky ReLU.

The learnable parameters for this model are $W \in \mathbb{R}^{d' \times d}$ and $a \in \mathbb{R}^{2d'}$ (shared across all nodes).
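To make the update concrete, here is a minimal NumPy sketch of a single GAT layer in the notation above (the function name, the toy graph, and the choice $\rho = \tanh$ are illustrative assumptions, not a reference implementation):

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gat_layer(X, adj, W, a, rho=np.tanh):
    """X: (n, d) node features; adj: (n, n) boolean adjacency with self-loops;
    W: (d_out, d) shared linear map; a: (2*d_out,) attention vector."""
    H = X @ W.T                                    # W x_j for every node, (n, d_out)
    X_new = np.zeros_like(H)
    for i in range(X.shape[0]):
        nbrs = np.flatnonzero(adj[i])              # neighbourhood N(i)
        pairs = np.concatenate(                    # [W x_i || W x_j] for each j in N(i)
            [np.repeat(H[i][None, :], len(nbrs), axis=0), H[nbrs]], axis=1)
        scores = leaky_relu(pairs @ a)             # beta(a^T [W x_i || W x_j])
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                       # softmax over N(i)
        X_new[i] = rho(alpha @ H[nbrs])            # weighted average, then nonlinearity
    return X_new

# tiny usage example: a 4-node path graph with self-loops
rng = np.random.default_rng(0)
n, d, d_out = 4, 3, 2
adj = np.eye(n, dtype=bool)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = True
X, W, a = rng.normal(size=(n, d)), rng.normal(size=(d_out, d)), rng.normal(size=(2 * d_out,))
print(gat_layer(X, adj, W, a).shape)               # (4, 2)
```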
Notes
- This architecture is local because it still respects the graph sparsity pattern ($\alpha_{ij}$ is only computed for edges of the graph), while learning the "similarities" of the transformed features at each layer of the network.
- This architecture does not depend on the size of the graph, only on the dimension of the features: the same parameters $W$ and $a$ apply to graphs of any size.
- It can still be expensive to compute, however, if the graph is dense: we need to compute $|E|$ attention coefficients $\alpha_{ij}$, which can be up to $O(n^2)$ in complete graphs (see the quick check at the end of these notes).
This additional flexibility increases the capacity of the architecture (it is less likely to underfit the training data), and this has been observed empirically. But it comes at the cost of a more expensive forward pass.
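A quick back-of-the-envelope check of the counting claims above (the feature dimensions and graph sizes here are arbitrary):

```python
# parameter count depends only on the feature dimensions, while the number of
# attention coefficients grows with the number of edges
d, d_prime = 64, 32
n_params = d_prime * d + 2 * d_prime        # W in R^{d' x d} and a in R^{2d'}, independent of n
for n in (100, 1000):
    max_coeffs = n * (n - 1)                # one alpha_ij per directed edge of a complete graph
    print(f"n={n}: params={n_params}, coefficients up to {max_coeffs}")
```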
This architecture is implemented in PyTorch Geometric as the `GATConv` layer.
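A minimal usage sketch (assumes `torch` and `torch_geometric` are installed; the layer sizes and toy edge list are arbitrary):

```python
import torch
from torch_geometric.nn import GATConv

x = torch.randn(4, 3)                               # 4 nodes with 3 features each
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],      # COO edge list (both directions of a path)
                           [1, 0, 2, 1, 3, 2]])
conv = GATConv(in_channels=3, out_channels=2, heads=1)
out = conv(x, edge_index)                           # one GAT layer forward pass
print(out.shape)                                    # torch.Size([4, 2])
```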
Note
We can think of the GAT layer in matrix form as
$$
x^{(\ell+1)} = \rho\left( A\left(x^{(\ell)}\right) \, x^{(\ell)} \, W^\top \right), \qquad A\left(x^{(\ell)}\right) = (\alpha_{ij}),
$$
which looks like a graph convolution with $A$ playing the role of the graph shift operator (see the sketch after these bullets).
- however, this architecture is not convolutional, since $A$ depends on the features and is changing at each step
- but it can be useful to see how we can write/think about it as something that looks convolutional (if you squint)
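A small sketch of this matrix-form view, assuming the adjacency already contains self-loops (all names here are illustrative):

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gat_layer_matrix_form(X, adj, W, a, rho=np.tanh):
    H = X @ W.T                                       # W x_j for all nodes, (n, d_out)
    d_out = H.shape[1]
    u, v = H @ a[:d_out], H @ a[d_out:]               # split a = [a_1; a_2]
    scores = leaky_relu(u[:, None] + v[None, :])      # S_ij = beta(a^T [W x_i || W x_j])
    scores = np.where(adj, scores, -np.inf)           # respect the graph sparsity pattern
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)              # row-wise softmax over N(i)
    return rho(A @ H)                                 # x^{(l+1)} = rho(A(x) x^{(l)} W^T)
```

Building the dense $(n, n)$ matrix $A$ explicitly also makes the $O(n^2)$ cost on dense graphs easy to see.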