[[lecture-data]]
## Summary

1. Graph Signals and Graph Signal Processing
Recall the definition of an integral Lipschitz filter:

> [!def] Integral Lipschitz filter
>
> Let $h(\lambda)$ be the [[Concept Wiki/spectral representation of a convolutional graph filter]]. $h$ is integral Lipschitz on an interval if there exists some $C > 0$ such that
> $$\lvert h(\lambda) - h(\lambda') \rvert \leq \frac{C \lvert \lambda'-\lambda \rvert }{\frac{1}{2}\lvert \lambda+\lambda' \rvert }\;\;\;\forall \;\;\lambda, \lambda'$$
> ie, $h(\lambda)$ is Lipschitz with a constant inversely proportional to the interval's midpoint.
>
> Letting $\lambda' \to \lambda$, we get
> $$\lvert \lambda h'(\lambda) \rvert \leq C \implies \lvert h'(\lambda) \rvert \leq \frac{C}{\lvert \lambda \rvert} \to 0,\;\;\;\lambda \to \infty$$
> This means that the filter cannot change at large $\lambda$.
> [!example]
>
> Notice that the filter becomes flat at high frequencies:
> - At low $\lambda$, these filters can be arbitrarily thin.
> - At medium $\lambda$, these filters are approximately Lipschitz filters.
> - At high $\lambda$, they become flat and lose discriminability.
> [!NOTE]
>
> $C$ controls how discerning an integral Lipschitz filter is for frequencies in different ranges.
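As a concrete sketch, we can check the integral Lipschitz condition numerically. The filter $h(\lambda) = \lambda/(1+\lambda^2)$ below is my own illustrative choice (not from the lecture); its derivative decays like $1/\lambda^2$, so $\lvert \lambda h'(\lambda) \rvert$ stays bounded:

```python
import numpy as np

# An example integral Lipschitz filter (illustrative choice, not from the lecture):
#   h(lambda) = lambda / (1 + lambda^2)
# |lambda * h'(lambda)| is bounded, which is the integral Lipschitz condition,
# and h'(lambda) -> 0 at high frequencies (the filter flattens out).
def h(lam):
    return lam / (1.0 + lam**2)

def h_prime(lam):
    return (1.0 - lam**2) / (1.0 + lam**2) ** 2

lams = np.linspace(0.01, 100.0, 100_000)
C = np.max(np.abs(lams * h_prime(lams)))  # empirical integral Lipschitz constant

print(f"max |lambda * h'(lambda)| ~= {C:.3f}")   # finite C: integral Lipschitz
print(f"|h'(100)|  = {abs(h_prime(100.0)):.2e}") # flat at high frequencies
print(f"|h'(0.01)| = {abs(h_prime(0.01)):.2f}")  # steep (discriminative) at low ones
```

The same script makes the three bullet points above visible: the slope is large at low $\lambda$, moderate at medium $\lambda$, and vanishes at high $\lambda$.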
## Stability to different graph perturbations
Recall the types of perturbations on shift operators:
> [!NOTE] Perturbation Types
>
> We consider three perturbation types (first for graph convolutions, then for GNNs).
> [!def] Operator dilation
>
> Let $S$ be a shift operator. We call $S'$ an (operator) dilation (or scaling) of $S$ if
> $$S' = (1+\epsilon)S \implies \lvert \lvert S-S' \rvert \rvert = \epsilon \lvert \lvert S \rvert \rvert$$
> We can interpret this as the edges being scaled by $(1+\epsilon)$.
> [!def] Additive Perturbations
>
> Let $\tilde{S}$ be a matrix. $\tilde{S}$ is an additive perturbation of $S$ if
> $$\tilde{S} = S+E, \;\;\;E= \tilde{S} - S,\;\;\; 0 \leq \lvert \lvert E \rvert \rvert \leq \epsilon$$
> However, this definition is problematic since it is not [[Concept Wiki/filter permutation invariance\|invariant to permutations]]: if $\tilde{S}$ is just a relabeling of $S$,
> $$\tilde{S} = P^TSP \implies E = P^TSP - S$$
> which can be large even though the graph is unchanged.
>
> For a graph shift operator, we instead define an additive perturbation modulo permutations:
> $$E(S,\tilde{S}) = \{ D: P^T\tilde{S}P = S+D \}$$
> where $P \in \cal{P}$ is a permutation matrix.
>
> Given $S$ and $\tilde{S}$, the error $\tilde{E}$ is the $D \in E(S, \tilde{S})$ with the smallest norm:
> $$\tilde{E} = \arg\min_{D \in E(S, \tilde{S})} \lvert \lvert D \rvert \rvert_{p}$$
> where $\lvert \lvert \tilde{E} \rvert \rvert_{p} = d(S, \tilde{S})$ is the operator distance modulo permutations.
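The minimization over permutations can be made concrete by brute force on a tiny graph. This is an illustrative sketch (the helper `additive_error_mod_permutations` is my own and only feasible for very small $n$):

```python
import itertools
import numpy as np

# Brute-force the additive error modulo permutations for a tiny graph:
# search all permutation matrices P and keep the D = P^T S~ P - S of
# smallest spectral norm. (Hypothetical helper, exponential in n.)
def additive_error_mod_permutations(S, S_tilde):
    n = S.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(n)):
        P = np.eye(n)[list(perm)]           # permutation matrix
        D = P.T @ S_tilde @ P - S
        best = min(best, np.linalg.norm(D, 2))
    return best                             # = d(S, S~)

S = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])                # path graph 0-1-2

# Relabel the nodes of S: same graph, so the distance mod permutations is 0,
# even though the naive additive error ||S~ - S|| is not.
P = np.eye(3)[[2, 0, 1]]
S_tilde = P.T @ S @ P

print(np.linalg.norm(S_tilde - S, 2))               # nonzero: naive error
print(additive_error_mod_permutations(S, S_tilde))  # 0: same graph mod relabeling
```

This is exactly the permutation-invariance issue the definition is designed to fix: the naive error is nonzero, the error modulo permutations is zero.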
Unlike dilations, additive perturbations do not take edge weights into account, but they can model changes in edge existence. This is fine for unweighted graphs, but for weighted graphs it makes the perturbation model less meaningful. We want a perturbation that combines both benefits:
- takes edge weights into account
- can model edge additions and deletions
> [!def] Relative Perturbations modulo permutations
>
> The set of relative perturbations of $S$ modulo permutations is
> $$E(S,\tilde{S}) = \{ D: P^T\tilde{S}P = S + DS + SD \}$$
> where $P \in \cal{P}$ is a permutation matrix. Given $S$ and $\tilde{S}$, the error $\tilde{E}$ is the $D \in E(S, \tilde{S})$ with the smallest norm, ie
> $$\tilde{E} = \arg\min_{D \in E(S, \tilde{S})} \lvert \lvert D \rvert \rvert_{p}$$
> where $\lvert \lvert \tilde{E} \rvert \rvert_{p} = d(S, \tilde{S})$ is the operator distance modulo permutations.
> [!NOTE]
>
> - dilations take edge weights into account, but cannot model edge additions and deletions
> - additive perturbations can model edge additions and deletions, but do not take edge weights into account
> - relative perturbations take into account both edge weights and additions/deletions
### Stability to dilations
> [!theorem]
>
> Let $S' = (1+\epsilon)S$ be a dilation of $S$, and consider the graph convolution $H(S) = \sum_{k=0}^\infty h_{k}S^k$. If $h$ is an integral Lipschitz filter with constant $c$, then
> $$\lvert \lvert H(S') - H(S) \rvert \rvert \leq c\epsilon + {\cal O}(\epsilon^2)$$
> ie, integral Lipschitz filters are stable to dilations/scalings.
> [!proof]
>
> $$\begin{aligned} H(S') - H(S) &= \sum_{k=0}^\infty h_{k}S'^{k} - \sum_{k=0}^\infty h_{k}S^k \\ S' = (1+\epsilon)S \implies H(S') - H(S) &=\sum_{k=0}^\infty h_{k} \left((1+\epsilon)^kS^k - S^k\right) \end{aligned}$$
> Via binomial expansion, we have
> $$\begin{aligned} (1+\epsilon)^k &= \sum_{i=0}^{k} {{k}\choose{i}} \epsilon^i \\ &= 1+k\epsilon + {\cal o}_{k}(\epsilon) \end{aligned}$$
> Recall that ${\cal o}_{k}(\epsilon)$ means $\lim_{\epsilon \to 0} {\cal o}_{k}(\epsilon)/\epsilon = 0$. Since the filter is analytic, the sum of these remainders is of order $\epsilon^2$ and is thus negligible. We can then write
> $$\begin{aligned} H(S') - H(S) &= \sum_{k=0}^\infty h_{k}\left((1+k\epsilon)S^k - S^k\right) + {\cal O}(\epsilon^2) \\ &= \epsilon\sum_{k=0}^\infty h_{k} k S^k + {\cal O}(\epsilon^2) \end{aligned}$$
> Right multiplying each side by $x=\sum_{i=1}^n \hat{x}_{i} v_{i}$ (using the [[Concept Wiki/inverse graph fourier transform]]) we have
> $$\begin{aligned} [H(S') - H(S)]x &= \epsilon\sum_{k=0}^\infty h_{k} k S^k \left( \sum_{i=1}^n \hat{x}_{i} v_{i} \right) + {\cal O}(\epsilon^2) \\ (*)\;\; &= \epsilon \sum_{k=0}^\infty h_{k} k \sum_{i=1}^n \lambda_{i}^k \hat{x}_{i} v_{i} + {\cal O}(\epsilon^2) \\ &= \epsilon \sum_{i=1}^n \left( \sum_{k=0}^\infty h_{k} k \lambda_{i}^{k} \right) \hat{x}_{i} v_{i} + {\cal O}(\epsilon^2) \\ &= \epsilon \sum_{i=1}^n \left( \sum_{k=0}^\infty h_{k} k \lambda_{i}^{k-1} \right) \lambda_{i} \hat{x}_{i} v_{i} + {\cal O}(\epsilon^2)\\ (**)\;\; &= \epsilon \sum_{i=1}^n [\lambda_{i} h'(\lambda_{i})] \hat{x}_{i} v_{i} + {\cal O}(\epsilon^2) \end{aligned}$$
> Where
> - the equality $(*)$ holds since the $v_{i}$ are [[Concept Wiki/eigenvalue\|eigenvectors]] of $S$, and
> - $(**)$ identifies the inner series $\sum_{k} k h_{k} \lambda_{i}^{k-1}$ as $h'(\lambda_{i})$.
>
> Since $\lvert \lambda_{i} h'(\lambda_{i}) \rvert \leq c$ (from the definition of [[Concept Wiki/integral Lipschitz filter]] when letting $\lambda' \to \lambda$) and the $v_{i}$ are orthonormal, taking norms gives
> $$\lvert \lvert (H(S') - H(S))x \rvert \rvert \leq \epsilon c \lvert \lvert x \rvert \rvert + {\cal O}(\epsilon^2) \implies \lvert \lvert H(S') - H(S) \rvert \rvert \leq \epsilon c + {\cal O}(\epsilon^2)$$
> ie, the [[Concept Wiki/integral Lipschitz filter]] is [[Concept Wiki/stable graph filter\|stable]] to [[Concept Wiki/operator dilation\|dilation]] $\blacksquare$
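A quick numerical sanity check of this bound, with an arbitrary finite filter and a random symmetric shift operator of my own choosing:

```python
import numpy as np

# Check ||H(S') - H(S)|| <= eps * max_i |lambda_i h'(lambda_i)| + O(eps^2)
# for a polynomial filter H(S) = sum_k h_k S^k and dilation S' = (1+eps)S.
# (Illustrative setup: random symmetric S, arbitrary filter taps.)
rng = np.random.default_rng(0)
A = rng.random((6, 6))
S = (A + A.T) / 2
S = S / np.linalg.norm(S, 2)            # normalize so |lambda_i| <= 1

h = np.array([1.0, 0.5, 0.25, 0.125])   # filter taps h_k (arbitrary example)

def H(S, h):
    out = np.zeros_like(S)
    Sk = np.eye(S.shape[0])
    for hk in h:                        # H(S) = sum_k h_k S^k
        out += hk * Sk
        Sk = Sk @ S
    return out

eps = 1e-3
S_dil = (1 + eps) * S

lam = np.linalg.eigvalsh(S)
# h'(lambda) = sum_k k h_k lambda^(k-1); c = max |lambda h'(lambda)| on the spectrum
h_prime = sum(k * hk * lam ** (k - 1) for k, hk in enumerate(h) if k > 0)
c = np.max(np.abs(lam * h_prime))

diff = np.linalg.norm(H(S_dil, h) - H(S, h), 2)
print(f"{diff:.3e} <= {c * eps:.3e} + O(eps^2)")
```

Because $S'$ and $S$ share eigenvectors, the difference norm is exactly $\max_i \lvert h((1+\epsilon)\lambda_i) - h(\lambda_i)\rvert$, which the first-order bound controls up to the $\epsilon^2$ remainder.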
> [!NOTE]
>
> This result is universal for graphs of any size, ie any number of nodes. This property of the graph convolution is independent of the underlying graph.
> [!tip] Takeaway
>
> If we can control the Lipschitz constant $c$, then we can design stable filters (with low $c$), or learn stable filters by penalizing large $c$.
>
> The filter is still non-discriminative at high frequencies. This is the tradeoff for having stability in graph convolution.
see [[Concept Wiki/integral Lipschitz filters are stable to dilations]]
### Stability to additive perturbations
> [!def] Eigenvector misalignment
>
> Let $S = V \Lambda V^H$ and $E = U M U^H$. Then the eigenvector misalignment between $S$ and $E$ is
> $$\delta = (\lvert \lvert U - V \rvert \rvert + 1)^2 - 1$$
> If both $S$ and $E$ are normal (ie $SS^H = S^HS$), then $U$ and $V$ are unitary, so $\lvert \lvert U - V \rvert \rvert \leq 2$ and $\delta \leq 8$.
> [!theorem]
>
> Let $\tilde{S}$ be an additive perturbation of graph shift operator $S$, ie, $\tilde{S} = S + E$ with $\lvert \lvert E \rvert \rvert \leq \epsilon$.
> Suppose $h$ is a Lipschitz graph filter with constant $C$. Then
> $$\lvert \lvert H(\tilde{S}) - H(S) \rvert \rvert _{p} \leq C(1+\delta \sqrt{ n })\epsilon + {\cal O}(\epsilon^2)$$
> Where $\delta$ is the [[Concept Wiki/eigenvector misalignment]] between $S$ and $E$. Since both $S$ and $E$ are normal, we can see $\delta \leq 8$.
>
> ie, graph convolutions are stable to additive perturbations, provided they have a Lipschitz spectral response.
> [!proof]
>
> Left as an exercise; check Gama et al. 2019.
> [!NOTE]
>
> With $\delta \leq 8$, the statement becomes
> $$\lvert \lvert H(\tilde{S}) - H(S) \rvert \rvert _{p} \leq C(1+8 \sqrt{ n })\epsilon + {\cal O}(\epsilon^2)$$
> This means we have Lipschitz stability to additive perturbations when $S$ and $E$ are normal.
> - this is not bad for small $n$, but *terrible for large graphs* unless $\delta = {\cal O}(n^{r})$ for $r \leq 0$.
> - this holds for all graphs of size $n$. ie, this is a property of the graph convolution and will be true regardless of the actual underlying graph
> - Similar to filters that are stable to dilation, the Lipschitz constant $C$ can be controlled by design or by penalizing large values.
> [!example]
>
> - there is a tradeoff between stability to additive perturbations and discriminability
> - The higher the $C$, the higher the spectral discriminability. The lower the $C$, the better the stability of the filter.
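The tradeoff can be illustrated with a toy family of Lipschitz filters (the filter $h_a(\lambda) = \tanh(a(\lambda - 1))$ is my own choice): larger $a$ means a larger Lipschitz constant, which separates nearby frequencies better but also reacts more to a perturbed eigenvalue.

```python
import numpy as np

# Toy stability-discriminability tradeoff:
# h_a(lambda) = tanh(a * (lambda - 1)) is Lipschitz with constant ~a.
# Larger a -> better discrimination of nearby frequencies,
# but also a larger response to an eigenvalue perturbation.
def h(lam, a):
    return np.tanh(a * (lam - 1.0))

lam1, lam2 = 0.95, 1.05     # two nearby frequencies to discriminate
pert = 0.01                 # eigenvalue perturbation

for a in (1.0, 10.0):
    discrim = abs(h(lam2, a) - h(lam1, a))        # bigger is better
    instab = abs(h(lam1 + pert, a) - h(lam1, a))  # smaller is better
    print(f"a={a:4.1f}: discriminability={discrim:.4f}, "
          f"response to perturbation={instab:.4f}")
```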
see [[Concept Wiki/Lipschitz filters are stable to additive perturbations]]
### Stability to relative perturbations
Let $\tilde{S} = S + DS + SD$ be a relative perturbation on $S$. Locally, we have
$$\begin{aligned} \tilde{S}_{ij} &= S_{ij} + (DS)_{ij} + (SD)_{ij} \\ &= S_{ij} + \sum_{k \in N(j)} D_{ik}S_{kj} + \sum_{k \in N(i)} S_{ik}D_{ki} \\ &= S_{ij} + (\propto \text{deg}(j)) + (\propto \text{deg}(i)) \end{aligned}$$
This tells us that the edge changes in the [[Concept Wiki/perturbations on a graph shift operator\|perturbed]] graph are tied to the degrees of the nodes.

see [[Concept Wiki/relative perturbation edge changes are tied to node degree]]

> [!theorem]
>
> Suppose $P^TSP = S + \tilde{E}S + S\tilde{E}$ with $\lvert \lvert \tilde{E} \rvert \rvert = \epsilon$. Suppose $h$ is an [[Concept Wiki/integral Lipschitz filter]] with constant $c$. Then
> $$\lvert \lvert H(\tilde{S}) - H(S) \rvert \rvert _{p} \leq 2c (1+\delta \sqrt{ n })\epsilon + {\cal O}(\epsilon^2)$$
> Where $\delta$ is the [[Concept Wiki/eigenvector misalignment]] between $S$ and $\tilde{E}$.
>
> ie, [[Concept Wiki/integral Lipschitz filter\|integral Lipschitz filters]] are [[Concept Wiki/stable graph filter\|stable]] to [[Concept Wiki/relative perturbations]]

> [!NOTE]
>
> This is the *same bound* as the one we got for [[Concept Wiki/lipschitz graph filter\|lipschitz graph filters]] on [[Concept Wiki/Lipschitz filters are stable to additive perturbations#^statement\|additive perturbations]], just with an additional factor of 2. And just like that bound, we see that
> - again, this is not bad for small $n$, but *terrible for large graphs* unless $\delta = {\cal O}(n^{r})$ for $r \leq 0$.
> - this holds for all graphs of size $n$. ie, this is a property of the [[Concept Wiki/graph convolution\|graph convolution]] and will be true regardless of the actual underlying graph
> - Similar to filters that are stable to [[Concept Wiki/operator dilation\|dilation]] and [[Concept Wiki/additive perturbations\|additive perturbation]], the [[Concept Wiki/Lipschitz continuous\|Lipschitz]] constant $c$ can be controlled by design or by penalizing large values.
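The degree dependence of relative perturbations can be observed numerically on a star graph (a toy setup of my own): the hub has degree $n-1$, so entries involving it should move more than entries among the degree-1 leaves.

```python
import numpy as np

# Toy check that relative-perturbation edge changes scale with node degree:
# in a star graph, node 0 is a hub of degree n-1, so under S~ = S + DS + SD
# the entries involving the hub should change more than those among leaves.
n = 6
S = np.zeros((n, n))
S[0, 1:] = 1.0
S[1:, 0] = 1.0                            # star graph adjacency

rng = np.random.default_rng(2)
hub_changes, leaf_changes = [], []
for _ in range(200):                      # average over random errors D
    D = 0.01 * rng.standard_normal((n, n))
    D = (D + D.T) / 2                     # small symmetric relative error
    change = np.abs(D @ S + S @ D)        # |S~_ij - S_ij|
    hub_changes.append(change[0, 1:].mean())
    leaf_changes.append(change[1:, 1:].mean())

print(f"mean change at hub entries:  {np.mean(hub_changes):.4f}")
print(f"mean change at leaf entries: {np.mean(leaf_changes):.4f}")
```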
> The *difference* is that there is *no* [[Concept Wiki/stability-discriminability tradeoff for Lipschitz filters\|tradeoff]] between [[Concept Wiki/stable graph filter\|stability]] and [[Concept Wiki/discriminability of a graph filter\|discriminability]].

> [!proof]
>
> check Gama et al. 2019

> [!NOTE]
>
> For high frequencies $\lambda$, there is *no* tradeoff between [[Concept Wiki/stable graph filter\|stability]] and [[Concept Wiki/discriminability of a graph filter\|discriminability]]. This is because the *filter is always flat* for high values of $\lambda$. Thus, the filter is always non-discriminative for these values.
>
> ![[Attachments/20250312-graph-1.png]]
>
> This is a direct result of the behavior of [[Concept Wiki/integral Lipschitz filter\|integral Lipschitz filters]] at high frequencies.

see [[Concept Wiki/integral Lipschitz filters are stable to relative perturbations]]

> [!tip] Takeaway
>
> [[Concept Wiki/additive perturbations]] are not very realistic, and so a [[Concept Wiki/lipschitz graph filter]] is enough to get [[Concept Wiki/stable graph filter\|stability]].
>
> If we have a [[Concept Wiki/perturbations on a graph shift operator\|perturbation]] that respects the graph sparsity pattern (models edge weights), we need an [[Concept Wiki/integral Lipschitz filter]] to get [[Concept Wiki/stable graph filter\|stability]].
> - However, there is a *big downside* for large graphs, since the bound depends on $n$.

see [[Concept Wiki/stability and size tradeoff for realistic sparsity pattern considerations setting]]

## Generalization to [[Concept Wiki/Graph Neural Networks\|GNNs]]

Generally, the stability properties of the constituent [[Concept Wiki/graph convolution\|graph filters]] are inherited by the GNNs built from them.

> [!theorem]
>
> Let $\Phi(S,h)$ be an $L$-layer [[Concept Wiki/Graph Neural Networks\|GNN]].
> Let $\tilde{S}$ be a [[Concept Wiki/perturbations on a graph shift operator\|graph perturbation]] of $S$ modulo permutations.
>
> (1) If $\tilde{S} = S + \epsilon S$ and all filters are [[Concept Wiki/integral Lipschitz filter\|integral Lipschitz]], then
> $$\lvert \lvert \Phi(S,h) - \Phi(\tilde{S}, h)\rvert \rvert_{p} \leq LC \epsilon + {\cal O}(\epsilon^2)$$
> (stable to [[Concept Wiki/operator dilation\|dilation]]/scaling)
>
> (2) If $P^T\tilde{S}P = S + \tilde{E}$ and all $h$ are [[Concept Wiki/lipschitz graph filter\|lipschitz]], then
> $$\lvert \lvert \Phi(S,h) -\Phi(\tilde{S},h)\rvert \rvert _{p} \leq LC(1+\delta \sqrt{ n })\epsilon + {\cal O}(\epsilon^2)$$
> (stable to [[Concept Wiki/additive perturbations]])
>
> (3) If $P^T\tilde{S}P = S + \tilde{E}S + S\tilde{E}$ and all $h$ are [[Concept Wiki/integral Lipschitz filter\|integral Lipschitz]], then
> $$\lvert \lvert \Phi(S,h) -\Phi(\tilde{S},h)\rvert \rvert _{p} \leq 2LC(1+\delta \sqrt{ n })\epsilon + {\cal O}(\epsilon^2)$$
> (stable to [[Concept Wiki/relative perturbations]])

> [!proof]
>
> We begin with some non-restrictive additional assumptions:
> 1. $\lvert \lvert x_{\ell} \rvert \rvert \leq 1 \;\;\forall \ell$ - normalized input at all layers (easy to achieve with non-amplifying $h$, ie $\lvert \lvert H \rvert \rvert = 1$)
> 2. the activation function/nonlinearity $\sigma$ is normalized [[Concept Wiki/Lipschitz continuous\|Lipschitz]], ie has a Lipschitz constant of 1.
>
> Let $\lvert \lvert \tilde{E} \rvert \rvert = \epsilon$ for any of the three [[Concept Wiki/perturbations on a graph shift operator\|perturbation]] types. Let filters $h$ be stable to $\tilde{E}$ with
> $$\lvert \lvert H(\tilde{S}) - H(S) \rvert \rvert _{p} \leq c_{h} \epsilon$$
> For each layer $1 \leq \ell \leq L$, layer $\ell$ is a [[Concept Wiki/graph perceptron]] with filter $H_{\ell}$.
> Then, note that
> $$\begin{aligned}
> \lvert \lvert \tilde{x}_{\ell} - x_{\ell} \rvert \rvert &= \lvert \lvert \sigma(H_{\ell }(\tilde{S})\tilde{x}_{\ell-1}) - \sigma(H_{\ell}(S) x_{\ell-1}) \rvert \rvert \\
> (\sigma \text{ is } 1\text{-Lipschitz})\;\;\;\;\;&\leq \lvert \lvert H_{\ell}(\tilde{S}) \tilde{x}_{\ell-1} - H_{\ell}(S) x_{\ell-1}\rvert \rvert \\
> &= \lvert \lvert H_{\ell}(\tilde{S}) \tilde{x}_{\ell-1} - H_\ell(\tilde{S})x_{\ell-1} + H_{\ell}(\tilde{S})x_{\ell-1}- H_{\ell}(S) x_{\ell-1} \rvert \rvert \\
> &= \lvert \lvert H_{\ell}(\tilde{S}) [\tilde{x}_{\ell-1} - x_{\ell-1}] + [H_{\ell}(\tilde{S}) - H_{\ell}(S)] x_{\ell-1}\rvert \rvert \\
> (\text{by } \triangle \text{ ineq.}) \;\;\;\;\; &\leq \cancelto{1}{\lvert \lvert H_{\ell}(\tilde{S}) \rvert \rvert} \cdot \lvert \lvert \tilde{x}_{\ell-1} - x_{\ell-1} \rvert \rvert + \cancelto{\leq 1}{\lvert \lvert x_{\ell-1} \rvert \rvert} \cdot \cancelto{\leq c_{h} \epsilon}{\lvert \lvert H_{\ell}(\tilde{S}) - H_{\ell} (S)\rvert \rvert } \\
> (*) &\leq \lvert \lvert \tilde{x}_{\ell-1} - x_{\ell-1} \rvert \rvert + c_{h}\epsilon
> \end{aligned}$$
>
> We can apply the same reasoning to bound $\lvert \lvert \tilde{x}_{\ell-1} - x_{\ell-1} \rvert \rvert, \lvert \lvert \tilde{x}_{\ell-2} - x_{\ell-2} \rvert \rvert, \dots$ and unroll the recursion over all $L$ layers, giving the final expression with $Lc_{h}\epsilon$.
>
> see [[Concept Wiki/GNNs inherit stability from their layers]]

In the node domain, the nonlinearity does not affect the signal much. However, nonlinearities scatter the signal energy across the [[Concept Wiki/spectrum]].

> [!example]
>
> ![[Attachments/20250312-2025-03-10-graphs.png]]
> - we first get a response $\hat{x}$ from the [[Concept Wiki/spectral representation of a convolutional graph filter]]
> - after the nonlinearity, some of the energy moves to other areas of the spectrum
>
> Some of the energy in high frequencies "travels" to low frequencies.
> Both [[Concept Wiki/lipschitz graph filter\|lipschitz graph filters]] and [[Concept Wiki/integral Lipschitz filter\|integral Lipschitz filters]] can [[Concept Wiki/discriminability of a graph filter\|discriminate]] at these low frequencies.
>
> If our task depends on high-frequency data, then this scattering can help us perform well by moving some of the high-frequency information into the lower frequencies, where we have both [[Concept Wiki/stable graph filter\|stability]] and [[Concept Wiki/discriminability of a graph filter\|discriminability]].

> [!tip] Takeaway
>
> [[Concept Wiki/Graph Neural Networks\|GNNs]] can perform better than their constituent [[Concept Wiki/graph convolution\|graph filters]]. That is, they are more [[Concept Wiki/stable graph filter\|stable]] for the same level of [[Concept Wiki/discriminability of a graph filter\|discriminability]], and more [[Concept Wiki/discriminability of a graph filter\|discriminative]] for the same level of [[Concept Wiki/stable graph filter\|stability]] than a direct composition of the same filters.
>
> This is because the nonlinear activation function "scatters" the signal across the spectrum, moving some high-frequency information to lower frequencies and vice versa. Since both [[Concept Wiki/lipschitz graph filter\|lipschitz graph filters]] and [[Concept Wiki/integral Lipschitz filter\|integral Lipschitz filters]] can [[Concept Wiki/discriminability of a graph filter\|discriminate]] at low frequencies, the mixing from the nonlinearity helps the GNN account for a wider range of the information.

see [[Concept Wiki/GNNs perform better than their constituent filters]]
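The scattering effect can be seen directly by applying a ReLU to a single spectral component (an illustrative sketch with a random symmetric shift operator): before the nonlinearity the GFT energy sits in one coefficient, after it the energy spreads across the spectrum.

```python
import numpy as np

# Sketch: a ReLU nonlinearity scatters a graph signal's energy across the
# spectrum. Build a signal supported on a single eigenvector (with mixed
# signs), apply ReLU, and compare the GFT energy per frequency before/after.
rng = np.random.default_rng(3)
A = rng.random((8, 8))
S = (A + A.T) / 2
lam, V = np.linalg.eigh(S)        # GFT basis: eigenvectors of S

x = V[:, 0]                       # a single spectral component (mixed signs,
                                  # since it is orthogonal to the Perron vector)
x_hat = V.T @ x                   # GFT: exactly one nonzero coefficient
y_hat = V.T @ np.maximum(x, 0)    # GFT of ReLU(x): energy spreads out

print("nonzero GFT coefficients before ReLU:", int(np.sum(x_hat**2 > 1e-6)))
print("nonzero GFT coefficients after ReLU: ", int(np.sum(y_hat**2 > 1e-6)))
```

Note that the total energy after the ReLU never exceeds the original (ReLU is non-amplifying); it is only redistributed across frequencies.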
Notice that the 