Data
Clean: INPUT[toggle(offValue(0), onValue(1)):clean] Linked: INPUT[toggle(offValue(0), onValue(1)):linked] Publish: INPUT[toggle(offValue(0), onValue(1)):dg-publish]
Maturity: INPUT[inlineSelect(option(seed), option(sprout), option(sapling), option(tree), option(snag), defaultValue(sprout)):maturity]
2025-02-03
Summary
- arbitrary linear graph filters
- graph convolution
- spectral representation of a convolutional graph filter
- conditions for finding a convolutional graph filter
Today
- Filter design
- Spectral filters
- filter learning
- statistical learning and ERM
- Types of learning problems on graphs
1. Graph Signals and Graph Signal Processing
Example
Suppose we want to design a lowpass filter for some graph signal.
(What does this mean? Consider radio stations: these have carrier waves, which are at higher frequencies than the information to be broadcast. We need a way to filter out the carrier frequency to recover the desired signal.)
Suppose we want to implement this using graph convolutions. Recall that we can represent any analytic function with convolutional graph filters.
Question
Is this ideal lowpass filter (a step function) analytic?
Answer
No, but we can often find good analytic approximations of non-analytic functions. For heaviside functions such as the lowpass filter, a good approximation is the logistic function $$\tilde{f}(\lambda)=\frac{1}{1+e^{-\alpha(\lambda-c)}}$$ where $\alpha$ is a steepness constant and $c$ is the cutoff.
Note
We can approximate heaviside functions using the logistic function as a proxy for our target function
Logistic functions are analytic.
As an illustration, we want to find coefficients $h_{k}$ such that $\tilde{f}(\lambda)=\sum_{k=0}^{\infty}h_{k}\lambda^{k}$, i.e. the Taylor expansion of $\tilde{f}$ around $\lambda=0$. Here,
$$\begin{aligned}
\tilde{f}(0) &= \frac{1}{1+e^{\alpha c}} \\
\tilde{f}'(\lambda) &= \frac{\alpha e^{-\alpha(\lambda-c)}}{\left(1+e^{-\alpha(\lambda-c)}\right)^{2}} \implies \tilde{f}'(0) = \frac{\alpha e^{\alpha c}}{\left(1+e^{\alpha c}\right)^{2}} \\
\tilde{f}''(\lambda) &= \frac{-\alpha^{2}e^{-\alpha(\lambda-c)}}{\left(1+e^{-\alpha(\lambda-c)}\right)^{2}} + \frac{2\alpha^{2} e^{-2\alpha(\lambda-c)}}{\left(1+e^{-\alpha(\lambda-c)}\right)^{3}} \implies \tilde{f}''(0) = \frac{\alpha^{2}e^{\alpha c}}{\left(1+e^{\alpha c}\right)^{2}}\left(\frac{2e^{\alpha c}}{1+e^{\alpha c}} - 1\right) \\
&\ \ \vdots
\end{aligned}$$
Then $h_{0}=\tilde{f}(0)$, $h_{1}=\tilde{f}'(0)$, $h_{2}=\frac{\tilde{f}''(0)}{2}$, etc., with $h_{k}=\frac{\tilde{f}^{(k)}(0)}{k!}$ in general (see approximation of heaviside functions using convolutional graph filters)
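To make this concrete, here is a minimal sketch that computes the Taylor coefficients symbolically (via sympy) and applies the resulting convolutional filter. The parameter values are illustrative, and the truncated series is only accurate for eigenvalues within its radius of convergence (e.g. a normalized shift operator with spectrum in $[-1,1]$):

```python
import numpy as np
import sympy as sp

# Illustrative parameters: steepness alpha, cutoff c, filter order K.
alpha, c, K = 2.0, 1.0, 8

# Taylor-expand the logistic f(lam) = 1 / (1 + exp(-alpha (lam - c)))
# around lam = 0, so that h_k = f^(k)(0) / k!.
lam = sp.symbols("lam")
f = 1 / (1 + sp.exp(-alpha * (lam - c)))
h = [float(f.diff(lam, k).subs(lam, 0) / sp.factorial(k)) for k in range(K)]

def graph_filter(S, x, h):
    """Apply the convolutional filter z = sum_k h_k S^k x."""
    z, Skx = np.zeros_like(x, dtype=float), x.astype(float)
    for hk in h:
        z += hk * Skx
        Skx = S @ Skx  # next power of the shift operator applied to x
    return z
```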
So we can do this. That’s great! But it can be costly to compute this approximation, especially the higher order derivatives of this function. Is there a better way to design such a filter?
Spectral Graph Filters
Let $S=V\Lambda V^{-1}$ for some diagonalizable shift operator $S$ and some graph signal $x$. A spectral graph filter acts directly on the spectrum:
$$z = V\,\tilde{f}(\Lambda)\,V^{-1}x, \quad \text{i.e.} \quad \tilde{z}_{i} = \tilde{f}(\lambda_{i})\,\tilde{x}_{i}$$
where $\tilde{x}=V^{-1}x$ is the graph Fourier transform of signal $x$. We design or learn the constants $\tilde{f}(\lambda_{i})$.
(see spectral graph filter)
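For reference, a convolutional filter is a special case of a spectral filter: diagonalizing each power of $S$ gives
$$z=\sum_{k=0}^{K-1} h_{k}S^{k}x = V\left(\sum_{k=0}^{K-1} h_{k}\Lambda^{k}\right)V^{-1}x, \qquad \tilde{f}(\lambda_{i})=\sum_{k=0}^{K-1}h_{k}\lambda_{i}^{k},$$
so the polynomial coefficients $h_{k}$ induce the frequency response $\tilde{f}(\lambda_{i})$.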
Example
Let $S$ be a shift operator with eigenvalues $\lambda_{1}\le\lambda_{2}\le\dots\le\lambda_{n}$ (see the spectrum plot).
Suppose we want a lowpass filter with bandwidth $c$. Then we can simply truncate the spectrum to only allow the frequencies falling below our cutoff: $\tilde{f}(\lambda_{i})=1$ if $\lambda_{i}\le c$ and $0$ otherwise (in practice we usually select an index $k$, i.e. keep the first $k$ frequencies, instead of the actual eigenvalue).
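A minimal sketch of this truncation, assuming a symmetric shift operator $S$ (so `numpy.linalg.eigh` applies); the function name and cutoff argument are illustrative:

```python
import numpy as np

def spectral_lowpass(S, x, c):
    """Keep only graph-frequency components with eigenvalue <= c."""
    lam, V = np.linalg.eigh(S)   # S = V diag(lam) V^T for symmetric S
    x_hat = V.T @ x              # graph Fourier transform of x
    x_hat[lam > c] = 0.0         # truncate the spectrum at the cutoff
    return V @ x_hat             # inverse GFT back to the node domain
```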
In modern applications, we have moved away from system engineering and toward learning systems from data.
Supervised Statistical Learning
Suppose $x$ and $y$ are related by a statistical model $p(y\mid x)$. We are interested in predicting $y$ from $x$ using either $p(y\mid x)$ or $\mathbb{E}[y\mid x]$. In practice, we can only estimate these quantities using a model $\hat{y}=f(x)$, where $f$ comes from a function or hypothesis class $\mathcal{F}$.
How do we pick the best estimator $f$? This is the statistical risk minimization problem:
- Define a loss function $\ell(\hat{y},y)$ measuring the “cost” of predicting $\hat{y}$ when the true output is $y$
Statistical Risk Minimization Problem
Suppose $x,y$ are related by some (known) statistical model $p(x,y)$, and we want to estimate $y$ using a model $\hat{y}=f(x)$, where $f$ is a member of our hypothesis class $\mathcal{F}$.
Let $\ell(\hat{y},y)$ be our loss function. Then the statistical risk minimization problem is to minimize the expected loss over the distribution $p$:
$$f^{*}=\operatorname*{argmin}_{f\in\mathcal{F}}\ \mathbb{E}_{(x,y)\sim p}\left[\ell(f(x),y)\right]$$
The optimal estimator $f^{*}$ is the function with minimal expected cost over all functions $f\in\mathcal{F}$.
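As a quick check (a standard computation), under the quadratic loss, conditioning on $x$ gives
$$\mathbb{E}\left[\|f(x)-y\|^{2}\mid x\right]=\|f(x)-\mathbb{E}[y\mid x]\|^{2}+\operatorname{tr}\operatorname{Cov}(y\mid x),$$
so the risk is minimized pointwise by $f^{*}(x)=\mathbb{E}[y\mid x]$, which is why the conditional expectation appears below.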
Note
Typically, we are interested in either
- Predicting $y$ from $x$ with the conditional distribution $p(y\mid x)$.
- ex: stochastic outputs: VAEs, diffusion, etc.
- Predicting $y$ from $x$ with the conditional expectation $\mathbb{E}[y\mid x]$.
- ex: deterministic outputs, the classical regression/supervised learning regime
Question
What is the issue with this?
Answer
In practice, we only have access to data, not the distribution itself. So instead, we can only estimate the risk from the data.
Consider again the statistical risk minimization problem, where we wish to estimate $y$ from $x$ under some known distribution $p$. In practice, we typically do not have access to the distribution $p$; we only have access to samples.
Suppose we have samples $\{(x_{i},y_{i})\}_{i=1}^{n}$, where $(x_{i},y_{i})\sim p$ and $p$ is unknown. Let $\ell$ be our loss function and $\mathcal{F}$ our hypothesis class. Then the solution to the empirical risk minimization (ERM) problem is
$$\hat{f}=\operatorname*{argmin}_{f\in\mathcal{F}}\ \frac{1}{n}\sum_{i=1}^{n}\ell(f(x_{i}),y_{i})$$
i.e. we minimize the empirical mean of the loss.
Example
Typical loss functions for the ERM are
- quadratic / $L_{2}$ loss for regression/estimation problems
- $0$-$1$ loss for classification problems
- since we often require differentiability, this is often replaced with a differentiable surrogate like the cross-entropy loss or logistic loss
Note
The ERM problem might have a closed-form solution (like in linear regression), but in modern ML, it is solved using optimization algorithms such as SGD or Adam.
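A toy sketch of both routes for the quadratic loss over a linear hypothesis class; all data and constants here are synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # synthetic inputs
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy linear targets

# Closed-form ERM solution (normal equations / least squares).
w_closed = np.linalg.lstsq(X, y, rcond=None)[0]

# The same ERM problem solved with minibatch SGD.
w = np.zeros(5)
for _ in range(500):
    i = rng.integers(0, 100, size=10)          # sample a minibatch
    grad = 2 * X[i].T @ (X[i] @ w - y[i]) / len(i)
    w -= 0.05 * grad                           # gradient step
```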
Types of Graph Signal Processing Problems
Question
Where do our filters fit into ERM?
Our hypothesis class is the set of graph convolutions, with the coefficients $h_{k}$ as the parameters to learn.
We usually see three types of problems:
Graph Signal Processing Problem
In a graph signal processing problem (or graph regression or signal regression), the graph $G$, with shift operator $S$, is the data support, and is thus fixed. The data are graph signals $x_{i}$ and we want to predict signals $y_{i}$. Assuming $(x_{i},y_{i})\sim p$, we regress signals $y_{i}$ on predictor signals $x_{i}$.
Here, the hypothesis class is the set of graph convolutions
$$\mathcal{F}=\left\{ z=\sum_{k=0}^{K-1} h_{k}S^{k}x \;\middle|\; h_{k} \in \mathbb{R} \right\}$$
Our minimization problem is then
$$\min_{h}\ \frac{1}{n}\sum_{i=1}^{n}\ell\left(\sum_{k=0}^{K-1}h_{k}S^{k}x_{i},\ y_{i}\right)$$
Example
The fixed graph is the US weather station network. Suppose we have our $y_{i}$, the recorded temperatures on February 3 from each of the last $n$ years, and our $x_{i}$, the recorded temperatures on November 3 from the last $n$ years.
- Temperatures on the graph are recorded as time series, giving the signal pairs $(x_{i}, y_{i})$
- we want to predict the February temperatures from the November temperatures.
Application: temperature forecasting. Predict February 2026 temperatures from the November 2025 temperatures as $\hat{y}=\sum_{k=0}^{K-1}\hat{h}_{k}S^{k}x$.
Using the quadratic loss, our minimization problem becomes:
$$\min_{h}\ \frac{1}{n}\sum_{i=1}^{n}\left\|\sum_{k=0}^{K-1}h_{k}S^{k}x_{i}-y_{i}\right\|_{2}^{2}$$
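A minimal sketch of solving this least-squares problem, assuming `S` is a (say, normalized) adjacency of the station network and `X`, `Y` hold one historical year of signals per row; all names are illustrative:

```python
import numpy as np

def fit_graph_filter(S, X, Y, K):
    """Least-squares fit of h in sum_k h_k S^k x_i ~ y_i.
    X, Y: (num_samples, num_nodes) arrays of predictor/target signals."""
    blocks, targets = [], []
    for x, y in zip(X, Y):
        cols, Skx = [], x.astype(float)
        for _ in range(K):
            cols.append(Skx)      # column k holds S^k x
            Skx = S @ Skx
        blocks.append(np.stack(cols, axis=1))   # (num_nodes, K)
        targets.append(y)
    B = np.vstack(blocks)                       # stack all years' equations
    t = np.concatenate(targets)
    h, *_ = np.linalg.lstsq(B, t, rcond=None)
    return h
```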
Graph-Level Problems
In graph-level problems, there are multiple graphs. Each graph, represented by its shift operator $S_{i}$, is a predictor associated with an observation $y_{i}$. We assume that $(S_{i},y_{i})\sim p$ and want to regress $y_{i}$ on $S_{i}$.
Here, our hypothesis class is
$$\mathcal{F}=\left\{ f(S) = \sum_{k=1}^{K-1} h_{k}S^{k}\mathbb{1} \;\middle|\; h_{k} \in \mathbb{R} \right\}$$
Since there are no graph signal observations, we use the vector of all ones $\mathbb{1}$ as our constant signal!
And our minimization problem is given by
$$\min_{h}\ \frac{1}{n}\sum_{i=1}^{n}\ell\left(\sum_{k=1}^{K-1}h_{k}S_{i}^{k}\mathbb{1},\ y_{i}\right)$$
(see graph-level problem)
Example
Suppose we want to predict the number of triangles incident to each node for any graph.
In this example, since our output is an integer, it makes sense to use the $0$-$1$ loss or a differentiable surrogate.
Application: automate triangle counting.
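A toy sketch of this setup: the node-level triangle count is $\operatorname{diag}(A^{3})/2$, which we use as labels to fit the filter coefficients by least squares over random training graphs (graph sizes, edge probability, and filter order are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def filter_features(A, K=4):
    """Columns 1, A1, A^2 1, ...: the graph-level filter's building blocks."""
    cols, v = [], np.ones(A.shape[0])
    for _ in range(K):
        cols.append(v)
        v = A @ v
    return np.stack(cols, axis=1)

Bs, ys = [], []
for _ in range(50):                          # random Erdos-Renyi training graphs
    A = np.triu(rng.random((20, 20)) < 0.3, 1).astype(float)
    A = A + A.T                              # symmetric adjacency, no self-loops
    Bs.append(filter_features(A))
    ys.append(np.diag(A @ A @ A) / 2)        # triangles incident to each node
h, *_ = np.linalg.lstsq(np.vstack(Bs), np.concatenate(ys), rcond=None)
```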
Observation
Both the graph signal processing problem and the graph-level problem are supervised (inductive) learning problems, i.e. none of the test inputs are seen at training time.
(see supervised learning)
Node-Level Tasks
Node-level tasks (sometimes called transductive or semi-supervised learning) have a fixed graph $G$ as the data support. Here, each node is treated as a sample: we assume that the signal and observation at node $i$ satisfy $(x_{i},y_{i})\sim p$.
We assume we only observe $y_{i}$ for a subset of the nodes, and want to estimate the remaining $y_{i}$.
(see node-level task)
Example
Consider the contextual SBM: $G$ is undirected, with node labels $y_{i}\in\{-1,+1\}$ (say $y_{i}$ represents the community of node $i$). The edges are random, $A_{ij}\sim\text{Bernoulli}(p_{ij})$, with node features/covariates $x_{i}$, where
- if $y_{i}=y_{j}$ then $p_{ij}=p$, and
- if $y_{i}\neq y_{j}$ then $p_{ij}=q$
Goal: predict $y$ from $x$. Hypothesis class: the graph convolutions
$$\mathcal{F}=\left\{ z=\sum_{k=0}^{K-1}h_{k}S^{k}x \;\middle|\; h_{k}\in\mathbb{R} \right\}$$
Problem: Define a mask $M=\operatorname{diag}(m)$, with $m_{i}=1$ if node $i$ is labeled and $m_{i}=0$ otherwise. Then we only score predictions on the labeled nodes, and minimize:
$$\min_{h}\ \ell\left(M\sum_{k=0}^{K-1}h_{k}S^{k}x,\ My\right)$$
Application: infer a node’s class/community/identity locally, i.e. without needing communication, unlike clustering techniques, which require eigenvectors (global graph information)
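A toy sketch of this pipeline on a synthetic contextual SBM, assuming (as one common choice, not specified above) Gaussian node features centered at the community label; all sizes and probabilities are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
y = np.repeat([1.0, -1.0], n // 2)                # two communities
P = np.where(np.equal.outer(y, y), 0.5, 0.05)     # intra- vs inter-community p, q
A = np.triu(rng.random((n, n)) < P, 1).astype(float)
A = A + A.T                                       # undirected adjacency
x = y + rng.normal(size=n)                        # noisy community feature (assumed)

mask = rng.random(n) < 0.2                        # labels observed on 20% of nodes
B = np.stack([x, A @ x, A @ (A @ x)], axis=1)     # [x, Sx, S^2 x] with S = A
h, *_ = np.linalg.lstsq(B[mask], y[mask], rcond=None)
y_hat = np.sign(B @ h)                            # predicted labels for all nodes
```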
Observation
This problem is an example of transductive or semi-supervised learning because the test data predictors are seen at training time.
(see lots of techniques from Probabilistic Machine Learning for some of the ways we can do this)
Housekeeping: bring a computer next week for the PyTorch tutorial.