2025-02-03 graphs lecture 4


Summary

1. Graph Signals and Graph Signal Processing

Example

Suppose we want to design a lowpass filter for some graph signal.

2025-02-03_graph-0.jpg

(What does this mean? Consider radio stations: these use carrier waves, which are at higher frequencies than the information to be broadcast. We need a way to remove the carrier frequency to recover the desired signal.)

Suppose we want to implement this using graph convolutions. Recall that we can represent any analytic function with convolutional graph filters.

Question

Is this function analytic?

Answer

No, but we can often find good analytic approximations of non-analytic functions. For Heaviside-type functions such as the ideal lowpass filter, a good approximation is the logistic function

$$\tilde f(\lambda) = \frac{1}{1 + e^{\alpha(\lambda - c)}}$$

where $\alpha > 0$ is a steepness constant.

Note

We can approximate Heaviside functions by using the logistic function as a proxy for our target function.

Proof

Logistic functions are analytic, and recall that we can represent any analytic function with convolutional graph filters.

As an illustration, we want to find coefficients $h_k$ such that

$$\sum_{k=0}^{K-1} h_k \lambda^k = \tilde f(\lambda), \qquad \tilde f(\lambda) = \frac{1}{1 + e^{\alpha(\lambda - c)}}$$

Here, expanding $\tilde f$ in a Taylor series around $\lambda = 0$:

$$\tilde f(0) = \frac{1}{1 + e^{-\alpha c}}$$

$$\tilde f'(\lambda) = -\frac{\alpha\, e^{\alpha(\lambda - c)}}{\left(1 + e^{\alpha(\lambda - c)}\right)^2}, \qquad \tilde f'(0) = -\frac{\alpha\, e^{-\alpha c}}{\left(1 + e^{-\alpha c}\right)^2}$$

$$\tilde f''(\lambda) = -\frac{\alpha^2 e^{\alpha(\lambda - c)}}{\left(1 + e^{\alpha(\lambda - c)}\right)^2} + \frac{2\alpha^2 e^{2\alpha(\lambda - c)}}{\left(1 + e^{\alpha(\lambda - c)}\right)^3}, \qquad \tilde f''(0) = \frac{\alpha^2 e^{-\alpha c}}{\left(1 + e^{-\alpha c}\right)^2}\left(\frac{2 e^{-\alpha c}}{1 + e^{-\alpha c}} - 1\right)$$

Then $h_0 = \tilde f(0)$, $h_1 = \tilde f'(0)$, $h_2 = \frac{\tilde f''(0)}{2}$, etc.

(see approximation of heaviside functions using convolutional graph filters)
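To make this concrete, here is a minimal NumPy sketch (not from the lecture; the path graph, $\alpha$, $c$, and $K$ are illustrative, and only the first three taps are computed, mirroring the derivatives above):

```python
import numpy as np

def logistic_taps(alpha, c, K):
    """First K (<= 3) Taylor coefficients at lambda = 0 of f(lambda) = 1 / (1 + exp(alpha*(lambda - c)))."""
    u = np.exp(-alpha * c)                               # e^{alpha(0 - c)}
    f0 = 1.0 / (1.0 + u)                                 # f(0)
    f1 = -alpha * u / (1.0 + u) ** 2                     # f'(0)
    f2 = (alpha ** 2 * u / (1.0 + u) ** 2) * (2.0 * u / (1.0 + u) - 1.0)  # f''(0)
    return np.array([f0, f1, f2 / 2.0][:K])              # h_k = f^(k)(0) / k!

def polynomial_filter(S, x, h):
    """Apply y = sum_k h_k S^k x without forming the powers S^k explicitly."""
    y = np.zeros_like(x)
    Skx = x.copy()
    for hk in h:
        y = y + hk * Skx
        Skx = S @ Skx
    return y

# Toy usage (illustrative values): 5-node path graph, Laplacian shift operator.
A = np.diag(np.ones(4), 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A
x = np.random.default_rng(0).normal(size=5)
y = polynomial_filter(L, x, logistic_taps(alpha=4.0, c=1.0, K=3))
```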

So we can do this. That's great! But it can be costly to compute this approximation, especially the higher order derivatives of this function. Is there a better way to design such a filter?

Spectral Graph Filters

Let $S = V \Lambda V^H$ and $\hat x = V^H x$ for some diagonalizable shift operator $S$ and some graph signal $x$.

$$y = \sum_{j=1}^{n} c_j \hat x_j v_j$$

where $\hat x$ is the graph Fourier transform of the signal $x$ and $\hat x_j$ is its $j$-th entry. We design or learn the constants $c_j$.

(see spectral graph filter)
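A hedged NumPy sketch of this construction, assuming $S$ is symmetric so that its eigenvectors are orthonormal and $V^H = V^\top$:

```python
import numpy as np

def spectral_filter(S, x, c):
    """y = sum_j c_j x_hat_j v_j: GFT, scale each graph frequency by c_j, inverse GFT."""
    lam, V = np.linalg.eigh(S)   # eigendecomposition S = V diag(lam) V^T, lam ascending
    x_hat = V.T @ x              # graph Fourier transform of the signal x
    return V @ (c * x_hat)       # scale frequency j by c_j, then transform back
```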

Example

Let S=L with this spectrum:
2025-02-03_graph-1.png

Suppose we want a lowpass filter with bandwidth $\lambda_c$. Then we can simply truncate the spectrum to keep only the frequencies falling below our cutoff, i.e., set $c_j = 1$ for $\lambda_j \le \lambda_c$ and $c_j = 0$ otherwise (in practice we usually select the cutoff index $j_c$ rather than the actual eigenvalue).
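For instance, the truncation described above amounts to setting $c_j = 1$ for the first $j_c$ indices and $0$ for the rest (a sketch reusing the `spectral_filter` helper above; the graph and cutoff index are made up):

```python
# Ideal lowpass by index truncation: keep the j_c lowest graph frequencies of L.
A = np.diag(np.ones(4), 1); A = A + A.T      # toy 5-node path graph
L = np.diag(A.sum(axis=1)) - A
x = np.random.default_rng(0).normal(size=5)

j_c = 2                                      # illustrative cutoff index
c = np.zeros(5); c[:j_c] = 1.0               # c_j = 1 for j < j_c, else 0
y_low = spectral_filter(L, x, c)             # eigh sorts eigenvalues ascending, so these are the low frequencies
```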

In modern applications, we have moved away from engineering these systems by hand toward learning them from data.

Supervised Statistical Learning

Suppose $x$ and $y$ are related by a statistical model $p(x, y)$. We are interested in predicting $y$ from $x$ using either $p(y \mid x)$ or $\mathbb E(y \mid x)$. In practice, we can only estimate these quantities using a model $\tilde y = f(x)$, $f \in \mathcal F$, where $f$ comes from a function or hypothesis class $\mathcal F$.

How do we pick the best estimator $f$? This is the statistical risk minimization problem.

Statistical Risk Minimization Problem

Suppose $x, y$ are related by some (known) statistical model $p(x, y)$, and we want to predict $y$ from $x$ using a model $\tilde y = f(x)$, where $f$ is a member of our hypothesis class $\mathcal F$.

Let $\ell(y, \hat y)$ be our loss function. Then the statistical risk minimization problem is to minimize the expected loss over the distribution $p(x, y)$:

$$f^{\star} = \arg\min_{f \in \mathcal F} \mathbb E_{p(x,y)}\left\{ \ell\big(y, f(x)\big) \right\}$$

The optimal estimator $f^{\star}$ is the function $f \in \mathcal F$ with minimal expected cost over all functions in the hypothesis class.

Note

Typically, we are interested in either

  • Predicting $y$ from $x$ with the conditional distribution $y \sim p(y \mid x)$
    • e.g., stochastic outputs: VAEs, diffusion models, etc.
  • Predicting $y$ from $x$ with the conditional expectation $\hat y = \mathbb E(y \mid x)$
    • e.g., deterministic outputs: the classical regression / supervised learning setting

(see statistical risk minimization problem)

Question

What is the issue with this?

Answer

In practice we only have access to data, so instead we can only estimate the risk from that data.

Consider again the statistical risk minimization problem, where we wish to estimate $y = f(x)$ under some known distribution $p(x, y)$. In practice, we typically do not have access to the distribution $p(x, y)$; we only have access to samples $\mathcal T = \{(x_j, y_j)\}_{j=1}^{M}$.

Suppose we have samples $\mathcal T = \{(x_j, y_j)\}_{j=1}^{M}$, where $(x, y) \sim p(x, y)$ and $p(x, y)$ is unknown. Let $\ell(y, \hat y)$ be our loss function and $\mathcal F$ our hypothesis class. Then the solution to the empirical risk minimization problem is

$$f^{\star} = \arg\min_{f \in \mathcal F} \frac{1}{M} \sum_{j=1}^{M} \ell\big(y_j, f(x_j)\big)$$

i.e., we minimize the empirical mean of the loss over the samples.

Example

Typical loss functions for the ERM are

  • quadratic / L2 loss $\ell(y, z) = \|y - z\|_2^2$ for regression/estimation problems
  • 0-1 loss $\ell(y, z) = \mathbb I(y \neq z)$ for classification problems

Note

The ERM problem might have a closed-form solution (like in linear regression), but in modern ML it is solved using optimization algorithms such as SGD or Adam.
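As a toy illustration (not from the lecture): for a linear hypothesis class with the quadratic loss, minimizing the empirical risk by plain gradient descent looks like the sketch below; the synthetic data and step size are made up, and a closed-form least-squares solution would also work here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # M = 100 samples of x in R^3
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)      # samples from an (unknown) p(x, y)

w = np.zeros(3)                                  # hypothesis class: f(x) = w^T x
lr = 0.1
for _ in range(500):
    resid = X @ w - y                            # f(x_j) - y_j
    grad = 2.0 * X.T @ resid / len(y)            # gradient of (1/M) sum_j ||y_j - f(x_j)||^2
    w -= lr * grad                               # full-batch gradient descent; SGD/Adam would subsample
```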

Types of Graph Signal Processing Problems

Question

Where do our filters fit into ERM?

Answer

Our hypothesis class is the set of graph convolutional filters.

We usually see 3 types of problems

Graph Signal Processing Problem

Graph Signal Processing Problem

In a graph signal processing problem (or graph regression or signal regression), the graph $G$ is the data support, and is thus fixed. The data are graph signals $x \in \mathbb R^n$ and we want to predict signals $y \in \mathbb R^n$. Assuming $(x, y) \sim p(x, y)$, we regress signals $y$ on predictor signals $x$.

Here, the hypothesis class is the set of graph convolutions $$\mathcal F = \left\{ z = \sum_{k=0}^{K-1} h_k S^k x,\ h_k \in \mathbb R \right\}$$

Our minimization problem is then $$\min_{h_{k}} \frac{1}{M}\sum_{j=1}^{M} \ell\left( y^{(j)}, \sum_{k=0}^{K-1}h_{k} S^k x^{(j)} \right)$$

(see graph signal processing problem)

Example

2025-02-03_graph-2.png
The fixed graph is the US weather station network. Suppose we have our $y$, the recorded temperatures from February 3 over the last $M$ years, and our $x$, the recorded temperatures from November 3 over the same years.

  • Temperatures on the graph are recorded as time series, and
  • we want to predict the February temperatures from the November temperatures.

Application: temperature forecasting. Predict the February 2026 temperatures ($y$) from the November 2025 temperatures ($x$) as

$$y \approx \sum_{k=0}^{K-1} h_k S^k x$$

Using the L2 loss, our minimization problem becomes:

$$\min_{h_k} \frac{1}{M}\sum_{j=1}^{M} \left\| y^{(j)} - \sum_{k=0}^{K-1} h_k S^k x^{(j)} \right\|_2^2$$
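Because the objective is linear in the taps $h_k$, it can be solved in closed form by stacking the vectors $S^k x^{(j)}$ into a design matrix. A hedged sketch (the shift operator, $K$, $M$, and the synthetic data are all illustrative):

```python
import numpy as np

def fit_filter_taps(S, X, Y, K):
    """Least-squares fit of h in y ~= sum_k h_k S^k x over M signal pairs.

    X, Y have shape (M, n): row j holds the predictor/target signals x^(j), y^(j)."""
    blocks, targets = [], []
    for x, y in zip(X, Y):
        cols, Skx = [], x.copy()
        for _ in range(K):
            cols.append(Skx)
            Skx = S @ Skx
        blocks.append(np.stack(cols, axis=1))     # n x K block [x, S x, ..., S^{K-1} x]
        targets.append(y)
    Phi = np.vstack(blocks)                       # (M n) x K design matrix
    h, *_ = np.linalg.lstsq(Phi, np.concatenate(targets), rcond=None)
    return h

# Illustrative usage with synthetic data on a random 10-node graph.
rng = np.random.default_rng(1)
n, M, K = 10, 20, 4
A = np.triu((rng.random((n, n)) < 0.3).astype(float), 1); S = A + A.T
X = rng.normal(size=(M, n))
h_true = np.array([0.5, 0.3, 0.1, 0.05])
Y = np.stack([sum(h_true[k] * np.linalg.matrix_power(S, k) @ x for k in range(K)) for x in X])
h_hat = fit_filter_taps(S, X, Y, K)               # should recover h_true approximately
```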

Graph-Level Problems

Graph-Level Problems

In graph-level problems, there are multiple graphs. Each graph $G$ represents a predictor associated with an observation $y \in \mathcal Y$. We assume that $(G, y) \sim p(G, y)$ and want to regress $y$ on $G$.

Here, our hypothesis class is $$\mathcal F = \left\{ f(S) = \sum_{k=0}^{K-1} h_k S^k \mathbb 1 \;\middle|\; h_k \in \mathbb R \right\}$$

Note

Since there are no graph signal observations, we use the vector of all ones $\mathbb 1$ as our constant signal!

And our minimization problem is given by $$\min_{h_{k}} \frac{1}{M} \sum_{i=1}^M \ell\left( \sum_{k=0}^{K-1} h_{k} S_i^k \mathbb{1}, y_i \right)$$

(see graph-level problem)

Example

2025-02-03_graph-3.png
Suppose we want to predict the number of triangles incident to each node for any graph

In this example, since our output is integer-valued, it makes sense to use the 0-1 loss or a surrogate:

$$\min_{h_k} \frac{1}{M} \sum_{i=1}^{M} \ell\left( \sum_{k=0}^{K-1} h_k S_i^k \mathbb{1}, y_i \right)$$

Application: automate triangle counting
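To build training data for this, the per-node triangle counts can be computed exactly from the adjacency matrix as $\operatorname{diag}(A^3)/2$, and the regression features are the filter's own terms $S^k \mathbb 1$. A sketch (the Erdős–Rényi graph and $K$ are illustrative):

```python
import numpy as np

def triangle_counts(A):
    """Number of triangles incident to each node of a simple graph: diag(A^3) / 2."""
    return np.diag(np.linalg.matrix_power(A, 3)) / 2

def constant_signal_features(S, K):
    """Columns [1, S 1, ..., S^{K-1} 1]: the graph convolution applied to the all-ones signal."""
    Sk1 = np.ones(S.shape[0])
    feats = []
    for _ in range(K):
        feats.append(Sk1)
        Sk1 = S @ Sk1
    return np.stack(feats, axis=1)

# One illustrative training pair (graph, per-node triangle counts).
rng = np.random.default_rng(2)
n = 8
A = np.triu((rng.random((n, n)) < 0.4).astype(float), 1); A = A + A.T
y = triangle_counts(A)                      # targets
Phi = constant_signal_features(A, K=4)      # prediction is Phi @ h for taps h_k
```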

Observation

Both the graph signal processing problem and the graph-level problem are supervised learning (sometimes called inductive learning) problems, i.e., none of the test inputs are seen at training time.

(see supervised learning)

Node-Level Tasks

Node-Level tasks

Node-level tasks (sometimes called transductive learning or semi-supervised learning) have the graph $G$ as the data support (i.e., it is fixed). Here, each node is treated as a sample: we assume that the signal and observation at node $i$ satisfy $(x_i, y_i) \sim p(x, y)$.

We assume we only observe $y_i$ for a subset of the nodes $J \subset V$ and want to estimate $y_j$ for $j \in V \setminus J$.

(see node-level task)

Example

Consider the contextual SBM: $G = (V, E)$ undirected and $y \in \{-1, 1\}^n$ (say $y_i$ represents the community of node $i$). The edges are random: $$P(A_{ij} = 1) = P\big((i,j) \in E\big) = \begin{cases} \frac{a}{n}, & y_i = y_j \\ \frac{b}{n}, & \text{otherwise} \end{cases}$$ with node features/covariates
$$x_i = \sqrt{\frac{\mu}{n}}\, y_i u + z_i, \quad \text{where } u \sim \mathcal N(0, 1),\ z \sim \mathcal N(0, I_n)$$

  • If $y_i = 1$ then $x_i \sim \mathcal N\!\left(\sqrt{\tfrac{\mu}{n}}\, u,\ 1\right)$, and
  • if $y_i = -1$ then $x_i \sim \mathcal N\!\left(-\sqrt{\tfrac{\mu}{n}}\, u,\ 1\right)$

2025-02-03_graph-4.png
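A small sampler for this model (a sketch; the $\sqrt{\mu/n}$ feature scaling mirrors the standard contextual SBM, and the parameter values are illustrative):

```python
import numpy as np

def sample_csbm(n, a, b, mu, rng):
    """Sample a contextual SBM: labels y in {-1, +1}^n, edges with prob a/n (same
    community) or b/n (different), scalar features x_i = sqrt(mu/n) * y_i * u + z_i."""
    y = rng.choice([-1, 1], size=n)
    P = np.where(np.equal.outer(y, y), a / n, b / n)         # edge probabilities
    U = np.triu(rng.random((n, n)) < P, 1).astype(float)
    A = U + U.T                                              # symmetric adjacency, no self-loops
    u = rng.normal()                                         # shared latent variable
    x = np.sqrt(mu / n) * y * u + rng.normal(size=n)         # node features / covariates
    return A, x, y

A, x, y = sample_csbm(n=200, a=20.0, b=5.0, mu=10.0, rng=np.random.default_rng(3))
```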

Goal: predict $y_i$, $i \in V \setminus J$, from $x_i$, $i \in V$.
Hypothesis class: the graph convolutions $\mathcal F = \left\{ z = \sum_{k=0}^{K-1} h_k S^k x,\ h_k \in \mathbb R \right\}$

Problem: Define a mask $M_J \in \{0, 1\}^{|J| \times n}$ that selects the observed nodes, so that $M_J \mathbb 1_n = \mathbb 1_{|J|}$ and $\mathbb 1_{|J|}^\top M_J \le \mathbb 1_n^\top$ (each row picks exactly one observed node). Then we minimize:

$$\min_{h_k} \frac{1}{|J|}\, \ell\left( M_J y,\ M_J \sum_{k=0}^{K-1} h_k S^k x \right)$$

Application: infer a node's class/community/identity locally, i.e., without the communication needed by clustering techniques, which require eigenvectors (global graph information).
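With the quadratic loss this masked objective is again linear in the taps, so a least-squares sketch suffices (the mask construction from the observed index set $J$ and the use of `lstsq` are my assumptions, not the lecture's):

```python
import numpy as np

def masked_filter_fit(S, x, y, J, K):
    """Fit taps h_k of z = sum_k h_k S^k x using only the observed labels y_j, j in J."""
    n = S.shape[0]
    M_J = np.zeros((len(J), n))
    M_J[np.arange(len(J)), list(J)] = 1.0            # selection mask: row r picks node J[r]
    feats, Skx = [], x.copy()
    for _ in range(K):
        feats.append(Skx)
        Skx = S @ Skx
    Phi = np.stack(feats, axis=1)                    # n x K matrix with columns S^k x
    h, *_ = np.linalg.lstsq(M_J @ Phi, M_J @ y, rcond=None)  # min ||M_J y - M_J Phi h||^2
    return Phi @ h                                   # predictions for all nodes, incl. V \ J
```

For the community example above, the returned predictions can then be thresholded by sign to assign the unlabeled nodes to a community.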

Observation

This problem is an example of transductive learning or semi-supervised learning because the test-data predictors $x_j$, $j \in V \setminus J$, are seen at training time.

(see lots of techniques from Probabilistic Machine Learning for some of the ways we can do this)

Housekeeping

Bring a computer next week for the PyTorch tutorial.