2025-02-10 graphs lecture 6

Date

2025-02-10

 

1. Graph Signals and Graph Signal Processing

Community graph example

Summary

Today

  • community detection (HW 1 - Wednesday)
  • recap SBM (last time)
  • spectral clustering algorithm
  • spectral embeddings
  • contextual SBM

stochastic block model

an $n$-node stochastic block model (SBM) graph with $k$ communities is given by the pair $(Z, P)$, where

  • $Z \in \{0,1\}^{n \times k}$ is the community assignment matrix (if $Z_{ic} = 1$ then node $i$ belongs to community $c$)

  • $P \in [0,1]^{n \times n}$ is the matrix of intra- and inter-community probabilities: $P_{ij}$ is the edge probability for a node pair $(i, j)$, and it depends only on the communities that $i$ and $j$ belong to

  • $A \in \{0,1\}^{n \times n}$ is the adjacency matrix, with $A_{ij} \sim \mathrm{Bernoulli}(P_{ij})$

$P$ can take any value (as long as it is symmetric); often we have $P_{ij} = p$ for intra-community pairs and $P_{ij} = q$ for inter-community pairs, with $p > q$, while the general model allows a different probability for every pair of communities.

(see stochastic block model)

We say an SBM is balanced when all communities have the same size ($n/k$ nodes each); it is unbalanced otherwise.

(see balanced stochastic block model)
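
To make the model concrete, here is a minimal sketch of sampling a balanced 2-community SBM (assuming `numpy`; the function name `sample_balanced_sbm` is mine, not from the lecture):

```python
import numpy as np

def sample_balanced_sbm(n, p, q, rng=None):
    """Sample a balanced 2-community SBM: first n/2 nodes are community 0,
    the rest community 1; intra-community edge prob. p, inter-community q."""
    rng = np.random.default_rng(rng)
    z = np.repeat([0, 1], n // 2)                  # community labels
    P = np.where(z[:, None] == z[None, :], p, q)   # n x n edge probabilities
    U = rng.random((n, n))                         # uniform [0, 1) samples
    A = np.triu(U < P, k=1)                        # keep strict upper triangle
    return (A | A.T).astype(int), z                # symmetrize, no self-loops

A, z = sample_balanced_sbm(200, p=0.8, q=0.2, rng=0)
```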

Example

Consider the 2-community balanced SBM. For an $n$-node graph, suppose the nodes are labeled such that the first $n/2$ are in community 1 and the remaining $n/2$ nodes are in community 2. Then

$$Z \in \{0,1\}^{n \times 2} = \begin{bmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix}, \qquad P \in \mathbb{R}^{(n/2+n/2) \times (n/2+n/2)} = \left[ \begin{array}{ccc|ccc} p & \dots & p & q & \dots & q \\ \vdots & & \vdots & \vdots & & \vdots \\ p & \dots & p & q & \dots & q \\ \hline q & \dots & q & p & \dots & p \\ \vdots & & \vdots & \vdots & & \vdots \\ q & \dots & q & p & \dots & p \end{array} \right]$$

Recall that $A_{ij} \sim \mathrm{Bernoulli}(P_{ij})$, i.e. $\mathbb{E}[A_{ij}] = P_{ij}$.

Thus, $\mathbb{E}[A] = P$. Let's compute the eigenvectors and eigenvalues of $\mathbb{E}[A]$.

  • First, we observe that this matrix has rank 2 and will thus have 2 eigenvectors with nonzero eigenvalues

  • It is easy to see that $P \mathbf{1}_n = \frac{n(p+q)}{2} \mathbf{1}_n$. Thus $v_1 = \frac{1}{\sqrt{n}} \mathbf{1}_n$ is an eigenvector with eigenvalue $\lambda_1 = \frac{n(p+q)}{2}$

  • Further, we note that $P v_2 = \frac{n(p-q)}{2} v_2$, where $v_2 = \frac{1}{\sqrt{n}} [1, \dots, 1, -1, \dots, -1]^\top$ ($+1$ on the first $n/2$ entries, $-1$ on the rest)

  • Thus $v_2$ is an eigenvector with eigenvalue $\lambda_2 = \frac{n(p-q)}{2}$

This means that we can use the eigenvector associated with the second largest eigenvalue (in absolute value) of $\mathbb{E}[A]$ to reveal the community assignments of the nodes: the sign of $(v_2)_i$ tells us the community of node $i$.
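
A quick numerical sanity check of this derivation (a sketch assuming `numpy`; the values $n = 200$, $p = 0.8$, $q = 0.2$ are arbitrary choices of mine): the top two eigenvalues of $P$ come out to $n(p+q)/2$ and $n(p-q)/2$, and the second eigenvector flips sign between the communities.

```python
import numpy as np

n, p, q = 200, 0.8, 0.2
z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], p, q)   # E[A] for the balanced 2-SBM

eigvals, eigvecs = np.linalg.eigh(P)
order = np.argsort(-np.abs(eigvals))           # decreasing |eigenvalue|
print(eigvals[order[:2]])                      # ~ [100., 60.] = [n(p+q)/2, n(p-q)/2]

v2 = eigvecs[:, order[1]]
print(np.sign(v2[0]) != np.sign(v2[-1]))       # True: sign flips across communities
```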

This is the intuition behind spectral clustering. For an arbitrary graph with adjacency matrix $A$, we assume that $A = \mathbb{E}[A] + \Delta$, where $\Delta$ is random noise with $\mathbb{E}[\Delta] = 0$.

Thus, we can use our sample adjacency matrix $A$ (whose mean is $\mathbb{E}[A]$) to estimate the community assignments of our nodes!

Spectral Clustering

Tip

What happens when $k > 2$ (more general case than the example above)?

  • The community assignment is a linear combination of the top $k$ eigenvectors (by absolute value of the eigenvalues)

Suppose we are given $A$ and $k$. How do we estimate $Z$?

spectral clustering algorithm

  1. Diagonalize $A = V \Lambda V^\top$
  2. Order the eigenvectors by decreasing eigenvalue magnitude, $|\lambda_1| \geq |\lambda_2| \geq \dots \geq |\lambda_n|$. This yields $V_k = [v_1, \dots, v_k] \in \mathbb{R}^{n \times k}$; we can see the rows as embeddings of the nodes in $\mathbb{R}^k$ (community space)
  3. Yay, we can cluster the rows ($k$-means or whatever you want; can also use things like Gaussian mixture models)

(see spectral clustering)
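
A minimal sketch of this algorithm (assuming `numpy` and scikit-learn's `KMeans`; the function name is mine):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    """Cluster the nodes of A via k-means on its top-k eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(A)          # A is symmetric
    order = np.argsort(-np.abs(eigvals))          # decreasing |eigenvalue|
    V_k = eigvecs[:, order[:k]]                   # rows = node embeddings in R^k
    return KMeans(n_clusters=k, n_init=10).fit_predict(V_k)

# A from the SBM sketch above; z_hat[i] is the estimated community of node i
z_hat = spectral_clustering(A, k=2)
```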

This is unsupervised learning - it only assumes knowledge of $A$ and $k$. When we know the assignments of some subset of nodes $\mathcal{T}$, these can be used to make better predictions and turn the task into a semi-supervised one. This is what we do more often in practice.

Explicitly, we can convert the problem to:

$$\min_{f} \sum_{i \in \mathcal{T}} \mathbb{1}\left[f(A)_i \neq z_i\right]$$

where $f$ is some parametric function, $z_i$ is the one-hot community vector, and $z_{ic} = 1$ if node $i$ is in community $c$. In practice, the indicator/0-1 loss is substituted for the cross-entropy loss.
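
One possible instantiation of this semi-supervised setup (a sketch; using scikit-learn's `LogisticRegression`, which minimizes a cross-entropy objective, as the parametric $f$ is my choice, not the lecture's):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A, z: adjacency matrix and true labels from the SBM sketch above
eigvals, eigvecs = np.linalg.eigh(A)
V_k = eigvecs[:, np.argsort(-np.abs(eigvals))[:2]]   # top-2 eigenvector embeddings

labeled = np.arange(0, len(z), 10)                   # pretend every 10th label is known
clf = LogisticRegression().fit(V_k[labeled], z[labeled])
z_hat = clf.predict(V_k)                             # predictions for all nodes
```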

Spectral Embedding

Suppose we are given a diagonalizable adjacency matrix $A$ for a graph with $n$ nodes partitioned into $k$ classes. Suppose we know the assignments $z_i$ for some of the nodes $i \in \mathcal{T}$. Let $V_k = [v_1, \dots, v_k]$ be the top $k$ eigenvectors of $A$.

Finding the spectral embedding amounts to solving the problem

$$\min_{H} \sum_{i \in \mathcal{T}} \mathbb{1}\left[\sigma(V_k H)_i \neq z_i\right]$$

where $H$ comes from our hypothesis class and $\sigma$ is a predetermined pointwise nonlinearity.

This can be thought of as an FC-NN acting on the $k$-dimensional embeddings of the nodes of the graph in feature space.

Notes

  • in practice, a surrogate is usually used for the 0-1 loss
  • to find the optimal $H$, i.e. the optimal $f$, we solve using gradient descent methods (see the sketch below)

(see spectral embedding)
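
A minimal sketch of this optimization under specific choices of mine (a single matrix $H$, softmax nonlinearity, cross-entropy surrogate loss, plain gradient descent):

```python
import numpy as np

def softmax(X):
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_spectral_embedding(V_k, z, labeled, k, lr=0.5, steps=500):
    """Gradient descent on H to minimize cross-entropy of softmax(V_k @ H)
    over the labeled nodes (a surrogate for the 0-1 loss)."""
    H = np.zeros((V_k.shape[1], k))
    X, Y = V_k[labeled], np.eye(k)[z[labeled]]    # features and one-hot targets
    for _ in range(steps):
        grad = X.T @ (softmax(X @ H) - Y) / len(labeled)
        H -= lr * grad
    return H

# communities for all nodes: softmax(V_k @ H).argmax(axis=1)
```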

Spectral clustering is fully unsupervised; spectral embedding incorporates some prior knowledge, making it semi-supervised.

information theoretic threshold

Question

What happens when $p \to q$ in the balanced SBM with $k = 2$ communities?

It becomes impossible to cluster! When $p = q$, we have an Erdős–Rényi graph, in which the edge probability is constant for all node pairs. There are no communities to distinguish.

But even when $p \neq q$, there is a region around $p = q$ where detection of communities is impossible in an information-theoretic sense.

We want to recover as closely as possible the exact partition of nodes into communities. Without perfect information, this amounts to achieving almost exact recovery:

Almost exact recovery

In a stochastic block model problem, almost exact recovery of the assignments of nodes to communities is achieved when $\frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[\hat{z}_i = z_i] \xrightarrow{\mathrm{a.s.}} 1$ as $n \to \infty$, i.e. the estimated labels converge almost surely to the true labels (up to a permutation of the community labels).

(see almost exact recovery)
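
Since community labels are only identifiable up to a permutation, measuring recovery in practice means maximizing agreement over relabelings; a small sketch (function name mine):

```python
import numpy as np
from itertools import permutations

def agreement(z_true, z_hat, k):
    """Fraction of correctly labeled nodes, maximized over community relabelings."""
    best = 0.0
    for perm in permutations(range(k)):
        best = max(best, np.mean(np.array(perm)[z_hat] == z_true))
    return best

# almost exact recovery: agreement(z, z_hat, k) -> 1 as n -> infinity
```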

How do we know if almost exact recovery is possible? From an information theoretic point of view, we can determine this using the signal to noise ratio.

signal-to-noise ratio (SNR)

The signal-to-noise ratio is the ratio of the power of the signal (the thing we want to measure) to the power of the extraneous noise. This is given as:

$$\mathrm{SNR} = \frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}$$

where $P$ is the average power, or second moment. When the signal and noise are random variables, we have $\mathrm{SNR} = \mathbb{E}[S^2] / \mathbb{E}[N^2]$. (see signal to noise ratio)

Example

In our example with $\mathbb{E}[A]$ as above, with intra-community probability $p$ and inter-community probability $q$, then

$$\mathrm{SNR}_{p,q} = \frac{(p-q)^2}{2(p+q)}$$

  • numerator is the signal: the degree discrepancy between the (unequal) subgraphs
  • denominator is noise: proportional to the average degree for all nodes in the graph
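
A tiny numeric check of this quantity against the $1/n$ threshold from the theorem that follows (a sketch under the formula as written above; helper name mine):

```python
def snr(p, q):
    """SNR of the balanced 2-community SBM: (p - q)^2 / (2 (p + q))."""
    return (p - q) ** 2 / (2 * (p + q))

n = 1000
print(snr(0.8, 0.2) > 1 / n)  # True: above the recovery threshold
print(snr(0.5, 0.5) > 1 / n)  # False: p = q, Erdos-Renyi, no communities
```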

Theorem

Suppose we have a stochastic block model problem where the adjacency matrix $A \sim \mathrm{SBM}(Z, P)$, with balanced communities and probabilities $p$ (intra) and $q$ (inter).

If the signal-to-noise ratio $\mathrm{SNR}_{p,q} < \frac{1}{n}$, then almost exact recovery is impossible. Even with infinite time and resources, there is no algorithm that can recover the true communities from $A$ alone.

(see almost exact recovery is impossible when the signal to noise ratio is less than the threshold)

Proof

See the proofs in Massoulié (2014) and Mossel et al. (2014). Aside: these are interesting to read since they use different proof methods from different domains. See also Abbe's survey papers.

Example: Sparse Graphs

Let $p = \frac{a}{n}$ and $q = \frac{b}{n}$, so that $p, q \to 0$ as $n \to \infty$ (the edge probabilities vanish while the expected degree stays bounded).

Then

$$\begin{aligned} \mathrm{SNR}_{p,q} = \frac{(p-q)^2}{2(p+q)} &= \frac{1}{2n} \frac{(a-b)^2}{a+b} \\ \mathrm{SNR}_{p,q} < \frac{1}{n} &\iff \frac{1}{2n} \frac{(a-b)^2}{a+b} < \frac{1}{n} \\ &\implies \frac{(a-b)^2}{2(a+b)} < 1 \end{aligned}$$

i.e., in sparse graphs it is not difficult to identify the information-theoretic threshold (we can easily calculate the signal-to-noise ratio): almost exact recovery is impossible when $(a-b)^2 < 2(a+b)$.
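
A quick check of the resulting condition (a sketch; the helper name is mine):

```python
def recoverable_sparse(a, b):
    """Condition from the computation above for p = a/n, q = b/n:
    (a - b)^2 > 2 (a + b)."""
    return (a - b) ** 2 > 2 * (a + b)

print(recoverable_sparse(10, 2))  # True:  64 > 24
print(recoverable_sparse(5, 4))   # False:  1 < 18
```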

Next time, we will look at another SBM with more information (in the form of node features); then we can get better thresholds.

Note

From 2025-02-05 graphs lecture 5: there was a “typo” in the distributions for our SBM example.