2025-02-10 graphs lecture 6


1. Graph Signals and Graph Signal Processing

Community graph example

Summary

Today

  • community detection (HW 1 - Wednesday)
  • recap SBM (last time)
  • spectral clustering algorithm
  • spectral embeddings
  • contextual SBM
stochastic block model

an n-node stochastic block model (SBM) graph with C communities is given by

$$P = YBY^T, \qquad A \sim P$$

where

  • $Y \in \{0,1\}^{n \times C}$ is the community assignment matrix (if $Y_{ic}=1$ then node $i$ belongs to community $c$)
  • $B$ is the matrix of intra- and inter-community probabilities: $B_{c_1,c_2}$ is the edge probability for a node pair $(i,j)$ such that $Y_{ic_1}=1, Y_{jc_2}=1$
  • A is the adjacency matrix
Note

While $B$ can take any value (as long as it is symmetric), often we have $$B_{c_1 c_2} = \begin{cases}p & \text{if }c_{1}=c_{2} \\ q & \text{otherwise}\end{cases}$$
with $q = 1 - p$

(see stochastic block model)
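As an illustration, here is a minimal numpy sketch of sampling $A \sim P = YBY^T$ from a label vector and a probability matrix $B$; the function name `sample_sbm` and the parameter values are my own choices for the example, not from the lecture.

```python
import numpy as np

def sample_sbm(y, B, seed=None):
    """Sample an undirected SBM adjacency matrix A ~ P = Y B Y^T (no self-loops)."""
    rng = np.random.default_rng(seed)
    Y = np.eye(B.shape[0])[y]                      # one-hot assignment matrix, shape (n, C)
    P = Y @ B @ Y.T                                # P_ij = B_{c_i, c_j}
    upper = np.triu(rng.random(P.shape) < P, k=1)  # one Bernoulli draw per node pair
    A = (upper | upper.T).astype(int)              # symmetrize
    return A, P

# balanced 2-community example: intra-community prob p, inter-community prob q
n, p, q = 100, 0.6, 0.1
y = np.repeat([0, 1], n // 2)
A, P = sample_sbm(y, np.array([[p, q], [q, p]]))
```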

We say an SBM is balanced when all communities have the same size ($\frac{n}{C}$ nodes each); it is unbalanced otherwise.
(see balanced stochastic block model)

Example

Consider the 2-community balanced SBM. For an $n$-node graph, suppose the nodes are labeled such that the first $n/2$ are in community 1 and the remaining nodes are in community 2. Then

$$Y \in \mathbb{R}^{n\times 2}=\begin{bmatrix}1 & 0\\ \vdots & \vdots\\ 1 & 0\\ 0 & 1\\ \vdots & \vdots\\ 0 & 1\end{bmatrix},\qquad P \in \mathbb{R}^{(n/2+n/2)\times(n/2+n/2)}=\begin{bmatrix}p & \cdots & p & q & \cdots & q\\ \vdots & & \vdots & \vdots & & \vdots\\ p & \cdots & p & q & \cdots & q\\ q & \cdots & q & p & \cdots & p\\ \vdots & & \vdots & \vdots & & \vdots\\ q & \cdots & q & p & \cdots & p\end{bmatrix}$$

Recall that $A \sim P$, i.e. $$A_{ij}=\begin{cases}1 & \text{with prob. } P_{ij}\\ 0 & \text{with prob. } 1-P_{ij}\end{cases}$$

Thus, $\mathbb{E}(A)=P$. Let's compute the eigenvectors and eigenvalues of $P=\mathbb{E}(A)$. For $p \neq q$, this $P$ has exactly two nonzero eigenvalues: $\lambda_{1}=\frac{n(p+q)}{2}$ with eigenvector $\mathbf{1}$ (the all-ones vector), and $\lambda_{2}=\frac{n(p-q)}{2}$ with eigenvector $\begin{bmatrix}\mathbf{1}^T & -\mathbf{1}^T\end{bmatrix}^T$, whose sign pattern is exactly the community split.

Note

This means that we can use the vector associated with the second largest eigenvalue (in absolute value) of E(A)=P to reveal the community assignments of the nodes.

This is the intuition behind spectral clustering. For an arbitrary graph with adjacency matrix $A$, we assume that $A = A_{\mathrm{SBM}} + \epsilon$, where $\epsilon$ is random noise with $\mathbb{E}(\epsilon)=0$.

Thus, we can use the observed (sample) adjacency matrix, whose expectation is $P$, to estimate the community assignments of our nodes!
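As a numerical sanity check, here is a short numpy sketch (the parameter values are just for illustration, not from the lecture) that diagonalizes the expected adjacency matrix $P$ of the balanced 2-community example and confirms that the eigenvector for the second-largest eigenvalue (in absolute value) is piecewise constant with opposite signs on the two communities.

```python
import numpy as np

# expected adjacency matrix P for the balanced 2-community SBM (p within, q across)
n, p, q = 8, 0.7, 0.1
half = np.ones((n // 2, n // 2))
P = np.block([[p * half, q * half],
              [q * half, p * half]])

eigvals, eigvecs = np.linalg.eigh(P)       # eigenvalues in ascending order
order = np.argsort(-np.abs(eigvals))       # reorder by decreasing |eigenvalue|
print(eigvals[order][:2])                  # ~[n(p+q)/2, n(p-q)/2] = [3.2, 2.4]; the rest are 0
v2 = eigvecs[:, order[1]]                  # eigenvector of the 2nd largest |eigenvalue|
print(np.sign(v2))                         # constant sign within each community
```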

Spectral Clustering

Tip

What happens when C>2 (more general case than example above)?

  • The community assignments lie in the span of (i.e., are linear combinations of) the top $C$ eigenvectors (by eigenvalue magnitude)

Suppose we are given A and C. How do we estimate y?

spectral clustering algorithm

  1. Diagonalize $A=V\Lambda V^T$
  2. Order the eigenvectors by decreasing eigenvalue magnitude
    This yields $$V_{C} = \begin{bmatrix}v_{1} & v_{2} & \dots & v_{C}\end{bmatrix} = \begin{bmatrix}u_{1} \\ u_{2} \\ \vdots \\ u_{n}\end{bmatrix}, \quad v_{i} \in \mathbb{R}^n,\ u_{j} \in \mathbb{R}^C$$
    We can see the rows $u_{j}$ as embeddings of the nodes in $\mathbb{R}^C$ (community space)
  3. Now we can cluster the rows $u_{j}$ (k-means or whatever you want; Gaussian mixture models also work; see the sketch below)

(see spectral clustering)
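A minimal sketch of the algorithm above in numpy/scikit-learn, assuming the graph is undirected so $A$ is symmetric; the function name, random seed, and SBM parameters are illustrative choices, not from the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, C):
    """Cluster the nodes of a graph with adjacency matrix A into C communities."""
    eigvals, eigvecs = np.linalg.eigh(A)      # diagonalize A = V Lambda V^T (A symmetric)
    order = np.argsort(-np.abs(eigvals))      # sort by decreasing eigenvalue magnitude
    U = eigvecs[:, order[:C]]                 # V_C: rows u_j are node embeddings in R^C
    return KMeans(n_clusters=C, n_init=10).fit_predict(U)

# sample a balanced 2-community SBM and cluster it
rng = np.random.default_rng(0)
n, p, q = 200, 0.5, 0.05
y = np.repeat([0, 1], n // 2)                          # ground-truth communities
P = np.where(y[:, None] == y[None, :], p, q)           # P = Y B Y^T with B = [[p, q], [q, p]]
A = np.triu(rng.random((n, n)) < P, k=1)
A = (A | A.T).astype(float)                            # symmetric adjacency matrix, A ~ P
labels = spectral_clustering(A, C=2)                   # should agree with y up to a label swap
```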

This is unsupervised learning: it only assumes knowledge of $A$. When we know the labels of some subset of nodes $T \subseteq V$, these can be used to make better predictions and turn the task into a semi-supervised one. This is what we do more often in practice.

Explicitly, we can convert the problem to:

$$\min_{f} \sum_{i \in T} \mathbb{1}\left(f(A)_{i} \neq y_{i}\right)$$

where $f$ is some parametric function and $y_{i}$ is the one-hot community vector, with $y_{ic}=1$ if node $i$ is in community $c$. In practice, the indicator/0-1 loss is replaced by the cross-entropy loss.

Spectral Embedding

Suppose we are given a diagonalizable adjacency matrix $A$ for a graph with nodes partitioned into $C$ classes, and suppose we know the assignments for some subset of the nodes $T \subseteq V$. Let $V_{C}$ be the matrix of the top $C$ eigenvectors of $A$.

Finding the spectral embedding amounts to solving the problem $$\min_{f} \sum_{i \in T} \mathbb{1}\left(f(A)_{i} \neq y_{i}\right)$$ where $f$ comes from the hypothesis class $\{f : f(A)=\sigma(V_{C} W),\ W \in \mathbb{R}^{C\times C}\}$ and $\sigma$ is a predetermined pointwise nonlinearity.

Notes

This can be thought of as a fully connected neural network (FC-NN) on the $C$-dimensional embeddings $u_{1},\dots,u_{n}$ of the nodes of the graph into feature space.

  • in practice, a surrogate (e.g. cross-entropy) is usually used for the 0-1 loss
  • to find the optimal $f$, i.e. the optimal $W$, we solve using gradient descent methods

(see spectral embedding)
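A minimal numpy sketch of this semi-supervised setup, with $\sigma$ taken to be a row-wise softmax and the 0-1 loss replaced by cross-entropy (as noted above), fit by plain gradient descent on the labeled set $T$; the function name, learning rate, and step count are illustrative assumptions, not the lecture's code.

```python
import numpy as np

def fit_spectral_embedding(A, C, labeled_idx, labeled_y, lr=0.5, steps=500):
    """Fit f(A) = softmax(V_C W) on the labeled nodes; a sketch, not a reference implementation."""
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(eigvals))
    U = eigvecs[:, order[:C]]                          # node embeddings u_j in R^C
    W = np.zeros((C, C))                               # parameters of the hypothesis class
    Y = np.eye(C)[labeled_y]                           # one-hot targets on the labeled set T
    Ut = U[labeled_idx]
    for _ in range(steps):
        Z = Ut @ W
        probs = np.exp(Z - Z.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)      # sigma = row-wise softmax
        grad = Ut.T @ (probs - Y) / len(labeled_idx)   # gradient of mean cross-entropy w.r.t. W
        W -= lr * grad                                 # gradient descent step
    return (U @ W).argmax(axis=1)                      # predicted community for every node
```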

Spectral clustering is fully unsupervised; spectral embedding incorporates some prior knowledge, making it semi-supervised.

information theoretic threshold

Question

What happens when $p \to q$ in the balanced SBM with $C=2$?

Answer

It becomes impossible to cluster! When $p=q$, we have an Erdős–Rényi graph, in which the edge probability is the same for all node pairs. There are no communities to distinguish.

But even when $p \neq q$, there is a region around $p=q$ where detection of communities is impossible in an information-theoretic sense.

We want to recover the exact partition of nodes into communities as closely as possible. Without perfect information, this amounts to achieving almost exact recovery:

Almost exact recovery

In a stochastic block model problem, almost exact recovery of the assignments of nodes to communities is achieved when

$$P\left\{\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\left(f(A)_{i}=y_{i}\right)=1-o(1)\right\}=1-o(1)$$

i.e. $P\left(\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\left[f(A)_{i}=y_{i}\right]=1\right)=1$, i.e. the fraction of correctly labeled nodes converges almost surely to 1.

(see almost exact recovery)
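Since the community labels themselves are only defined up to renaming, this fraction is typically computed after maximizing over relabelings of the communities. A tiny helper one might use for that (the function name is made up for illustration):

```python
import numpy as np
from itertools import permutations

def recovered_fraction(pred, y, C):
    """Largest fraction of correctly labeled nodes over all relabelings of the C communities."""
    return max(np.mean(np.array(perm)[pred] == y) for perm in permutations(range(C)))
```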

How do we know if almost exact recovery is possible? From an information theoretic point of view, we can determine this using the signal to noise ratio.

signal-to-noise ratio (SNR)

The signal-to-noise ratio is the ratio of the power of the signal (the thing we want to measure) to the power of the extraneous noise. This is given as:

$$\text{SNR}=\frac{P_{\text{signal}}}{P_{\text{noise}}}$$

where $P_{(\cdot)}$ is the average power, or second moment. When the signal and noise are random variables, we have

$$\text{SNR}=\frac{\mathbb{E}[X_{\text{signal}}^2]}{\mathbb{E}[X_{\text{noise}}^2]}$$

(see signal to noise ratio)

Example

In our example with $C=2$, with $B_{c_1 c_2}=\begin{cases}p & \text{if } c_{1}=c_{2}\\ q & \text{otherwise}\end{cases}$, then

$$\text{SNR}=\frac{(p-q)^{2}}{2(p+q)}$$
  • the numerator is the signal: the discrepancy between intra- and inter-community edge probabilities (and hence between within- and across-community degrees)
  • the denominator is the noise: proportional to the average degree over all nodes in the graph
    ^f75880
Theorem

Suppose we have a stochastic block model problem where the adjacency matrix $A \sim P$, $P=YBY^T$, and $B_{c_1 c_2}=\begin{cases}p & \text{if } c_{1}=c_{2}\\ q & \text{otherwise.}\end{cases}$

If the signal-to-noise ratio $\text{SNR}<\frac{1}{n}$, then almost exact recovery is impossible. Even with infinite time and resources, there is no algorithm that can recover the true communities with just $A$.

(see almost exact recovery is impossible when the signal to noise ratio is less than the threshold)

Proof

See the proofs in Massoulié (2014) and Mossel (2014). Aside: these are interesting to compare, since they use different proof methods from different domains. See also Abbe's survey papers.

Example: Sparse Graphs

Let $p=\frac{a}{n}$, $q=\frac{b}{n}$, and $\frac{|E|}{n^{2}} \to 0$ as $n \to \infty$ (the edge density vanishes, i.e. the graph is sparse).
Then

$$\text{SNR}=\frac{\left(\frac{a}{n}-\frac{b}{n}\right)^{2}}{2\left(\frac{a+b}{n}\right)}=\frac{1}{2n}\frac{(a-b)^{2}}{(a+b)}$$

$$\text{SNR}_{p,q}<\frac{1}{n} \iff \frac{1}{2n}\frac{(a-b)^{2}}{a+b}<\frac{1}{n} \iff \frac{(a-b)^{2}}{2(a+b)}<1$$

i.e., in sparse graphs, it is not difficult to identify the information-theoretic threshold (we can easily calculate the signal-to-noise ratio).
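For instance, a tiny sketch that plugs hypothetical values of $a$ and $b$ into the condition above (these values are made up for illustration):

```python
def below_threshold(a, b):
    """True when (a - b)^2 / (2 (a + b)) < 1, i.e. almost exact recovery is impossible."""
    return (a - b) ** 2 / (2 * (a + b)) < 1

print(below_threshold(a=3, b=2))   # True:  1 / 10 = 0.1 < 1, recovery is impossible
print(below_threshold(a=8, b=1))   # False: 49 / 18 ~ 2.72, not ruled out by the theorem
```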

Next time, we will look at another SBM with more information (in the form of node features), from which we can get better thresholds.

Note

From 2025-02-05 graphs lecture 5: there was a "typo" in the distributions for our SBM example.