2025-02-12 graphs lecture 7

[[lecture-data]]

2025-02-12

Summary

2025-02-10 graphs lecture 6

This Time

  • information theoretic lower threshold
  • community detection

1. Graph Signals and Graph Signal Processing

information theoretic threshold

Recall from last time that in the balanced SBM with $C = 2$, even when $p \neq q$, there is a region around $p = q$ where detection of the communities is impossible from an information-theoretic perspective, as measured by the signal-to-noise ratio.

Example

In our example with $C = 2$ and $B_{c_1 c_2} = \begin{cases} p & \text{if } c_1 = c_2 \\ q & \text{otherwise,} \end{cases}$ we have
$$\mathrm{SNR} = \frac{(p - q)^2}{2(p + q)}$$

  • numerator is the signal: the degree discrepancy between the two communities
  • denominator is noise: the average degree across all nodes in the graph
Theorem

Suppose we have a stochastic block model problem where the adjacency matrix $A \sim P$, with $P = Y B Y^T$ and $B_{c_1 c_2} = \begin{cases} p & \text{if } c_1 = c_2 \\ q & \text{otherwise.} \end{cases}$

If the signal-to-noise ratio satisfies $\mathrm{SNR} < \frac{1}{n}$, then almost exact recovery is impossible.

Even with infinite time and resources, there is no algorithm that can recover the true communities given only the probabilities of adjacency. This gives us the information-theoretic threshold.

Proof

See the proofs in Massoulié (2014) and Mossel et al. (2014). Aside: these are interesting to compare, since they use different proof methods from different domains. See also Abbe's survey papers.

Example: Sparse Graphs

Let $p = \frac{a}{n}$, $q = \frac{b}{n}$, so that $\frac{|E|}{n^2} \to 0$ as $n \to \infty$ (the edge density vanishes, i.e., the graph is sparse).
Then
$$\mathrm{SNR} = \frac{\left(\frac{a}{n} - \frac{b}{n}\right)^2}{2\,\frac{a + b}{n}} = \frac{1}{2n}\,\frac{(a - b)^2}{a + b}$$
so
$$\mathrm{SNR}_{p,q} < \frac{1}{n} \iff \frac{1}{2n}\,\frac{(a - b)^2}{a + b} < \frac{1}{n} \iff \frac{(a - b)^2}{2(a + b)} < 1$$
i.e., in sparse graphs it is not difficult to identify the information theoretic threshold, since we can easily compute the signal-to-noise ratio.
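As an illustrative check (numbers not from the lecture): with $a = 5$, $b = 1$ we get $\frac{(a-b)^2}{2(a+b)} = \frac{16}{12} \approx 1.33 > 1$, so detection is information-theoretically possible, while $a = 3$, $b = 1$ gives $\frac{4}{8} = 0.5 < 1$, below the threshold.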

Takeaway

Community detection is difficult when the graph is sparse. Traditional methods may fail even before we hit this threshold!

contextual stochastic block model (C-SBM)

contextual stochastic block model (C-SBM)

The (binary) contextual stochastic block model (C-SBM) is like an SBM, but includes node features that are drawn from a Gaussian distribution. Here,

$$P \in \mathbb{R}^{(n/2 + n/2) \times (n/2 + n/2)} = \begin{bmatrix} p & p & q & q \\ p & p & q & q \\ q & q & p & p \\ q & q & p & p \end{bmatrix}$$
  • $y \in \{-1, 1\}^n$ (or $\{0, 1\}^n$) and $B \in \mathbb{R}^{2 \times 2}$
    And the node features $X$ are drawn as
  • $x_i = \sqrt{\frac{\mu}{n}}\, y_i u + \frac{z_i}{\sqrt{d}}$, where $u \sim \mathrm{Normal}\!\left(0, \frac{I_d}{d}\right)$ and $z_i \sim \mathrm{Normal}(0, I_d)$

So

$$X_i \mid Y_i, u \;\sim\; \mathrm{Normal}\!\left(\pm\sqrt{\tfrac{\mu}{n}}\, u,\; \tfrac{I_d}{d}\right)$$

(see contextual stochastic block model)
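To make the model concrete, here is a minimal NumPy sketch that samples a binary C-SBM (the function name and defaults are assumptions for illustration, not from the lecture):

```python
import numpy as np

def sample_csbm(n, a, b, mu, d, rng=None):
    """Sample a binary C-SBM: sparse SBM adjacency plus Gaussian node features.
    Hypothetical helper; parameter names follow the notes (p = a/n, q = b/n)."""
    rng = np.random.default_rng(rng)
    y = np.repeat([1, -1], n // 2)                 # balanced community labels
    p, q = a / n, b / n                            # within / between edge probabilities
    P = np.where(np.equal.outer(y, y), p, q)       # expected adjacency, P = Y B Y^T
    A = (rng.random((n, n)) < P).astype(float)
    A = np.triu(A, 1); A = A + A.T                 # symmetric, no self-loops
    u = rng.normal(0.0, 1.0 / np.sqrt(d), size=d)  # u ~ Normal(0, I_d / d)
    Z = rng.normal(0.0, 1.0, size=(n, d))          # z_i ~ Normal(0, I_d)
    X = np.sqrt(mu / n) * y[:, None] * u + Z / np.sqrt(d)
    return A, X, y

A, X, y = sample_csbm(n=1000, a=5, b=1, mu=2.0, d=100)
```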

Feature-aware spectral embeddings

Recall spectral clustering, which is fully unsupervised. For spectral embedding, we make use of some of the known community assignments.

feature-aware spectral embeddings

Feature-aware spectral embeddings incorporate the availability of node features into our predictions (for example, in a C-SBM) when using spectral embedding.

Let $G$ be a graph with diagonalizable adjacency matrix $A$ and node features $X \in \mathbb{R}^{n \times d}$. Suppose we have $C$ communities that we want to assign to the nodes.

  1. Diagonalize $X X^T \in \mathbb{R}^{n \times n}$ as $\tilde{V} \tilde{\Lambda} \tilde{V}^T$
  2. Pick the top $\kappa$ eigenvectors to create $\tilde{V}^{(\kappa)} \in \mathbb{R}^{n \times \kappa}$
  3. Define $V_{C+\kappa} = [\, V_C \mid \tilde{V}^{(\kappa)} \,]$, where $V_C$ contains the top $C$ eigenvectors of $A$.

(see feature-aware spectral embeddings)
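A minimal NumPy sketch of this construction (the function name is an assumption; `eigh` returns eigenvalues in ascending order, so the top eigenvectors are the last columns):

```python
import numpy as np

def feature_aware_embedding(A, X, C, kappa):
    """Concatenate the top-C eigenvectors of A with the top-kappa eigenvectors of X X^T."""
    _, V = np.linalg.eigh(A)             # assumes A is symmetric (undirected graph)
    V_C = V[:, -C:]                      # top C eigenvectors of A
    _, V_tilde = np.linalg.eigh(X @ X.T)
    V_kappa = V_tilde[:, -kappa:]        # top kappa eigenvectors of X X^T
    return np.hstack([V_C, V_kappa])     # V_{C+kappa} = [V_C | V_tilde^(kappa)]
```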

Before, we had

Spectral Embedding Problem

$$\min_{f} \sum_{i \in T} \mathbb{1}\!\left(f(A)_i \neq y_i\right), \qquad f \in \left\{\, f(A) = \sigma(V_C W),\; W \in \mathbb{R}^{C \times C} \,\right\}$$

Now, our hypothesis class is instead:

Feature-Aware Spectral Embedding Hypothesis Class
$$f(A, X) = \sigma(V_{C+\kappa} W), \qquad W \in \mathbb{R}^{(C + \kappa) \times (C + \kappa)}$$
Note

In the presence of node features, the information theoretic threshold for community detection becomes (with $p = \frac{a}{n}$, $q = \frac{b}{n}$)

$$\mathrm{SNR}(A) + \mathrm{SNR}(X) = \frac{(a - b)^2}{2(a + b)} + \frac{\mu^2 d}{n} > 1$$
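In particular, reading off this formula: if $a = b$ the graph contributes no signal, and detection must rely entirely on the features, which requires $\frac{\mu^2 d}{n} > 1$.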
Takeaway

Community detection is possible in an information-theoretic sense even when $p \approx q$, as long as the means of the communities are sufficiently separated (high $\mu$ and/or high $d$).

Computational thresholds for spectral clustering (non-rigorous)

Maybe there exists an algorithm that can find these communities, but can we find them in polynomial time? Can we do even better?

Example

2025-02-10-graph.png
Spectral redemption, Krzakala et al. (2013). Computational threshold using the adjacency matrix.

Takeaway

Even with spectral embedding or feature-aware spectral embeddings, since we have to fix $C$ (which is often unknown) and $\kappa$ (the number of feature eigenvectors we use), our spectral algorithms might fail.

In practice, spectral algorithms can fail even above the information theoretic threshold (sometimes, significantly above that threshold!).

(see sometimes spectral algorithms fail)

Question

Can GNNs help?

Answer

Let's go back to Lecture 3. We saw that graph convolutions are expressive enough to recover regression solutions in many cases.

Recall the conditions for finding a convolutional graph filter:

Theorem

Let $S$ be a shift operator with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, and let $x$ and $y$ be graph signals. Let $\hat{x}$ be the GFT of $x$.

Suppose

  • $\hat{x}_i \neq 0 \;\; \forall i$
  • $\lambda_i \neq \lambda_j \;\; \forall i \neq j$

Then there exist $K \leq n$ coefficients $h_0, \dots, h_{K-1}$ such that $y = H(S)x$, where $H$ is a graph convolution.
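A small NumPy sketch of what the theorem gives us, assuming a symmetric shift operator (all names here are assumptions, not from the lecture): in the GFT domain the filter must satisfy $h(\lambda_i) = \hat{y}_i / \hat{x}_i$, which is a Vandermonde system in the coefficients $h_k$, solvable exactly when the two conditions above hold.

```python
import numpy as np

def fit_graph_filter(S, x, y):
    """Recover coefficients h with y = H(S) x, H(S) = sum_k h_k S^k.
    Assumes the theorem's conditions: hat{x}_i != 0 and distinct eigenvalues."""
    lam, V = np.linalg.eigh(S)                      # S = V diag(lam) V^T (S symmetric)
    x_hat, y_hat = V.T @ x, V.T @ y                 # graph Fourier transforms
    g = y_hat / x_hat                               # required frequency response h(lam_i)
    Vand = np.vander(lam, N=len(lam), increasing=True)
    return np.linalg.solve(Vand, g)                 # solve sum_k h_k lam_i^k = g_i

# usage sketch: rebuild the filter and check it reproduces y
# h = fit_graph_filter(S, x, y)
# H = sum(h[k] * np.linalg.matrix_power(S, k) for k in range(len(h)))
# np.allclose(H @ x, y)  -> True when the conditions hold
```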

In the example above, having access to more than $C$ eigenvectors might have helped. But not only do we need to fix this number when it is unknown, we also have to pick it large enough that we don't miss informative eigenvalues "lost in the bulk", as in the paper.

This motivates using GNNs for semi-supervised community detection on graphs

Takeaway

We can use GNNs to solve the semi-supervised community detection node-level task on graphs.

Let $G$ be a graph with diagonalizable adjacency matrix $A$ and node features $X \in \mathbb{R}^{n \times d}$. Suppose we have $C$ communities that we want to assign to the nodes.

As long as X satisfies the conditions for finding a convolutional graph filter, namely that

  • $\hat{x}_i \neq 0 \;\; \forall i$, and
  • $\lambda_i \neq \lambda_j$ for all $i \neq j$,

then there exists a graph convolution (or a 1-layer linear GNN) that approximates y. The optimization problem then is the same as before:

Community Detection Optimization Problem

$$\min_{f} \sum_{i} J\!\left(f(A, X)_i,\, y_i\right)$$

where $J(\cdot, \cdot)$ is a surrogate of the 0-1 loss, each $y_i$ is the one-hot community vector of node $i$, and $f$ is some parametric function of $A$ and $X$.

And we can choose our hypothesis class so that $f(A, X) = \Phi_{H}(A, X)$, a GNN with learnable parameters $H$ (this is the form of the function we defined in lecture 5).

(see we can use GNNs to solve feature-aware semi-supervised learning problems)
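A minimal PyTorch sketch of this setup, i.e., a linear graph-convolutional model trained with a cross-entropy surrogate on the labeled nodes only (architecture and hyperparameters are assumptions for illustration, not the lecture's implementation):

```python
import torch
import torch.nn as nn

class GraphFilterGNN(nn.Module):
    """One-layer linear GNN: logits = sum_k S^k X H_k."""
    def __init__(self, d_in, n_classes, K=3):
        super().__init__()
        self.weights = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(d_in, n_classes)) for _ in range(K)]
        )

    def forward(self, S, X):
        out, Z = 0.0, X
        for H_k in self.weights:   # accumulate sum_k S^k X H_k
            out = out + Z @ H_k
            Z = S @ Z
        return out                 # logits; cross-entropy serves as the surrogate loss

# usage sketch: train only on the labeled nodes (y given as class indices)
# model = GraphFilterGNN(d_in=X.shape[1], n_classes=C)
# loss = nn.functional.cross_entropy(model(S, X)[train_idx], y[train_idx])
```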

Housekeeping

  • Homework due 2 weeks from today (first week of)

A note on sparse matrix-vector multiplications

$$Sx, \qquad S^{k-1}x = S^{k-2}(Sx)$$
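As a sketch (assumed helper name), the recursion just means we apply $S$ repeatedly to a vector and never form the dense power $S^k$:

```python
def apply_power(S, x, k):
    """Compute S^k x via k successive matrix-vector products, never materializing S^k."""
    z = x
    for _ in range(k):
        z = S @ z      # each product costs O(|E|) when S is stored sparsely
    return z
```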

Example

2025-02-10-graph-1.png
Illustration of the $n^2$ operations required by a dense matrix-vector product.

Typically the graph shift operator $S$ (i.e., the adjacency matrix $A$, the graph Laplacian $L$, etc.) is a sparse $n \times n$ matrix, i.e., $\frac{|E|}{n^2} \ll 1$.

How sparse matmuls work

Suppose we have a sparse adjacency matrix

$$A = \begin{bmatrix} 12 & 0 & 26 & 0 \\ 0 & 0 & 19 & 14 \\ 26 & 19 & 0 & 0 \\ 0 & 14 & 0 & 7 \end{bmatrix}$$

There are two main options to store this more efficiently.

Coordinate (COO) Representation

$A$ can become a set of $|E|$ tuples of the form (row index, column index, value). So the matrix becomes the ordered set of these tuples, sorted by row and then by column index.

This is better from a memory perspective, but it still stores the row index once per non-zero entry, which is redundant when a few nodes have high degree. A nice solution to this is to use compressed sparse row representation instead.
(see coordinate representation)

Example

$$A = \begin{bmatrix} 12 & 0 & 26 & 0 \\ 0 & 0 & 19 & 14 \\ 26 & 19 & 0 & 0 \\ 0 & 14 & 0 & 7 \end{bmatrix}$$

can become
$$\{(0,0,12),\ (0,2,26),\ (1,2,19),\ (1,3,14),\ (2,0,26),\ (2,1,19),\ (3,1,14),\ (3,3,7)\}$$
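A plain-Python sketch (names assumed) of how a COO matrix-vector product iterates over the stored tuples, one multiply-add per non-zero entry:

```python
# COO storage of the example matrix: one (row, col, value) tuple per non-zero entry
coo = [(0, 0, 12), (0, 2, 26), (1, 2, 19), (1, 3, 14),
       (2, 0, 26), (2, 1, 19), (3, 1, 14), (3, 3, 7)]

def coo_matvec(coo, x, n):
    y = [0.0] * n
    for i, j, v in coo:        # one multiply-add per stored entry: O(|E|) total
        y[i] += v * x[j]
    return y
```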

Another method to compress a sparse matrix's information is compressed sparse row (CSR) representation, which is the most popular.

Compressed Sparse Row (CSR) representation

Suppose $A \in \mathbb{R}^{n \times m}$ is a sparse matrix. We can represent it as a set of 3 tensors/"lists" called the compressed sparse row (CSR) representation. We get this representation by realizing that we can encode the row index from the COO representation a bit more efficiently.

Instead of writing out the row index for each element, we can collect the column indices for each row, put each of these collections together, and then have a pointer tell us where to start reading.

If A has z non-zero entries, then

  • the column tensor contains $z$ entries. Each entry contains the column index of one of the non-zero elements, sorted by ascending row index.
  • the row (or pointer) tensor contains $n + 1$ entries
    • the first $n$ entries give, for each row, the index in the column tensor where that row's elements start
    • the last entry is $z$
  • the value tensor contains $z$ entries: the non-zero elements sorted by row and then column index.

(see compressed sparse row representation)

Example

$$A = \begin{bmatrix} 12 & 0 & 26 & 0 \\ 0 & 0 & 19 & 14 \\ 26 & 19 & 0 & 0 \\ 0 & 14 & 0 & 7 \end{bmatrix}$$

Pointer = [0 2 4 6 8]

index:      0 1 2 3 4
read start: 0 2 4 6 8

Column = [0 2 2 3 0 1 1 3]

index:     0 1 2 3 4 5 6 7
col index: 0 2 2 3 0 1 1 3

Value = [12 26 19 14 26 19 14 7]

index: 0  1  2  3  4  5  6  7
value: 12 26 19 14 26 19 14 7

Matrix multiplication can then become much more efficient computationally. Using sparse matrix multiplication we get

$$\text{total operations} = \sum_{i=1}^{n} |N_i| = \sum_{i=1}^{n} d_i = \mathbf{1}^T A \mathbf{1} = |E|$$
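A plain-Python sketch of the CSR matrix-vector product (names assumed), which makes the $\sum_i |N_i| = |E|$ operation count explicit:

```python
def csr_matvec(pointer, column, value, x):
    n = len(pointer) - 1
    y = [0.0] * n
    for i in range(n):
        # entries of row i live in positions pointer[i] .. pointer[i+1]-1
        for k in range(pointer[i], pointer[i + 1]):
            y[i] += value[k] * x[column[k]]   # one multiply-add per non-zero entry
    return y

# usage with the example above
y = csr_matvec([0, 2, 4, 6, 8], [0, 2, 2, 3, 0, 1, 1, 3],
               [12, 26, 19, 14, 26, 19, 14, 7], [1.0, 1.0, 1.0, 1.0])
```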

In PyTorch, to transform a dense matrix to a sparse tensor:
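(The note leaves the snippet blank; the following is a minimal sketch using PyTorch's standard sparse conversions.)

```python
import torch

S_dense = torch.tensor([[12., 0., 26., 0.],
                        [0., 0., 19., 14.],
                        [26., 19., 0., 0.],
                        [0., 14., 0., 7.]])

S_coo = S_dense.to_sparse()        # COO: stores indices and values
S_csr = S_dense.to_sparse_csr()    # CSR: stores crow_indices, col_indices, values

x = torch.ones(4)
y = torch.mv(S_csr, x)             # sparse matrix-vector product, as described below
```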

Matrix-vector multiplication is then as before: S @ x or torch.mv(S, x) works for both representations, and this gives a faster multiplication.