2025-02-10 graphs lecture 6
2025-02-10
1. Graph Signals and Graph Signal Processing
Community graph example
Today
- community detection (HW 1 - Wednesday)
- recap SBM (last time)
- spectral clustering algorithm
- spectral embeddings
- contextual SBM
In an SBM, $\mathbb{E}[A] = Z B Z^{\top}$, where
- $Z \in \{0, 1\}^{n \times c}$ is the community assignment matrix (if $Z_{ik} = 1$ then node $i$ belongs to community $k$)
- $B \in [0, 1]^{c \times c}$ is the matrix of intra- and inter-community probabilities: $B_{k\ell}$ is the edge probability for a node pair $(i, j)$ such that $i$ is in community $k$ and $j$ is in community $\ell$
- $A \in \{0, 1\}^{n \times n}$ is the adjacency matrix
While $\mathbb{E}[A]$ is deterministic, the observed graph is random, with entries sampled independently: $A_{ij} \sim \mathrm{Bernoulli}\big((Z B Z^{\top})_{ij}\big)$.
(see stochastic block model)
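As a quick illustration, here is a minimal numpy sketch of sampling from a balanced SBM; the helper name `sample_sbm` and the parameter values are mine, not from lecture.

```python
import numpy as np

def sample_sbm(n, c, B, seed=0):
    """Sample an adjacency matrix from a balanced SBM (n assumed divisible by c)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(c), n // c)   # community of each node
    Z = np.eye(c)[labels]                      # assignment matrix: Z[i, k] = 1 iff node i in community k
    P = Z @ B @ Z.T                            # expected adjacency matrix E[A]
    # sample the strict upper triangle and symmetrize (undirected, no self-loops)
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels

# two balanced communities: p = 0.8 within, q = 0.2 across
B = np.array([[0.8, 0.2], [0.2, 0.8]])
A, labels = sample_sbm(n=100, c=2, B=B)
```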
We say an SBM is balanced when all communities have the same size ($n/c$ nodes each).
(see balanced stochastic block model)
Recall that for two balanced communities, $B = \begin{bmatrix} p & q \\ q & p \end{bmatrix}$, with intra-community probability $p$ and inter-community probability $q$.
Thus, $\mathbb{E}[A] = Z B Z^{\top}$ is a block matrix with $p$-blocks on the diagonal and $q$-blocks off it. Let's compute the eigenvectors and eigenvalues of $\mathbb{E}[A]$.
- First, we observe that this matrix is rank 2 and will thus have 2 eigenvectors with nonzero eigenvalues
- It is easy to see that $v_{1} = \mathbf{1}_{n}$ and $\mathbb{E}[A]\,\mathbf{1}_{n} = \frac{n(p+q)}{2}\,\mathbf{1}_{n}$. Thus $\lambda_{1} = \frac{n(p+q)}{2}$
- Further, we note that $v_{2} = \begin{bmatrix} \mathbf{1}_{n/2} \\ -\mathbf{1}_{n/2} \end{bmatrix}$, where the sign of each entry matches the community of the corresponding node
- Thus $\mathbb{E}[A]\,v_{2} = \frac{n(p-q)}{2}\,v_{2}$, with eigenvalue $\lambda_{2} = \frac{n(p-q)}{2}$

Note: This means that we can use the eigenvector associated with the second largest eigenvalue (in absolute value) of $\mathbb{E}[A]$ to reveal the community assignments of the nodes.
This is the intuition behind spectral clustering. For an arbitrary graph drawn from an SBM, the adjacency matrix $A$ concentrates around $\mathbb{E}[A]$, so its top eigenvectors are close to those of $\mathbb{E}[A]$. Thus, we can use our sample mean adjacency matrix to estimate the community assignments of our nodes!
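A small numerical check of the derivation above (a sketch with assumed values $n = 6$, $p = 0.8$, $q = 0.2$):

```python
import numpy as np

n, p, q = 6, 0.8, 0.2
Z = np.kron(np.eye(2), np.ones((n // 2, 1)))   # assignment matrix for two balanced communities
EA = Z @ np.array([[p, q], [q, p]]) @ Z.T      # E[A]

eigvals, eigvecs = np.linalg.eigh(EA)
order = np.argsort(-np.abs(eigvals))           # sort by decreasing |eigenvalue|
print(eigvals[order[:2]])                      # [3.0, 1.8] = n(p+q)/2, n(p-q)/2
print(np.sign(eigvecs[:, order[1]]))           # signs split nodes into the two communities
```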
Spectral Clustering
What happens when $c > 2$?
- The community assignment is a linear combination of the top $c$ eigenvectors (in absolute value)
Suppose we are given a symmetric adjacency matrix $A \in \mathbb{R}^{n \times n}$ and the number of communities $c$:
- Diagonalize $A = V \Lambda V^{\top}$
- Order the eigenvectors by decreasing eigenvalue magnitude, $|\lambda_{1}| \geq |\lambda_{2}| \geq \dots \geq |\lambda_{n}|$

This yields $$V_{c} = \begin{bmatrix}v_{1} & v_{2} & \dots & v_{c}\end{bmatrix} = \begin{bmatrix}u_{1} \\ u_{2} \\ \vdots \\ u_{n}\end{bmatrix}, \quad v_{i} \in \mathbb{R}^{n},\ u_{j} \in \mathbb{R}^{c}$$
We can see the rows $u_{j}$ as embeddings of the nodes in $\mathbb{R}^{c}$ (community space) - yay, we can cluster them ($k$-means or whatever you want; can also use Gaussian mixture models).
(see spectral clustering)
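Putting the recipe together, a minimal sketch of the pipeline (the helper name `spectral_cluster` is mine; it assumes a symmetric $A$ and uses scikit-learn's $k$-means):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(A, c, seed=0):
    """Cluster the nodes of a symmetric adjacency matrix A into c communities."""
    eigvals, eigvecs = np.linalg.eigh(A)        # diagonalize A = V Lambda V^T
    top = np.argsort(-np.abs(eigvals))[:c]      # top-c eigenvectors by |eigenvalue|
    V_c = eigvecs[:, top]                       # rows u_j are node embeddings in R^c
    return KMeans(n_clusters=c, n_init=10, random_state=seed).fit_predict(V_c)

labels_hat = spectral_cluster(A, c=2)           # A from the sampling sketch above
```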
This is unsupervised learning - it only assumes knowledge of $A$ and the number of communities $c$.
Explicitly, we can convert the problem to: $$\min_{\mu_{1}, \dots, \mu_{c},\, z} \sum_{j=1}^{n} \lVert u_{j} - \mu_{z_{j}} \rVert^{2}$$ where $\mu_{k} \in \mathbb{R}^{c}$ are the cluster centroids and $z_{j}$ is the community assigned to node $j$ (the $k$-means objective on the embeddings).
Suppose we are given a diagonalizable adjacency matrix $A$ and labels $y_{i}$ for the nodes in a training set $T$.
Finding the spectral embedding amounts to solving the problem $$\min_{f} \sum_{i \in T} \mathbb{1}\big(f(A)_{i} \neq y_{i}\big)$$ where $f$ maps the graph to a predicted label for each node.
- This can be thought of as an FC-NN on the spectral embeddings $u_{j}$
- in practice, a surrogate is usually used for the 0-1 loss (e.g., cross-entropy)
- to find the optimal $f$, ie the optimal parameters $\theta$, we solve using gradient descent methods
(see spectral embedding)
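A hedged sketch of this semi-supervised setup, assuming PyTorch; the training subset, network size, and variable names are arbitrary choices of mine, and `A`/`labels` come from the sampling sketch above:

```python
import numpy as np
import torch
import torch.nn as nn

# recompute the spectral embedding (rows u_j) from A, as in spectral_cluster above
eigvals, eigvecs = np.linalg.eigh(A)
V_c = eigvecs[:, np.argsort(-np.abs(eigvals))[:2]]

X = torch.tensor(V_c, dtype=torch.float32)       # node embeddings
y = torch.tensor(labels)                         # ground-truth labels, known only on T
T = torch.arange(0, len(y), 2)                   # hypothetical labeled subset (every other node)

# f: a small fully connected network acting on the embeddings
f = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(f.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()                  # surrogate for the 0-1 loss

for _ in range(200):                             # gradient descent on the surrogate
    opt.zero_grad()
    loss = loss_fn(f(X)[T], y[T])                # loss only on the labeled nodes
    loss.backward()
    opt.step()

pred = f(X).argmax(dim=1)                        # predicted communities for all nodes
```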
Spectral clustering is fully unsupervised; spectral embeddings incorporate some prior knowledge (labels on a training set), making them semi-supervised.
Information Theoretic Threshold
What happens when $p = q$?
It becomes impossible to cluster! When $p = q$, every pair of nodes connects with the same probability, so the graph carries no community information.
But even when $p \neq q$, recovery can fail if the difference is too small relative to the noise.
We want to recover as closely as possible the exact partition of nodes into communities. Without perfect information, this amounts to finding an almost exact recovery:
In a stochastic block model problem, almost exact recovery of the assignments of nodes to communities is achieved when the fraction of misclassified nodes vanishes as $n \to \infty$,
ie $$\frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(\hat{z}_{i} \neq z_{i}) \to 0$$ (up to a permutation of the community labels)
(see almost exact recovery)
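Since community IDs are only identifiable up to relabeling, measuring the misclassified fraction means minimizing over permutations of the labels; a minimal sketch (the helper name is mine):

```python
import numpy as np
from itertools import permutations

def misclassification_rate(labels_hat, labels, c=2):
    """Fraction of misclassified nodes, minimized over relabelings of the communities."""
    best = 1.0
    for perm in permutations(range(c)):
        remapped = np.asarray(perm)[labels_hat]  # relabel predicted communities
        best = min(best, float(np.mean(remapped != labels)))
    return best

# almost exact recovery: this rate -> 0 as n -> infinity
```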
How do we know if almost exact recovery is possible? From an information theoretic point of view, we can determine this using the signal to noise ratio.
The signal to noise ratio is the ratio of the power of the signal (the thing we want to measure) to the power of the extraneous noise. This is given as: $$\mathrm{SNR} = \frac{P_{\text{signal}}}{P_{\text{noise}}}$$
where $P_{\text{signal}}$ and $P_{\text{noise}}$ denote the powers of the signal and of the noise.
(see signal to noise ratio)
In our example with $p = a/n$ and $q = b/n$, $$\mathrm{SNR} = \frac{(a - b)^{2}}{2(a + b)}$$
- numerator is the signal: the (squared) degree discrepancy between the intra- and inter-community subgraphs
- denominator is noise: the average degree for all nodes in the graph
^f75880
Suppose we have a stochastic block model problem where the adjacency matrix $A$ is drawn from a balanced two-community SBM with $p = a/n$ and $q = b/n$.
If the signal to noise ratio $\mathrm{SNR} = \frac{(a-b)^{2}}{2(a+b)} < 1$, then almost exact recovery is impossible.
(see almost exact recovery is impossible when the signal to noise ratio is less than the threshold)
see proofs in Massoulié (2014) and Mossel (2014). Aside: interesting to see, since the two use different proof methods from different domains. See also Abbe's survey papers.
Then $\mathrm{SNR} = \frac{(a-b)^{2}}{2(a+b)}$ depends only on the constants $a$ and $b$;
ie, in sparse graphs, it is not difficult to identify the information theoretic threshold (we can easily calculate the signal to noise ratio).
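For instance, the threshold check is one line (the values of $a$ and $b$ below are chosen arbitrarily for illustration):

```python
def snr(a, b):
    """SNR for a balanced two-community SBM with p = a/n, q = b/n."""
    return (a - b) ** 2 / (2 * (a + b))

print(snr(5, 3))    # 4 / 16 = 0.25 < 1  -> almost exact recovery impossible
print(snr(10, 2))   # 64 / 24 ~ 2.67 > 1 -> above the threshold
```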
Next time, we will look at another SBM with more information (in the form of node features); then we can get better thresholds.
From 2025-02-05 graphs lecture 5: there was a "typo" in the distributions for our SBM example.