Manifold Learning Lecture 11

[[lecture-data]]

> [!info]
> ```dataviewjs
> let current = dv.current();
> let previous = current.previous_lecture;
>
> // get notes with this as the previous
> let pages = dv.pages("")
> 	.where(p => p.previous_lecture)
> 	.where(p => String(p.previous_lecture)
> 		.includes(dv.current().file.link));
>
> let output = `:LiArrowBigLeftDash: ${previous || "No Previous Lecture"} | `;
>
> if (pages.length > 0) {
> 	output += pages
> 		.map(p => p.file.link).join(", ");
> } else {
> 	output += 'No Next Lecture';
> }
>
> output += ' :LiArrowBigRightDash:';
>
> dv.el('div', output);
> ```
 

# Class Notes

## Summary

5. Continuous Multidimensional Scaling

## Convergence Analyses - Motivation from RDPGs

We’d like to use continuity to argue that the expected risk converges to something useful.

  1. Show that estimated latent positions converge to the true latent positions
  2. Show that the dissimilarity functions converge to Riemannian distance
    • We have uniform convergence when , but we want to generalize beyond this!
    • We might even want stronger convergence!
  3. Show that the Euclidean representations of the dissimilarity functions converge to a meaningful Euclidean representation of the Riemannian distance
  • [?] We know how to embed points in $\mathbb{R}^d$. (How) do these embeddings converge?
  • [I] Continuous Multidimensional Scaling defines the best possible representation of Riemannian distance and demonstrates convergence!

## Motivation for Asymptotic Behavior

**MDS**: A collection of techniques for constructing Euclidean representations of proximity data (information about the pairwise (dis)similarities of a collection of objects)

Example

Minimizing Kruskal’s raw stress criterion, $\sigma(z_{1},\dots,z_{n})=\sum_{i<j}\left( \delta_{ij}-\lvert \lvert z_{i}-z_{j} \rvert \rvert \right)^2$, approximates the dissimilarities $\delta_{ij}$ with Euclidean distances
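Below is a minimal numerical sketch of this criterion (not from the lecture): it minimizes the raw stress of a small, made-up dissimilarity matrix over configurations in $\mathbb{R}^2$ using `scipy`.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

# Made-up symmetric dissimilarities for 4 objects (illustrative values only)
delta = np.array([
    [0.0, 1.0, 2.0, 2.5],
    [1.0, 0.0, 1.2, 2.0],
    [2.0, 1.2, 0.0, 1.1],
    [2.5, 2.0, 1.1, 0.0],
])
n, d = delta.shape[0], 2  # embed the 4 objects in R^2

def raw_stress(z_flat):
    """Kruskal's raw stress: squared error between the dissimilarities
    and the Euclidean distances of the configuration z."""
    z = z_flat.reshape(n, d)
    dist = squareform(pdist(z))
    return np.sum((delta - dist) ** 2) / 2  # each pair counted once

rng = np.random.default_rng(0)
result = minimize(raw_stress, rng.normal(size=n * d), method="L-BFGS-B")
z_hat = result.x.reshape(n, d)  # the embedded configuration
print("raw stress at the minimizer found:", result.fun)
```

The minimizing configuration is only determined up to rigid motions, which is one reason the later discussion works with distance matrices rather than configurations.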

Traditionally, this was an embedding technique for observing low-dimensional structure of a data set. Modern applications would like to use this structure for subsequent inference.

We want to understand the limiting behavior of MDS on an increasing number of objects. To do this, we need asymptotic theories for inference on random graphs.

Infill asymptotics: adding more points within a fixed window of time (vs. taking time to infinity); “finer and finer grids” vs “bigger and bigger grids”

## Application to Network Science

A latent position graph is a generative probability model for networks

  • Vertices correspond to points in latent space
  • Edges are generated by independent Bernoulli trials, with success probabilities dependent on the latent positions

Example

A random dot product graph sets the probability of an edge existing between two vertices as the dot product between the latent positions.

(RDPG is a special case of a latent position graph)
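A small sketch of sampling from an RDPG (illustrative; the latent positions below are made up and scaled so that every dot product is a valid probability):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up latent positions: 50 vertices in R^2 with entries in [0.1, 0.7],
# so every dot product lies in [0.02, 0.98] and is a valid edge probability.
X = rng.uniform(0.1, 0.7, size=(50, 2))
P = X @ X.T                     # P_ij = <x_i, x_j>

# Edges are independent Bernoulli trials with success probabilities P_ij
U = rng.uniform(size=P.shape)
A = np.triu((U < P).astype(int), k=1)
A = A + A.T                     # symmetric, hollow adjacency matrix
```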

Rubin-Delanchy et al. show that “well-behaved” latent position graphs are equivalent to an RDPG whose latent positions lie on a low-dimensional “manifold”.

Therefore, we consider RDPGs with unknown compact Riemannian manifold support ${\cal M}$.


The following methodology:

  1. Estimate latent positions via adjacency spectral embedding
  2. Learn ${\cal M}$ via isomap
    • Approximate the Riemannian distance on ${\cal M}$ via shortest paths on a localization graph
    • Approximate the shortest path distances by Euclidean distances in $\mathbb{R}^d$
  3. Construct a decision rule in $\mathbb{R}^d$ (perform inference)

This is manifold learning for subsequent inference!
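A rough end-to-end sketch of this pipeline (an illustrative stand-in, not the lecture's implementation): adjacency spectral embedding via a truncated eigendecomposition, a $k$-nearest-neighbor localization graph with shortest-path distances, and classical (Torgerson) MDS in place of a raw-stress minimizer. Here `A` is an adjacency matrix such as the RDPG sample above; `d` and `k` are made-up choices.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def ase(A, d):
    """Adjacency spectral embedding: top-d eigenvectors of A, scaled."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def isomap_distances(Z, k):
    """Shortest-path distances on a k-NN localization graph built from Z."""
    G = kneighbors_graph(Z, n_neighbors=k, mode="distance")
    return shortest_path(G, directed=False)

def classical_mds(Delta, d):
    """Embed a dissimilarity matrix into R^d via classical (Torgerson) MDS."""
    n = Delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (Delta ** 2) @ J          # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

# Usage, with A as above:
# Z_hat   = ase(A, d=2)                    # 1. estimated latent positions
# Delta_n = isomap_distances(Z_hat, k=10)  # 2. isomap-style dissimilarities
# Y       = classical_mds(Delta_n, d=1)    # 3. Euclidean representation for inference
```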

If ${\cal M}$ is known, then restricted inference will give us a decision rule exploiting our knowledge of ${\cal M}$.

If ${\cal M}$ is unknown, then we want to learn enough about it to get some of the benefits of restricted inference.

We want to know:

  • [?] Does the performance of rules based on the estimated manifold converge to the performance of a rule based on the true manifold?
  • [I] We will need asymptotic theory for the approximation of the Euclidean distances (when we perform MDS)

## Point-to-Set Maps

Point-to-set map

Let ${\cal X}$ and ${\cal Y}$ denote complete metric spaces. A point-to-set map $\Omega:{\cal X}\to 2^{{\cal Y}}$ assigns a subset $\Omega(x)\subseteq {\cal Y}$ to each point $x \in {\cal X}$.

see [[point-to-set map]]

Notes

  1. If $x_{k}\to \bar{x}$, $y_{k}\in \Omega(x_{k})$, and $y_{k}\to \bar{y}$ imply $\bar{y}\in \Omega(\bar{x})$, then $\Omega$ is closed at $\bar{x}$
  2. If $x_{k}\to \bar{x}$ and $\bar{y}\in \Omega(\bar{x})$ imply that there exist $y_{k}\in \Omega(x_{k})$ with $y_{k}\to \bar{y}$, then $\Omega$ is open at $\bar{x}$
  3. If $\Omega$ is both open and closed at $\bar{x}$, then we say $\Omega$ is continuous at $\bar{x}$.
  4. If $\Omega$ is (closed, open, continuous) at every $x \in {\cal X}$, then we say $\Omega$ is (closed, open, continuous) on ${\cal X}$.
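A small worked example (mine, not from the lecture) of a map that is closed but not open at a point: take ${\cal X}={\cal Y}=\mathbb{R}$ and
$$\Omega(x)=\begin{cases}[0,2] & x = 0\\ [0,1] & x \neq 0.\end{cases}$$
If $x_{k}\to 0$, $y_{k}\in \Omega(x_{k})$, and $y_{k}\to \bar{y}$, then every $y_{k}$ lies in $[0,2]$, so $\bar{y}\in[0,2]=\Omega(0)$ and $\Omega$ is closed at $0$. But for $\bar{y}=2\in \Omega(0)$ and $x_{k}=1/k$, no sequence $y_{k}\in \Omega(x_{k})=[0,1]$ can converge to $\bar{y}$, so $\Omega$ is not open (hence not continuous) at $0$.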

Constant point-to-set map

Let $\Omega:{\cal X}\to 2^{{\cal Y}}$ be a point-to-set map. We say $\Omega$ is constant if $\Omega(x)={\cal Y}_{0}$ for all $x \in {\cal X}$, for some fixed, closed subset ${\cal Y}_{0}\subseteq {\cal Y}$.

Note

Constant point-to-set maps are [[constant point-to-set maps are continuous|continuous]].

Intersection of point-to-set maps

The intersection of point-to-set maps $\Omega_{1},\Omega_{2}:{\cal X}\to 2^{{\cal Y}}$ is defined pointwise as
$$(\Omega_{1}\cap \Omega_{2})(x)=\Omega_{1}(x)\cap \Omega_{2}(x)$$

Note

  • The intersection of closed point-to-set maps is necessarily closed
  • The intersection of open point-to-set maps is not necessarily open
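A sketch of why openness can fail for intersections (my example, not from the lecture): take ${\cal X}={\cal Y}=\mathbb{R}$ with $\Omega_{1}(x)=[0,1]$ (constant, hence open) and $\Omega_{2}(x)=[x,x+1]$ (open). Then $(\Omega_{1}\cap \Omega_{2})(1)=\{1\}$, but for $x_{k}=1+1/k\to 1$ we have $(\Omega_{1}\cap \Omega_{2})(x_{k})=\emptyset$, so no sequence $y_{k}\in(\Omega_{1}\cap \Omega_{2})(x_{k})$ can converge to $\bar{y}=1$: the intersection is not open at $1$.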

IMPORTANT Let $\sigma:{\cal X}\times {\cal Y}\to \mathbb{R}$ be a function and let $\Omega(x)\subseteq {\cal Y}$ denote the set of feasible $y$ when minimizing $\sigma(x,\cdot)$. Let $\text{Min}(x)$ denote the set of global minimizers of the optimization problem

$$\begin{align}
\text{minimize} \quad& \sigma(x,y) \\
\text{subject to}\quad & y \in \Omega(x)
\end{align}$$

Theorem 8 (Hogan 1973)

If $\Omega$ is continuous at the point $\bar{x}\in {\cal X}$ and $\sigma$ is continuous on the set $\bar{x} \times \Omega(\bar{x})$, then $\text{Min}(\bar{x})$ is closed.

Translation: If $x_{k}\to \bar{x}$ defines a sequence of optimization problems and $y_{k}\to \bar{y}$ is a corresponding sequence of solutions, then a limit of solutions is a solution of the limiting problem; ie, the problem at $\bar{x}$ is solved by $\bar{y}$.

## Setting up the Problem

Recall the definition of an EDM-1 matrix:

> [!def] EDM-1
>
> A matrix $D \in \mathbb{R}^{n\times n}$ is a **Type-1 Euclidean Distance Matrix** (*EDM-1*) if and only if there exist $p \in \mathbb{N}$ and $z_{1},\dots,z_{n} \in \mathbb{R}^p$ such that
> $$D:=[d_{ij}], \qquad d_{ij}=\lvert \lvert z_{i}-z_{j} \rvert \rvert $$
> We call the smallest such $p$ the **embedding dimension** of $D$.

We can rewrite the embedding problem as the problem of minimizing
$$\sigma_{n}(\Delta,D)=\frac{1}{n^2}\lvert \lvert D-\Delta \rvert \rvert _{F}^2$$

If we let $D$ vary, we obtain the problem of minimizing $\sigma_{n}(\Delta,D)$ over $D \in {\cal Y}_{n}$. Here, ${\cal Y}_{n}$ is the closed cone of all $n\times n$ [[Euclidean Distance Matrix 1|EDM-1]] matrices of embedding dimension $\leq d$.

- We see that each $D$ corresponds to an equivalence class of *configurations*.

If we let the [[dissimilarity matrix]] $\Delta$ vary, we obtain a family of such problems, one for each $\Delta \in {\cal X}_{n}$. Here, ${\cal X}_{n}$ is the closed cone of all $n\times n$ [[dissimilarity matrix|dissimilarity matrices]].

$\text{Min}(\cdot)$ is the [[point-to-set map]] defined by
$$\text{Min}(\Delta)=\left\{ D \in {\cal Y}_{n}:\sigma_{n}(\Delta,D) \leq \inf_{D' \in {\cal Y}_{n}} \sigma_{n}(\Delta,D') \right\}$$
where each $\text{Min}(\Delta)$ is the set of globally minimizing [[Euclidean Distance Matrix 1|EDM-1]] matrices for $\Delta$.

---

### Technical details

Suppose that $\Delta_{k} \to \Delta_{\infty}$ and that $D_{k} \to D_{\infty}$ as $k\to \infty$; that is, $D_{\infty}$ is an accumulation point of the sequence $\{ D_{k} \}_{k=1}^\infty$. Then

1. For any $\Delta \in {\cal X}_{n}$, the function $\sigma_{n}$ is continuous on the set $\Delta \times {\cal Y}_{n}$
2. The constraint map $\Omega$ defined by $\Omega(\Delta)={\cal Y}_{n}$ is a [[point-to-set map|constant point-to-set map]] (and therefore [[constant point-to-set maps are continuous|continuous]])

So by Theorem 8 in Hogan 1973, $\text{Min}(\Delta)$ is closed. But since $\Delta$ is arbitrary, we have that $\text{Min}$ is a closed [[point-to-set map]], ie
$$\Delta_{k}\to \Delta_{\infty}, \quad D_{k}\in \text{Min}(\Delta_{k}), \quad D_{k} \to D_{\infty}\implies D_{\infty}\in \text{Min}(\Delta_{\infty})$$
ie, a limit of the solutions is a solution of the limiting problem.
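A quick numerical illustration of this closedness statement (my sketch, not part of the lecture): embed a sequence of perturbed dissimilarity matrices $\Delta_{k}\to \Delta_{\infty}$ with stress-minimizing (SMACOF) MDS, using `sklearn.manifold.MDS` as a stand-in for an exact member of $\text{Min}(\Delta_{k})$, and watch the resulting EDM-1 matrices $D_{k}$ approach $D_{\infty}$.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

# Limiting dissimilarity matrix: distances between made-up points in the plane
Z = rng.normal(size=(20, 2))
Delta_inf = squareform(pdist(Z))

def approx_min(Delta, d=2):
    """Stress-minimizing configuration in R^d (a stand-in for Min(Delta)),
    returned as its EDM-1 matrix."""
    mds = MDS(n_components=d, dissimilarity="precomputed", random_state=0)
    X = mds.fit_transform(Delta)
    return squareform(pdist(X))

D_inf = approx_min(Delta_inf)
for k in [1, 2, 4, 8, 16]:
    E = rng.normal(scale=1.0 / k, size=Delta_inf.shape)
    Delta_k = np.abs(Delta_inf + (E + E.T) / 2)   # symmetric perturbation of Delta_inf
    np.fill_diagonal(Delta_k, 0.0)
    D_k = approx_min(Delta_k)
    print(k, np.linalg.norm(D_k - D_inf, "fro"))  # discrepancy between D_k and D_inf
```

SMACOF only finds local minima, so this is heuristic; the point is just to make the $\Delta_{k}\to\Delta_{\infty},\ D_{k}\to D_{\infty}$ picture concrete.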
## Varying the number of objects

[[point-to-set map|Point-to-set maps]] require a *fixed* $\sigma:{\cal X}\times {\cal Y}\to \mathbb{R}$ with *fixed* ${\cal X}$ and ${\cal Y}$. We want to treat the embedding problems with $n=2,3,\dots$ as special cases of the problem
$$\begin{align}
\text{minimize} \quad& \sigma(x,y) \\
\text{subject to}\quad & y \in \Omega(x)
\end{align}$$

To do this, we need to make several modifications to the setting:

1. Replace the set of $n$ objects with a [[probability measure]] $P$ on a [[compact]] [[metric space]] ${\cal M}$
2. Replace the $n\times n$ [[dissimilarity matrix]] with a *dissimilarity function* $\Delta: {\cal M}\times {\cal M} \to \mathbb{R}$
3. Replace the fixed configuration of points $z_{1},\dots,z_{n}\in \mathbb{R}^d$ with an *embedding function* $\text{mds}: {\cal M}\to \mathbb{R}^d$

And we define a *pseudometric* $D:{\cal M}\times {\cal M} \to \mathbb{R}$ as
$$D(m_{1},m_{2})=\lvert \lvert \text{mds}(m_{1}) - \text{mds}(m_{2}) \rvert \rvert $$

Our parameters are then $x=(\Delta,P)$ and our decision variables are $y=D$.

### Technical Details: Parameters

Let ${\cal M}$ be a compact metric space (eg, a Riemannian manifold compact in the Riemannian metric topology).

- [ ] What is Riemannian metric topology? #class_notes/clean ⏳ 2025-12-17

Let ${\cal B}$ be the [[borel sigma algebra]] generated by the open subsets of ${\cal M}$.

#### Probability Measures

Let ${\cal P}$ be the space of [[probability measure|probability measures]] $P$ on the measurable space $({\cal M}, {\cal B})$

- topologized by weak convergence

Since ${\cal M}$ is [[separable]], the topology of weak convergence on ${\cal P}$ is equivalent to the topology of the Prohorov metric on ${\cal P}$. This is a [[complete]] [[metric space]]. Further, since ${\cal M}$ is [[compact]], ${\cal P}$ is compact in the topology of weak convergence.

#### Dissimilarity Function

Let $\Delta:{\cal M}\times {\cal M}\to \mathbb{R}$ be a [[Borel measurable]] dissimilarity function on ${\cal M}$. Let ${\cal D}$ be the cone of all such $\Delta$.

Note that ${\cal D}$ is closed in the topology of pointwise convergence *and* in the topology of $L^p$ convergence for $p \in [1, \infty)$. It is a complete [[metric space]] with respect to $L^p$.

#### Convergence in Parameter Space

(!) Let ${\cal X}={\cal D}\times {\cal P}$ be the cartesian product of ${\cal D}$ and ${\cal P}$. This is a complete metric space with respect to the product metric of the $L^p$ metric on ${\cal D}$ and the Prohorov metric on ${\cal P}$.

### Technical Details: Decision Variables

### Technical Details: Optimization Problem

## Prototype Experiment

Suppose that $m_{1},m_{2},\dots \overset{\text{iid}}\sim P_{\infty}$. Let $\hat{P}_{n}$ be the empirical probability measure of $m_{1},\dots,m_{n}$, so that $\hat{P}_{n} \xrightarrow{w}P_{\infty}$.

Let $\Delta_{n}$ be uniformly bounded dissimilarity functions that converge pointwise to a dissimilarity function $\Delta_{\infty}$ as $n\to \infty$.

Example

$\Delta_{n}$ might be the shortest path distance on a localization graph constructed from $m_{1},\dots,m_{n}$, and $\Delta_{\infty}$ might be the Riemannian distance on a Riemannian manifold ${\cal M}$.

Even if we know ${\cal M}$ and $(\Delta_{\infty},P_{\infty})$, we still have an infinite-dimensional problem $\sigma((\Delta_{\infty}, P_{\infty}), \cdot)$ (this is bad!). But the limiting problem tells us what it means to extract all of the $d$-dimensional latent Euclidean structure in ${\cal M}$.

If we can establish that the limit of solutions for the sequence of finite-dimensional problems $\sigma((\Delta_{n},\hat{P}_{n}), \cdot)$ is a solution of the limiting problem, then we have a consistency result!

Thus, we study the behavior of $\text{Min}(\Delta_{n}, \hat{P}_{n})$ as $n\to \infty$. (The fact that ${\cal X}, {\cal Y}$ are now infinite-dimensional function spaces requires some care.)
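A toy version of this experiment, under assumptions of my own choosing (not the lecture's): ${\cal M}$ is the unit circle, $P_{\infty}$ is uniform, and $\Delta_{n}$ is the shortest-path distance on a $k$-nearest-neighbor localization graph. The discrepancy between $\Delta_{n}$ and the Riemannian (arc-length) distance $\Delta_{\infty}$ on the sampled points shrinks as $n$ grows.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)

def arc_distance(t1, t2):
    """Delta_infinity: arc-length (Riemannian) distance on the unit circle."""
    d = np.abs(t1 - t2) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

for n in [50, 200, 800]:
    theta = rng.uniform(0, 2 * np.pi, size=n)               # m_1, ..., m_n iid from P_infinity
    pts = np.column_stack([np.cos(theta), np.sin(theta)])
    G = kneighbors_graph(pts, n_neighbors=8, mode="distance")
    Delta_n = shortest_path(G, directed=False)               # localization-graph shortest paths
    Delta_oo = arc_distance(theta[:, None], theta[None, :])
    print(n, np.max(np.abs(Delta_n - Delta_oo)))             # max pointwise discrepancy
```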
### Asymptotic Behavior

Theorem 3 from CMDS

Let $n=1,2,\dots$ and $p \in [1,\infty)$. Suppose that the sequence of dissimilarity functions $\{ \Delta_{n} \}$ is uniformly bounded and converges to the dissimilarity function $\Delta_{\infty}$ in the topology of pointwise convergence, and suppose that the sequence of empirical probability measures $\{ \hat{P}_{n} \}$ converges weakly to the probability measure $P_{\infty}$. Then

1. Any sequence of $D_{n}\in \text{Min}(\Delta_{n},\hat{P}_{n})$ has an accumulation point in the topology of $L^p(P_{\infty})$ convergence
2. If $D_{\infty}$ is an accumulation point of the sequence of solutions $\{ D_{n} \in \text{Min}(\Delta_{n}, \hat{P}_{n}) \}$ in the topology of $L^p(P_{\infty})$ convergence, then $D_{\infty} \in \text{Min}(\Delta_{\infty}, P_{\infty})$

For part (2), use Tychonoff's theorem for pointwise convergence, then the Lebesgue Dominated Convergence Theorem to get $L^p(P_{\infty})$ convergence.

---

## More Questions

Is inverse distance weighting [[Lipschitz continuous|Lipschitz]]?

Big takeaway: writing the embedding as a solution of an optimization problem

Many problems

Time series of networks?

```datacorejsx
const { dateTime } = await cJS()

return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```