```dataviewjs
let current = dv.current();
let previous = current.previous_lecture;
// get notes that list this note as their previous lecture
let pages = dv.pages("")
    .where(p => p.previous_lecture)
    .where(p => String(p.previous_lecture).includes(dv.current().file.link));
let output = `:LiArrowBigLeftDash: ${previous || "No Previous Lecture"} | `;
if (pages.length > 0) {
    output += pages.map(p => p.file.link).join(", ");
} else {
    output += 'No Next Lecture';
}
output += ' :LiArrowBigRightDash:';
dv.el('div', output);
```
Class Notes
Notes pages 15-20 under Singular Values: Geometric Method
## Summary
- Concentration of singular values for short/fat matrices
- Matrix concentration inequality
- Geometric method for random matrix analysis
- Epsilon nets
- Discretizing matrix norms
- Existence of good nets
# 2. Rectangular Matrices
## 2.3 Spectral Analysis of Rectangular Gaussian Matrices
### 2.3.1 Singular Values: Geometric Method
Last time: spectrum of a random matrix
$$G\sim N(0,1)^{\otimes d\times m},\quad d\leq m$$
and especially $d\ll m$.
- $G^TG \approx dI_{m}$ (not actually true): the JL thm only gives this for a few quadratic forms at a time
- $GG^T \approx mI_{d}$ (a $d\times d$ matrix) by the LLN; this *is* actually true, in the [[operator norm]]

Define
$$\begin{align}
\Delta &:= GG^T - mI_{d} \\
&= GG^T - \mathbb{E}[GG^T]
\end{align}$$
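A quick numerical sanity check of this recap (a minimal numpy sketch; the dimensions `d, m` are arbitrary choices, not from lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 20, 2000  # short/fat regime: d << m
G = rng.standard_normal((d, m))

# GG^T concentrates around m*I_d: the operator-norm error is O(sqrt(dm)) << m
print(np.linalg.norm(G @ G.T - m * np.eye(d), 2) / m)   # small
# G^T G is far from d*I_m: rank(G^T G) <= d, so m-d of its eigenvalues are 0
print(np.linalg.norm(G.T @ G - d * np.eye(m), 2) / d)   # large
```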
But recall the [[operator norm]]:
$$\lvert \lvert \Delta \rvert \rvert = \sup_{x \in \mathbb{S}^{d-1}} \lvert x^T\Delta x \rvert = \sup_{x \in B^d} \lvert x^T\Delta x \rvert $$
$$B^d(x,r) := \{ y: \lvert \lvert x-y \rvert \rvert \leq r \}\subset \mathbb{R}^d$$
$$B^d := B^d(0, 1) \supset \mathbb{S}^{d-1}$$
> [!def] Restricted Inner Product
>
> Let ${\cal X} = \{ x_{1},\dots,x_{N} \} \subseteq \mathbb{R}^d$ and let $\Delta \in M_{d}$ be [[hermitian|symmetric]]. Then we define
> $$\lvert \lvert \Delta \rvert \rvert _{{\cal X}} := \sup_{x \in {\cal X}} \lvert x^T\Delta x \rvert = \sup_{x \in {\cal X}} \lvert \langle \Delta,xx^T \rangle \rvert $$
>
see [[restricted inner product]]
This is a [[semi-norm]] - whether or not it is a [[norm]] depends on the set ${\cal X}$.
+++ WHEN IS IT A NORM?
To be a norm, the outer products $\{ xx^T : x \in {\cal X} \}$ need to span the space of symmetric matrices; otherwise some nonzero $\Delta$ is orthogonal to all of them and $\lvert \lvert \Delta \rvert \rvert _{{\cal X}} = 0$.
> [!theorem] Proposition
>
> Let $M \in \mathbb{R}^{d\times d}$ be a [[hermitian|symmetric]] [[random matrix]] and let ${\cal X} \subset \mathbb{R}^d$ be finite. Then
> $$\mathbb{P}[\lvert \lvert M \rvert \rvert _{{\cal X}} > t] \leq \lvert {\cal X} \rvert \cdot \max_{x \in {\cal X}} \mathbb{P}[\,\lvert x^TMx \rvert > t] $$
>
There is a tradeoff between the factor $\lvert {\cal X} \rvert$ and the error in approximating the [[operator norm]] $\lvert \lvert \Delta \rvert \rvert$ with the [[restricted inner product]] $\lvert \lvert \Delta \rvert \rvert_{{\cal X}}$.
- Thus the problem of accurate approximation becomes one of finding a small set ${\cal X}$ that behaves like a "grid" approximating $\mathbb{S}^{d-1}$ (or $B^d$)
- We call this approach the *geometric method* because it reduces the problem of understanding [[norm|norms]] of [[random matrix|random matrices]] to a geometric problem instead
### 2.3.2 Nets, Coverings, and Packings
#### Discretizing the operator norm
We want to know if the [[restricted inner product]] approximates the [[operator norm]]
> [!def] $\epsilon$-net
>
> Let ${\cal X} \subseteq {\cal Y} \subseteq \mathbb{R}^d$. ${\cal X}$ is an $\epsilon$-**covering** or an $\epsilon$-**net** of ${\cal Y}$ if
> $$\begin{align}
> \bigcup_{x \in {\cal X}} B^d(x, \epsilon) &\supseteq {\cal Y} \\
> &\iff \\
> \forall\,y \in {\cal Y}\,\exists \,x \in {\cal X} &\text{ s.t. } \lvert \lvert y-x \rvert \rvert \leq \epsilon
> \end{align}$$
see [[epsilon net]]
> [!theorem] Lemma
>
> If ${\cal X}$ is an [[epsilon net]] of $\mathbb{S}^{d-1}$ or $B^d$, then
> $$\lvert \lvert \Delta \rvert \rvert _{{\cal X}} \leq \lvert \lvert \Delta \rvert \rvert \leq \frac{1}{1-2\epsilon} \lvert \lvert \Delta \rvert \rvert _{{\cal X}}$$
> (if $\epsilon \in \left( 0, \frac{1}{2} \right)$ for the second inequality)
> [!proof]
>
> Let $x \in \mathbb{S}^{d-1}$ be such that $\lvert x^T\Delta x \rvert = \lvert \lvert \Delta \rvert \rvert$ and let $x_{i} \in {\cal X}$ such that $\lvert \lvert x-x_{i} \rvert \rvert \leq \epsilon$. Then
> $$\begin{align}
> \lvert \lvert \Delta \rvert \rvert _{{\cal X}} &\geq \lvert x_{i}^T\Delta x_{i} \rvert \\
> &= \lvert (x + (x_{i}-x))^T \Delta (x+(x_{i} - x)) \rvert \\
> &\geq \underbrace{\lvert x^T\Delta x \rvert}_{\lvert \lvert \Delta \rvert \rvert } - \underbrace{\lvert x^T\Delta(x_{i}-x) \rvert}_{\leq \lvert \lvert x \rvert \rvert \cdot \lvert \lvert \Delta \rvert \rvert \cdot \lvert \lvert x_{i}-x \rvert \rvert \leq \epsilon \lvert \lvert \Delta \rvert \rvert } -\underbrace{\lvert (x_{i}-x)^T \Delta x_{i} \rvert }_{\leq \epsilon \lvert \lvert \Delta \rvert \rvert } \\
> &\geq (1-2\epsilon) \lvert \lvert \Delta \rvert \rvert
> \end{align}$$
>
> $$\tag*{$\blacksquare$}$$
>
> [!NOTE]
>
> ++++ interpretation of special $x$
> Such an $x$ is a unit eigenvector of $\Delta$ for the eigenvalue of largest absolute value, which is why the sup defining $\lvert \lvert \Delta \rvert \rvert$ is attained.
see [[epsilon net restricted inner product bounds the operator norm]]
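A minimal check of this lemma in $d=2$ (a numpy sketch; the angle grid is one explicit net, using that points spaced $2\epsilon$ apart in angle are within chord distance $2\sin(\epsilon/2)\leq \epsilon$ of every point of $\mathbb{S}^1$):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.25  # must be in (0, 1/2) for the upper bound

# eps-net of S^1: consecutive angles 2*eps apart, so any unit vector is within
# angle eps of the grid, hence within chord distance 2*sin(eps/2) <= eps of it
angles = np.arange(0.0, 2 * np.pi, 2 * eps)
net = np.stack([np.cos(angles), np.sin(angles)], axis=1)

A = rng.standard_normal((2, 2))
Delta = (A + A.T) / 2  # random symmetric Delta

op = np.abs(np.linalg.eigvalsh(Delta)).max()                   # ||Delta||
res = np.abs(np.einsum("ni,ij,nj->n", net, Delta, net)).max()  # ||Delta||_X

assert res <= op + 1e-12 and op <= res / (1 - 2 * eps) + 1e-12
print(op, res)
```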
We want to construct our [[epsilon net|epsilon nets]] so we can control their size. It is hard to describe nets vector-by-vector, so we need a framework that guarantees small nets exist without constructing them explicitly.
---
#### Non-constructive existence of nets
> [!def] $\epsilon$-packing
>
> Let ${\cal X} \subset {\cal Y}$. ${\cal X}$ is an $\epsilon$-**packing** of ${\cal Y}$ if
> $$B^d(x,\epsilon) \cap B^d(x', \epsilon) = \varnothing \quad \forall\, x\neq x' \in {\cal X}$$
> ie the balls around distinct points are pairwise disjoint; equivalently,
> $$\lvert \lvert x-x' \rvert \rvert >2\epsilon\quad \forall\,x\neq x'\in {\cal X}$$
>
see [[epsilon packing]]
> [!theorem] Lemma
>
> If ${\cal X}$ is an [[epsilon packing]] of ${\cal Y}$ that is *maximal*, then ${\cal X}$ is a $2\epsilon$ [[epsilon net|net]].
>
> > [!NOTE] Maximal
> > We say that ${\cal X}$ is **maximal** if ${\cal X} \cup \{ x' \}$ is *not* a packing for all $x' \not\in {\cal X}$
>
> [!proof]
>
> Let $y \in {\cal Y} \setminus {\cal X}$. Maximality means $B^d(y, \epsilon) \cap B^d(x, \epsilon) \neq \varnothing$ for some $x \in {\cal X}$. Let $z$ be in this intersection.
>
> Then by the [[triangle inequality]] and by construction of our balls above, we get
> $$\lvert \lvert y-x \rvert \rvert \leq \lvert \lvert y-z \rvert \rvert +\lvert \lvert z-x \rvert \rvert \leq 2\epsilon$$
> $$\tag*{$\blacksquare$}$$
see [[maximal epsilon packing is also a net]]
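A greedy numerical illustration of this lemma (a numpy sketch; `greedy_packing` is a hypothetical helper, and the finite sample `Y` stands in for ${\cal Y}$, so maximality is with respect to the sample):

```python
import numpy as np

def greedy_packing(points, eps):
    """Greedily build a maximal eps-packing: keep a point iff its distance
    to every kept point exceeds 2*eps (balls of radius eps stay disjoint)."""
    kept = []
    for p in points:
        if all(np.linalg.norm(p - q) > 2 * eps for q in kept):
            kept.append(p)
    return np.array(kept)

rng = np.random.default_rng(0)
d, eps = 3, 0.3
Y = rng.standard_normal((20000, d))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # dense sample of S^{d-1}

X = greedy_packing(Y, eps)
# maximal packing => 2*eps-net: every sample point is within 2*eps of X
cover_radius = np.linalg.norm(Y[:, None, :] - X[None, :, :], axis=2).min(axis=1).max()
print(len(X), cover_radius)  # cover_radius <= 2*eps
```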
We want to show that small [[epsilon net|nets]] exist. A volumetric argument bounds the size of any [[epsilon packing|packing]] (including maximal ones), and as we saw above, a maximal packing yields a corresponding net. So we have a plan:
1. Show all [[epsilon packing|epsilon packings]] are not "too large"
2. Apply to a maximal [[epsilon packing]]
3. This means that there exists a $2\epsilon$ [[epsilon net|net]] that is not too large
> [!theorem] Proposition
>
> Suppose ${\cal X}$ is an [[epsilon packing]] of $B^d$. Then $\lvert {\cal X} \rvert \leq (1+\frac{1}{\epsilon})^d$
>
> [!NOTE]
>
> The true value is $\exp\left( {\cal O}\left( \frac{d}{\epsilon} \right) \right)$
> [!proof]
>
> Since ${\cal X}$ is a packing of $B^d$, we know
> $$\begin{align}
> \bigcup_{x \in {\cal X}} B^d(x, \varepsilon) &\subseteq B^d(0, 1+\varepsilon) \\
> \implies \lvert {\cal X} \rvert \cdot \text{vol}(B^d(0, \varepsilon)) &\leq \text{vol}(B^d(0, 1+\varepsilon)) \\
> \implies \lvert {\cal X} \rvert &\leq \frac{\text{vol}(B^d(0, 1+\varepsilon))}{\text{vol}(B^d(0, \varepsilon))} \\
> &=\left( \frac{1+\varepsilon}{\varepsilon} \right)^d
> \end{align}$$
> Note in the first line the LHS union is disjoint, which is how we get the second line.
> $$\tag*{$\blacksquare$}$$
>
see [[bound for the size of an epsilon packing]]
> [!theorem] Corollary
>
> For all $\varepsilon > 0$, there exists an [[epsilon net]] ${\cal X}$ of $B^d$ with $\lvert {\cal X} \rvert \leq \left( 1+\frac{2}{\epsilon} \right)^d$
> [!proof]
>
> Define ${\cal X} :=$ any maximal $\frac{\varepsilon}{2}$ [[epsilon packing|packing]]. The result follows from [[maximal epsilon packing is also a net]] and the size bound follows immediately from the [[bound for the size of an epsilon packing]]
see [[we can find an epsilon net with the bound for an epsilon packing]]
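To see the corollary's bound with numbers (a numpy sketch; the sample again stands in for the sphere inside $B^d$, and $d=3$, $\varepsilon=\frac{1}{2}$ are arbitrary), build a maximal $\frac{\varepsilon}{2}$-packing greedily, exactly as in the proof, and compare its size to $\left( 1+\frac{2}{\varepsilon} \right)^d$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, eps = 3, 0.5
Y = rng.standard_normal((20000, d))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # dense sample of S^{d-1}

# greedy maximal (eps/2)-packing: pairwise distances > 2*(eps/2) = eps
kept = []
for p in Y:
    if all(np.linalg.norm(p - q) > eps for q in kept):
        kept.append(p)

print(len(kept), "<=", (1 + 2 / eps) ** d)  # observed net size vs (1+2/eps)^d = 125
```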
### 2.3.3 Coarse Non-Asymptotic Bound on Singular Values
Putting everything together, we can get a bound on $\lvert \lvert \Delta \rvert \rvert$.
- Using the existence of a not-too-big [[epsilon net]], we can approximate the norm by discretizing and testing vectors in the net.
- We can then use the first moment method (Taylor) to help get our bound
> [!theorem]
>
> For $d<m$, let $G \sim N(0, 1)^{\otimes d\times m}$. Define $\Delta:= GG^T - mI_{d}$. Then
> $$\mathbb{P}[\lvert \lvert \Delta \rvert \rvert \geq 64\sqrt{ dm }] \leq 2\exp(-d)$$
> Note: $\lvert \lvert G \rvert \rvert ^2=\lvert \lvert GG^T \rvert \rvert \leq m+{\cal O}(\sqrt{ dm })$
> [!proof]
>
> Let $\varepsilon := \frac{1}{4}$. Then we have:
> 1. There exists an [[epsilon net]] ${\cal X}$ of $B^d$ where the [[bound for the size of an epsilon packing]] yields
> $$\lvert{\cal X}\rvert \leq \left( 1+\frac{2}{\varepsilon} \right)^d = \left( 1 + \frac{2}{\frac{1}{4}} \right)^d=9^d$$
> 2. We also have the bound (from [[epsilon net restricted inner product bounds the operator norm]])
> $$\lvert \lvert \Delta \rvert \rvert \leq \frac{1}{1-2\varepsilon} \lvert \lvert \Delta \rvert \rvert_{{\cal X}}=\frac{1}{1-2\left( \frac{1}{4} \right)}\lvert \lvert \Delta \rvert \rvert_{{\cal X}} =2 \lvert \lvert \Delta \rvert \rvert_{{\cal X}}$$
>
> So
> $$\begin{align}
> \mathbb{P}[\lvert \lvert \Delta \rvert \rvert \geq t] &\leq \mathbb{P}\left[ \lvert \lvert \Delta \rvert \rvert _{{\cal X}} \geq \frac{t}{2} \right] \\
> \text{union bound} \implies &\leq \sum_{x \in{\cal X}} \mathbb{P}\left[ \lvert x^T\Delta x \rvert \geq \frac{t}{2} \right]
> \end{align}$$
> Since $x \in {\cal X} \subseteq B^d$, we know $\lvert \lvert x \rvert \rvert \leq 1$. So let $\hat{x}:= \frac{x}{\lvert \lvert x \rvert \rvert} \in \mathbb{S}^{d-1}$; then $\lvert x^T\Delta x \rvert = \lvert \lvert x \rvert \rvert ^2 \lvert \hat{x}^T\Delta \hat{x} \rvert \leq \lvert \hat{x}^T\Delta \hat{x} \rvert$, so
> $$\begin{align}
> &\leq \sum_{x \in {\cal X}} \mathbb{P}\left[ \lvert \hat{x}^T \Delta \hat{x} \rvert \geq \frac{t}{2} \right] \\
> &= \sum_{x \in {\cal X}} \mathbb{P}_{G}\left[ \lvert\,\lvert \lvert \underbrace{G^T\hat{x}}_{\in \mathbb{R}^m,\, \text{Law}=N(0,I_{m})} \rvert \rvert^2 - m \,\rvert \geq \frac{t}{2} \right]
> \end{align}$$
> Now, let $g:=G^T\hat{x}$:
>
> $$\begin{align}
> &= \lvert {\cal X} \rvert \cdot \mathbb{P}_{g \sim N(0, I_{m})}\left[ \lvert\,\lvert \lvert g \rvert \rvert ^2 -m \,\rvert \geq \frac{t}{2} \right] \\
> &\leq 9^d\cdot 2\exp\left( -\frac{1}{8}\min\left\{ \frac{t^2}{4m}, \frac{t}{2} \right\} \right)
> \end{align}$$
> Set $t:= C\sqrt{ dm }$, and note that $9 \leq \exp(3)$ and that $d\leq m$ implies $\sqrt{ dm } \geq d$. Then
> $$\begin{align}
> &\leq 2\exp\left( 3d - \frac{1}{8} \min\left\{ \frac{C^2}{4}d, \frac{C}{2}\sqrt{ dm } \right\} \right) \\
> &\leq 2\exp\left( d\left[ 3- \min\left\{ \frac{C^2}{32}, \frac{C}{16} \right\} \right] \right) \\
> (*)\quad &\leq 2 \exp(-d)
> \end{align}$$
>
> For $(*)$, we pick $C$ big enough that the bracket is negative: with $C:=64$, $\min\left\{ \frac{C^2}{32}, \frac{C}{16} \right\} = \min\{ 128, 4 \} = 4$, so the exponent is $d(3-4) = -d$.
> $$\tag*{$\blacksquare$}$$
>
> [!NOTE] Union Bound
>
> The union bound is [[outer measure has countable subadditivity|countable subadditivity]], ie the fact that
> $$\mathbb{P}\left( \bigcup_{i}A_{i} \right) \leq \sum_{i} \mathbb{P}(A_{i})$$
see [[high probability bound for operator norm of difference for Gaussian covariance matrix]]
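A Monte Carlo sanity check of the theorem (a numpy sketch; `d`, `m`, and `trials` are arbitrary choices): the empirical failure frequency should be far below $2\exp(-d)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, trials = 10, 400, 200
threshold = 64 * np.sqrt(d * m)  # the theorem's deviation threshold

failures = 0
for _ in range(trials):
    G = rng.standard_normal((d, m))
    Delta = G @ G.T - m * np.eye(d)
    failures += np.linalg.norm(Delta, 2) >= threshold
print(failures / trials)  # should be <= 2*exp(-d) ~ 9e-5, so 0 here
```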
From [[Weyl's Theorem]], we can get the following result:
> [!theorem] Corollary (special case)
>
> Let $A, B \in M_{d}$ be [[hermitian|symmetric]]. Then
> $$\lambda_{d}(A) - \lvert \lvert B \rvert \rvert \leq \lambda_{d}(A+B) \leq \lambda_{1}(A+B) \leq \lambda_{1}(A) + \lvert \lvert B \rvert \rvert $$
> Where $\lvert \lvert \cdot \rvert \rvert$ is the [[operator norm]].
> [!proof]
>
> The proof is similar to the one proving Weyl, except we can use the variational description of the [[eigenvalue|eigenvalues]] instead of [[Courant-Fisher Theorem|Courant-Fisher]]. For the first bound, we have
> $$\begin{align}
> \lambda_{1}(A + B) &= \sup_{x \in \mathbb{S}^{d-1}} x^T(A+B)x \\
> &\leq \sup_{x \in \mathbb{S}^{d-1}} (x^T A x + \lvert \lvert B \rvert \rvert ) \\
> &=\sup_{x \in \mathbb{S}^{d-1}} x^TAx + \lvert \lvert B \rvert \rvert \\
> &= \lambda_{1}(A) + \lvert \lvert B \rvert \rvert
> \end{align}$$
> (by just distributing the $\sup$ and by definition of the [[operator norm]]). And, similarly, we have
> $$\begin{align}
> \lambda_{d}(A + B) &= \inf_{x \in \mathbb{S}^{d-1}} x^T(A+B)x \\
> &\geq \inf_{x \in \mathbb{S}^{d-1}}(x^TAx - \lvert \lvert B \rvert \rvert ) \\
> &= (\inf_{x \in \mathbb{S}^{d-1}} x^TAx) - \lvert \lvert B \rvert \rvert \\
> &= \lambda_{d}(A) - \lvert \lvert B \rvert \rvert
> \end{align}$$
>
>
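A quick numerical check of this special case of Weyl (a numpy sketch; `eigvalsh` returns eigenvalues in ascending order, so $\lambda_d$ is the first entry and $\lambda_1$ the last):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
A = rng.standard_normal((d, d))
A = (A + A.T) / 2  # random symmetric A
B = rng.standard_normal((d, d))
B = (B + B.T) / 2  # random symmetric B

lam_A = np.linalg.eigvalsh(A)          # ascending: lam_A[0] = lambda_d(A)
lam_AB = np.linalg.eigvalsh(A + B)
norm_B = np.abs(np.linalg.eigvalsh(B)).max()  # operator norm of symmetric B

assert lam_A[0] - norm_B <= lam_AB[0]    # lambda_d(A) - ||B|| <= lambda_d(A+B)
assert lam_AB[-1] <= lam_A[-1] + norm_B  # lambda_1(A+B) <= lambda_1(A) + ||B||
```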
> [!theorem] Corollary
>
> Let $G\in\mathbb{R}^{d\times m}$ be a Gaussian [[random matrix]] with $d \leq m$. Then
> $$\mathbb{P}[\sqrt{ m }-64\sqrt{ d } \leq \sigma_{d}(G) \leq\dots \leq \sigma_{1}(G) \leq \sqrt{ m } + 32\sqrt{ d }] > 1-2\exp(-d)$$
>
> > [!important]
> > ie all singular values are {==$\sqrt{ m } \pm {\cal O}(\sqrt{ d })$==}
> <!--SR:!2026-04-26,138,250-->
> [!NOTE]
>
> This is a good event; we want this to happen. And luckily, we have high probability for this event.
> [!proof]
>
>
> > [!note] Recall
> > First-order Taylor approximation (and concavity of $\sqrt{ \cdot }$) yields
> > $$\begin{align}
> > \sqrt{ 1+x } &\leq 1+\frac{x}{2}\quad &\text{for } x \geq 0 \\
> > \sqrt{ 1-x } &\geq 1-x \quad &\text{for } 0\leq x\leq 1
> > \end{align}$$
> >
>
> If the event for our [[high probability bound for operator norm of difference for Gaussian covariance matrix]] happens, then we have
> $$\lvert \lvert \Delta \rvert \rvert = \lvert \lvert GG^T -mI_{d} \rvert \rvert \leq 64 \sqrt{ dm }$$
>
> ie if the event happens, we have
> (1)
> $$\begin{align}
> \sigma_{1}(G) = \lvert \lvert G \rvert \rvert &= \sqrt{ \lvert \lvert GG^T \rvert \rvert } \\
> (*)&\leq \sqrt{ \lvert \lvert mI_{d} \rvert \rvert + \lvert \lvert \Delta \rvert \rvert } \\
> &\leq \sqrt{ m + 64\sqrt{ dm } } \\
> &= \sqrt{ m }\sqrt{ 1 + 64\sqrt{ \frac{d}{m} } } \\
> (**)&\leq\sqrt{ m }\left( 1+\frac{64}{2}\sqrt{ \frac{d}{m} } \right) \\
> &= \sqrt{ m } + 32\sqrt{ d }
> \end{align}$$
> Where
> - $(*)$ is from our [[Weyl's Theorem#special-case-for-random-matrix|special case of Weyl]]
> - $(**)$ is from the first of our two bounds above
>
> Then, for $\sigma_{d}(G)$, we have a similar bound.
>
> Note that if $m-64\sqrt{ dm } <0$, then $\sqrt{ m }-64\sqrt{ d }<0$ and the result follows since singular values are nonnegative. So suppose $m-64\sqrt{ dm } \geq 0$.
>
> (2)
> $$\begin{align}
> \sigma_{d}(G) &= \sqrt{ \lambda_{d}(GG^T) } \\
> (*)&\geq \sqrt{ \lambda_{d}(mI_{d}) - \lvert \lvert \Delta \rvert \rvert } \\
> &\geq \sqrt{ m-64\sqrt{ dm } } \\
> &= \sqrt{ m }\sqrt{ 1-64\sqrt{ \frac{d}{m} } } \\
> (**)&\geq \sqrt{ m }\left( 1-64 \sqrt{ \frac{d}{m} } \right) \\
> &= \sqrt{ m } - 64\sqrt{ d }
> \end{align}$$
> Where
> - $(*)$ is from our [[Weyl's Theorem#special-case-for-random-matrix|special case of Weyl]]
> - $(**)$ is from the second of our two bounds above
> $$\tag*{$\blacksquare$}$$
see [[high probability bound on singular values of gaussian random matrix]]
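Numerically (a numpy sketch, with arbitrary $d \ll m$), the extreme singular values sit at roughly $\sqrt{ m } \pm \sqrt{ d }$, consistent with (and tighter than) the corollary's constants:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 50, 5000
G = rng.standard_normal((d, m))

s = np.linalg.svd(G, compute_uv=False)    # singular values, descending
print((s[0] - np.sqrt(m)) / np.sqrt(d))   # ~ +1: sigma_1 ~ sqrt(m) + O(sqrt(d))
print((s[-1] - np.sqrt(m)) / np.sqrt(d))  # ~ -1: sigma_d ~ sqrt(m) - O(sqrt(d))
```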
Looking ahead: consider $G^TG \in \mathbb{R}^{m\times m}$, in particular the eigenvalues of $\frac{1}{m}G^TG$ for $d\ll m$. We know
$$\lambda_{1}\geq\dots \geq \lambda_{d} \geq \lambda_{d+1} \geq\dots \geq \lambda_{m} \geq 0$$
and by the above,
$$\lambda_{1} \approx \dots \approx \lambda_{d} \approx 1, \qquad \lambda_{d+1} \approx \dots \approx \lambda_{m} \approx 0$$
(in fact $\lambda_{d+1} = \dots = \lambda_{m} = 0$ exactly, since $\text{rank}(G^TG)\leq d$). So $\frac{1}{m} G^TG$ is approximately a random projection matrix.
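A quick check of this picture (a numpy sketch; `eigvalsh` on the $m\times m$ Gram matrix is slow for large $m$, so the dimensions are kept modest):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 2000
G = rng.standard_normal((d, m))

lam = np.linalg.eigvalsh(G.T @ G / m)[::-1]  # eigenvalues, descending
print(lam[:d])  # top d eigenvalues ~ 1
print(lam[d])   # the rest are 0 (rank(G^T G) <= d), up to floating point
```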
Next time:
- this method of proof
- singular vectors
- what is this fluctuation around the eigenvalues?
# Review
#flashcards/math/rmt
## TODO
- [x] Clean up lecture ⏳ 2025-09-05 ✅ 2025-09-05
- [x] Finish linking ⏳ 2025-09-05 ✅ 2025-09-05
- [-] Add flashcards #class_notes/clean ⏳ 2025-09-06 ❌ 2025-12-08
```datacorejsx
const { dateTime } = await cJS()
return function View() {
const file = dc.useCurrentFile();
return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```