Random Matrix Lecture 04

[[lecture-data]]

# Info

```dataviewjs
let current = dv.current();
let previous = current.previous_lecture;

// get notes that have this lecture as their previous lecture
let pages = dv.pages("")
	.where(p => p.previous_lecture)
	.where(p => String(p.previous_lecture)
		.includes(dv.current().file.link));

let output = `:LiArrowBigLeftDash: ${previous || "No Previous Lecture"} | `;

if (pages.length > 0) {
	output += pages
		.map(p => p.file.link).join(", ");
} else {
	output += "No Next Lecture";
}

output += " :LiArrowBigRightDash:";

dv.el("div", output);
```

# Class Notes

Notes: pages 15-20, under *Singular Values: Geometric Method*.

 

# Summary

- Concentration of singular values for short/fat matrices
- Matrix concentration inequality
- Geometric method for random matrix analysis
- Epsilon nets
	- Discretizing matrix norms
	- Existence of good nets

# 2. Rectangular Matrices

## 2.3 Spectral Analysis of Rectangular Gaussian Matrices

### 2.3.1 Singular Values: Geometric Method

Last time: the spectrum of the random matrix $GG^T$ for $G \sim N(0, 1)^{\otimes d\times m}$,

and especially

$$GG^T \approx mI_{d}$$
- (not actually true) from the JL thm for a few quadratic forms
- by the LLN
	- this is actually true in the operator norm:
$$\begin{align}
\Delta &:= GG^T - mI_{d} \\
&= GG^T - \mathbb{E}GG^T
\end{align}$$

But recall the [[operator norm]]:
$$\lvert \lvert \Delta \rvert \rvert = \sup_{x \in \mathbb{S}^{d-1}} \lvert x^T\Delta x \rvert = \sup_{x \in B^d} \lvert x^T\Delta x \rvert $$
$$B^d(x,r) := \{ y: \lvert \lvert x-y \rvert \rvert \leq r \}\subset \mathbb{R}^d$$
$$B^d := B^d(0, 1) \supset \mathbb{S}^{d-1}$$

> [!def] Restricted Inner Product
>
> Let ${\cal X} = \{ x_{1},\dots,x_{N} \} \subseteq \mathbb{R}^d$ and let $\Delta \in M_{d}$ be [[hermitian|symmetric]]. Then we define
> $$\lvert \lvert \Delta \rvert \rvert _{{\cal X}} := \sup_{x \in {\cal X}} \lvert x^T\Delta x \rvert = \sup_{x \in {\cal X}} \lvert \langle \Delta,xx^T \rangle \rvert $$
> see [[restricted inner product]]

This is a [[semi-norm]] - whether or not it is a [[norm]] depends on the set ${\cal X}$.

> [!NOTE] When is it a norm?
>
> To be a norm, the outer products $xx^T$ for $x \in {\cal X}$ need to span the space of symmetric matrices.

> [!theorem] Proposition
>
> Let $M \in \mathbb{R}^{d\times d}$ be a [[hermitian|symmetric]] [[random matrix]] and let ${\cal X} \subset \mathbb{R}^d$ be finite. Then
> $$\mathbb{P}[\lvert \lvert M \rvert \rvert _{{\cal X}} > t] \leq \lvert {\cal X} \rvert \cdot \max_{x \in {\cal X}} \mathbb{P}[\,\lvert x^TMx \rvert > t] $$
> There is a tradeoff between the factor $\lvert {\cal X} \rvert$ and the error in approximating the [[operator norm]] $\lvert \lvert \Delta \rvert \rvert$ with the [[restricted inner product]] $\lvert \lvert \Delta \rvert \rvert_{{\cal X}}$.

- Thus the problem of accurate approximation becomes one of finding a small set ${\cal X}$ that behaves like a "grid" approximating $\mathbb{S}^{d-1}$ (or $B^d$)
- We call this approach the *geometric method* because it reduces the problem of understanding [[norm|norms]] of [[random matrix|random matrices]] to a geometric problem instead
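Before discretizing, it helps to see the scale we are aiming for. The following numpy sketch (my own sanity check, not from the lecture; the shapes and trial count are arbitrary choices) shows $\lvert \lvert \Delta \rvert \rvert$ growing like $\sqrt{ dm }$:

```python
# Sanity check (not from the lecture): ||G G^T - m I_d|| seems to grow like sqrt(d m).
import numpy as np

rng = np.random.default_rng(0)

for d, m in [(10, 100), (10, 1000), (50, 1000), (100, 10000)]:
    norms = []
    for _ in range(20):  # average a few trials to smooth out fluctuations
        G = rng.standard_normal((d, m))
        Delta = G @ G.T - m * np.eye(d)
        norms.append(np.linalg.norm(Delta, ord=2))  # operator norm = top singular value
    print(f"d={d:4d} m={m:6d}  ||Delta|| / sqrt(dm) ~ {np.mean(norms) / np.sqrt(d * m):.2f}")
```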
### 2.3.2 Nets, Coverings, and Packings

#### Discretizing the operator norm

We want to know if the [[restricted inner product]] approximates the [[operator norm]].

> [!def] $\epsilon$-net
>
> Let ${\cal X} \subseteq {\cal Y} \subseteq \mathbb{R}^d$. ${\cal X}$ is an $\epsilon$-**covering** or an $\epsilon$-**net** of ${\cal Y}$ if
> $$\begin{align}
> \bigcup_{x \in {\cal X}} B^d(x, \epsilon) &\supseteq {\cal Y} \\
> &\iff \\
> \forall\,y \in {\cal Y}\,\exists \,x \in {\cal X} &\text{ s.t. } \lvert \lvert y-x \rvert \rvert \leq \epsilon
> \end{align}$$

see [[epsilon net]]

> [!theorem] Lemma
>
> If ${\cal X}$ is an [[epsilon net]] of $\mathbb{S}^{d-1}$ or $B^d$, then
> $$\lvert \lvert \Delta \rvert \rvert _{{\cal X}} \leq \lvert \lvert \Delta \rvert \rvert \leq \frac{1}{1-2\epsilon} \lvert \lvert \Delta \rvert \rvert _{{\cal X}}$$
> (if $\epsilon \in \left( 0, \frac{1}{2} \right)$ for the second inequality)

> [!proof]
>
> Let $x \in \mathbb{S}^{d-1}$ be such that $\lvert x^T\Delta x \rvert = \lvert \lvert \Delta \rvert \rvert$ and let $x_{i} \in {\cal X}$ be such that $\lvert \lvert x-x_{i} \rvert \rvert \leq \epsilon$. Then
> $$\begin{align}
> \lvert \lvert \Delta \rvert \rvert _{{\cal X}} &\geq \lvert x_{i}^T\Delta x_{i} \rvert \\
> &= \lvert (x + (x_{i}-x))^T \Delta (x+(x_{i} - x)) \rvert \\
> &\geq \underbrace{\lvert x^T\Delta x \rvert}_{\lvert \lvert \Delta \rvert \rvert } - \underbrace{\lvert x^T\Delta(x_{i}-x) \rvert}_{\leq \lvert \lvert x \rvert \rvert \cdot \lvert \lvert \Delta \rvert \rvert \cdot \lvert \lvert x_{i}-x \rvert \rvert \leq \epsilon \lvert \lvert \Delta \rvert \rvert } -\underbrace{\lvert (x_{i}-x)^T \Delta x_{i} \rvert }_{\leq \epsilon \lvert \lvert \Delta \rvert \rvert } \\
> &\geq (1-2\epsilon) \lvert \lvert \Delta \rvert \rvert
> \end{align}$$
>
> $$\tag*{$\blacksquare$}$$

> [!NOTE]
>
> Interpretation of the special $x$: since $\Delta$ is symmetric, the maximizer of $\lvert x^T\Delta x \rvert$ over $\mathbb{S}^{d-1}$ exists by compactness and is a unit eigenvector of $\Delta$ whose eigenvalue has maximal absolute value.

see [[epsilon net restricted inner product bounds the operator norm]]

We want to construct our [[epsilon net|epsilon nets]] so that we can control their size. It is hard to describe nets by listing their vectors by hand, so we need some other framework to define them.

---

#### Non-constructive existence of nets

> [!def] $\epsilon$-packing
>
> Let ${\cal X} \subset {\cal Y}$. ${\cal X}$ is an $\epsilon$-**packing** of ${\cal Y}$ if
> $$B^d(x,\epsilon) \cap B^d(x', \epsilon) = \varnothing \quad \forall\, x\neq x' \in {\cal X}$$
> i.e. if the balls around any two points are disjoint, which happens exactly when their centers are sufficiently far apart:
> $$\lvert \lvert x-x' \rvert \rvert >2\epsilon\quad \forall\,x\neq x'\in {\cal X}$$
> see [[epsilon packing]]

> [!theorem] Lemma
>
> If ${\cal X}$ is an [[epsilon packing]] of ${\cal Y}$ that is *maximal*, then ${\cal X}$ is a $2\epsilon$ [[epsilon net|net]].
>
> > [!NOTE] Maximal
> > We say that ${\cal X}$ is **maximal** if ${\cal X} \cup \{ x' \}$ is *not* a packing for all $x' \not\in {\cal X}$

> [!proof]
>
> Let $y \in {\cal Y} \setminus {\cal X}$. Maximality means $B^d(y, \epsilon) \cap B^d(x, \epsilon) \neq \varnothing$ for some $x \in {\cal X}$. Let $z$ be in this intersection.
>
> Then by the [[triangle inequality]] and by construction of our balls above, we get
> $$\lvert \lvert y-x \rvert \rvert \leq \lvert \lvert y-z \rvert \rvert +\lvert \lvert z-x \rvert \rvert \leq 2\epsilon$$
> $$\tag*{$\blacksquare$}$$

see [[maximal epsilon packing is also a net]]
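The existence argument in the rest of this subsection is non-constructive, but the lemma above suggests a simple greedy procedure: keep proposing random points and accept any proposal that preserves the packing condition. Below is a Monte Carlo sketch of my own (not from the lecture); `max_rejections` is a heuristic stopping rule, so maximality, and hence the net property, is only approximate:

```python
# Greedy (approximately maximal) eps-packing of S^{d-1}; by the lemma above,
# a maximal eps-packing is automatically a 2*eps-net.
import numpy as np

def greedy_packing(d, eps, max_rejections=2000, seed=0):
    rng = np.random.default_rng(seed)
    points, rejections = [], 0
    while rejections < max_rejections:
        x = rng.standard_normal(d)
        x /= np.linalg.norm(x)  # a uniform random point on the sphere
        # packing condition: balls B(x, eps) disjoint <=> centers > 2*eps apart
        if all(np.linalg.norm(x - p) > 2 * eps for p in points):
            points.append(x)
            rejections = 0
        else:
            rejections += 1
    return np.array(points)

net = greedy_packing(d=3, eps=0.25)
print(len(net), "points; volumetric bound (proved next):", (1 + 1 / 0.25) ** 3)
```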
We want to show that small [[epsilon net|nets]] exist. We have a volumetric bound on the size of packings (including maximal ones), and as we saw above, a maximal packing yields a corresponding net. So we have a plan:

1. Show all [[epsilon packing|epsilon packings]] are not "too large"
2. Apply this to a maximal [[epsilon packing]]
3. Conclude that there exists a $2\epsilon$ [[epsilon net|net]] that is not too large

> [!theorem] Proposition
>
> Suppose ${\cal X}$ is an [[epsilon packing]] of $B^d$. Then $\lvert {\cal X} \rvert \leq \left( 1+\frac{1}{\epsilon} \right)^d$
>
> > [!NOTE]
> > The true value is $\exp\left( {\cal O}\left( \frac{d}{\epsilon} \right) \right)$

> [!proof]
>
> Since ${\cal X}$ is a packing of $B^d$, we know
> $$\begin{align}
> \bigcup_{x \in {\cal X}} B^d(x, \varepsilon) &\subseteq B^d(0, 1+\varepsilon) \\
> \implies \lvert {\cal X} \rvert \cdot \text{vol}(B^d(0, \varepsilon)) &\leq \text{vol}(B^d(0, 1+\varepsilon)) \\
> \implies \lvert {\cal X} \rvert &\leq \frac{\text{vol}(B^d(0, 1+\varepsilon))}{\text{vol}(B^d(0, \varepsilon))} \\
> &=\left( \frac{1+\varepsilon}{\varepsilon} \right)^d
> \end{align}$$
> Note that in the first line the union on the LHS is disjoint, which is how we get the second line.
> $$\tag*{$\blacksquare$}$$

see [[bound for the size of an epsilon packing]]

> [!theorem] Corollary
>
> For all $\varepsilon > 0$, there exists an [[epsilon net]] ${\cal X}$ of $B^d$ with $\lvert {\cal X} \rvert \leq \left( 1+\frac{2}{\varepsilon} \right)^d$

> [!proof]
>
> Define ${\cal X} :=$ any maximal $\frac{\varepsilon}{2}$ [[epsilon packing|packing]]. That ${\cal X}$ is an $\varepsilon$-net follows from [[maximal epsilon packing is also a net]], and the size bound follows immediately from the [[bound for the size of an epsilon packing]].

see [[we can find an epsilon net with the bound for an epsilon packing]]
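Here is a concrete check of the discretization lemma (again my own sketch, not from the lecture): on the circle $\mathbb{S}^{1}$, an angular grid with gap $2\epsilon$ is an explicit $\epsilon$-net (a chord is shorter than its arc), and the sandwich $\lvert \lvert \Delta \rvert \rvert _{{\cal X}} \leq \lvert \lvert \Delta \rvert \rvert \leq \frac{1}{1-2\epsilon} \lvert \lvert \Delta \rvert \rvert _{{\cal X}}$ holds as promised:

```python
# Verify the net lemma on S^1 with an explicit eps-net (an angular grid).
import numpy as np

rng = np.random.default_rng(1)
eps = 0.25

# angle gap 2*eps => every point of S^1 is within arc eps, hence chord <= eps, of the grid
angles = np.arange(0, 2 * np.pi, 2 * eps)
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # net points as unit vectors

A = rng.standard_normal((2, 2))
Delta = (A + A.T) / 2                                  # a random symmetric matrix
op = np.linalg.norm(Delta, 2)                          # true operator norm
restricted = np.max(np.abs(np.einsum("ni,ij,nj->n", X, Delta, X)))

print(restricted <= op <= restricted / (1 - 2 * eps))  # True; eps = 1/4 gives factor 2
```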
### 2.3.3 Coarse Non-Asymptotic Bound on Singular Values

Putting everything together, we can get a bound on $\lvert \lvert \Delta \rvert \rvert$.

- Using the existence of a not-too-big [[epsilon net]], we can approximate the norm by discretizing and testing on vectors in the net
- We can then use the first moment method (Taylor) to help get our bound

> [!theorem]
>
> For $d<m$, let $G \sim N(0, 1)^{\otimes d\times m}$. Define $\Delta:= GG^T - mI_{d}$. Then
> $$\mathbb{P}[\lvert \lvert \Delta \rvert \rvert \geq 64\sqrt{ dm }] \leq 2\exp(-d)$$
> Note: $\lvert \lvert G \rvert \rvert ^2=\lvert \lvert GG^T \rvert \rvert \leq m+{\cal O}(\sqrt{ dm })$

> [!proof]
>
> Let $\varepsilon := \frac{1}{4}$. Then we have:
> 1. There exists an [[epsilon net]] ${\cal X}$ of $B^d$ where our [[bound for the size of an epsilon packing]] yields
> $$\lvert{\cal X}\rvert \leq \left( 1+\frac{2}{\varepsilon} \right)^d = \left( 1 + \frac{2}{\frac{1}{4}} \right)^d=9^d$$
> 2. We also have the bound (from [[epsilon net restricted inner product bounds the operator norm]])
> $$\lvert \lvert \Delta \rvert \rvert \leq \frac{1}{1-2\varepsilon} \lvert \lvert \Delta \rvert \rvert_{{\cal X}}=\frac{1}{1-2\left( \frac{1}{4} \right)}\lvert \lvert \Delta \rvert \rvert_{{\cal X}} =2 \lvert \lvert \Delta \rvert \rvert_{{\cal X}}$$
>
> So
> $$\begin{align}
> \mathbb{P}[\lvert \lvert \Delta \rvert \rvert \geq t] &\leq \mathbb{P}\left[ \lvert \lvert \Delta \rvert \rvert _{{\cal X}} \geq \frac{t}{2} \right] \\
> \text{union bound} \implies &\leq \sum_{x \in{\cal X}} \mathbb{P}\left[ \underbrace{\lvert x^T\Delta x \rvert}_{\Delta = GG^T - mI_{d}} \geq \frac{t}{2} \right]
> \end{align}$$
> Since $x \in {\cal X} \subseteq B^d$, we know $\lvert \lvert x \rvert \rvert \leq 1$. So let $\hat{x}:= \frac{x}{\lvert \lvert x \rvert \rvert} \in \mathbb{S}^{d-1}$. Then
> $$\begin{align}
> &\leq \sum_{x \in {\cal X}} \mathbb{P}\left[ \lvert \hat{x}^T \Delta \hat{x} \rvert \geq \frac{t}{2} \right] \\
> &= \sum_{x \in {\cal X}} \mathbb{P}_{G}\left[ \lvert\,\lvert \lvert \underbrace{G^T\hat{x}}_{\in \mathbb{R}^m,\, \text{Law}=N(0,I_{m})} \rvert \rvert^2 - m \,\rvert \geq \frac{t}{2} \right]
> \end{align}$$
> Now, let $g=G^T\hat{x}$:
> $$\begin{align}
> &= \lvert {\cal X} \rvert \cdot \mathbb{P}_{g \sim N(0, I_{m})}\left[ \lvert\,\lvert \lvert g \rvert \rvert ^2 -m \,\rvert \geq \frac{t}{2} \right] \\
> &\leq 9^d\cdot 2\exp\left( -\frac{1}{8}\min\left\{ \frac{t^2}{4m}, \frac{t}{2} \right\} \right)
> \end{align}$$
> Setting $t:= C\sqrt{ dm }$ and noting that $9 \leq \exp(3)$ and that $d\leq m \implies \sqrt{ dm } \geq d$, we get
> $$\begin{align}
> &\leq 2\exp\left( 3d - \frac{1}{8} \min\left\{ \frac{C^2}{4}d, \frac{C}{2}\sqrt{ dm } \right\} \right) \\
> &\leq 2\exp\left( d\left[ 3- \min\left\{ \frac{C^2}{32}, \frac{C}{16} \right\} \right] \right) \\
> (*)\ C:= 64 \to\quad&\leq 2 \exp(-d)
> \end{align}$$
> For $(*)$, we want to pick $C$ big enough that the exponent is negative: with $C=64$ the min is $\min\left\{ 128, 4 \right\} = 4$, so the bracket is $3-4=-1$.
> $$\tag*{$\blacksquare$}$$

> [!NOTE] Union Bound
>
> The union bound is [[outer measure has countable subadditivity|countable subadditivity]], i.e. the fact that
> $$\mathbb{P}\left( \bigcup_{i}A_{i} \right) \leq \sum_{i} \mathbb{P}(A_{i})$$

see [[high probability bound for operator norm of difference for Gaussian covariance matrix]]

From [[Weyl's Theorem]], we can get the following result:

> [!theorem] Corollary (special case)
>
> Let $A, B \in M_{d}$ be [[hermitian|symmetric]]. Then
> $$\lambda_{d}(A) - \lvert \lvert B \rvert \rvert \leq \lambda_{d}(A+B) \leq \lambda_{1}(A+B) \leq \lambda_{1}(A) + \lvert \lvert B \rvert \rvert $$
> where $\lvert \lvert \cdot \rvert \rvert$ is the [[operator norm]].

> [!proof]
>
> The proof is similar to the one proving Weyl, except we can use the variational description of the [[eigenvalue|eigenvalues]] instead of [[Courant-Fisher Theorem|Courant-Fisher]]. For the upper bound, we have
> $$\begin{align}
> \lambda_{1}(A + B) &= \sup_{x \in \mathbb{S}^{d-1}} x^T(A+B)x \\
> &\leq \sup_{x \in \mathbb{S}^{d-1}} (x^T A x + \lvert \lvert B \rvert \rvert ) \\
> &=\sup_{x \in \mathbb{S}^{d-1}} x^TAx + \lvert \lvert B \rvert \rvert \\
> &= \lambda_{1}(A) + \lvert \lvert B \rvert \rvert
> \end{align}$$
> (by just distributing the $\sup$ and by definition of the [[operator norm]]). And, similarly, we have
> $$\begin{align}
> \lambda_{d}(A + B) &= \inf_{x \in \mathbb{S}^{d-1}} x^T(A+B)x \\
> &\geq \inf_{x \in \mathbb{S}^{d-1}}(x^TAx - \lvert \lvert B \rvert \rvert ) \\
> &= (\inf_{x \in \mathbb{S}^{d-1}} x^TAx) - \lvert \lvert B \rvert \rvert \\
> &= \lambda_{d}(A) - \lvert \lvert B \rvert \rvert
> \end{align}$$

> [!theorem] Corollary
>
> Let $G\in\mathbb{R}^{d\times m}$ be a Gaussian [[random matrix]] with $d \leq m$. Then
> $$\mathbb{P}[\sqrt{ m }-64\sqrt{ d } \leq \sigma_{d}(G) \leq\dots \leq \sigma_{1}(G) \leq \sqrt{ m } + 32\sqrt{ d }] \geq 1-2\exp(-d)$$
>
> > [!important]
> > i.e. all singular values are {==$\sqrt{ m } \pm {\cal O}(\sqrt{ d })$==}
> <!--SR:!2026-04-26,138,250-->

> [!NOTE]
>
> This is a good event: we want it to happen, and luckily it happens with high probability.

> [!proof]
>
> > [!note] Recall
> > First-order Taylor approximation (and concavity of $\sqrt{ \cdot }$) yields
> > $$\begin{align}
> > \sqrt{ 1+x } &\leq 1+\frac{x}{2}\quad &\text{for } x \geq 0 \\
> > \sqrt{ 1-x } &\geq 1-x \quad &\text{for } 0\leq x\leq 1
> > \end{align}$$
>
> If the event for our [[high probability bound for operator norm of difference for Gaussian covariance matrix]] happens, then we have
> $$\lvert \lvert \Delta \rvert \rvert = \lvert \lvert GG^T -mI_{d} \rvert \rvert \leq 64 \sqrt{ dm }$$
>
> i.e. if the event happens, we have
> (1)
> $$\begin{align}
> \sigma_{1}(G) = \lvert \lvert G \rvert \rvert &= \sqrt{ \lvert \lvert GG^T \rvert \rvert } \\
> (*)&\leq \sqrt{ \lvert \lvert mI_{d} \rvert \rvert + \lvert \lvert \Delta \rvert \rvert } \\
> &\leq \sqrt{ m + 64\sqrt{ dm } } \\
> &= \sqrt{ m }\sqrt{ 1 + 64\sqrt{ \frac{d}{m} } } \\
> (**)&\leq\sqrt{ m }\left( 1+\frac{64}{2}\sqrt{ \frac{d}{m} } \right) \\
> &= \sqrt{ m } + 32\sqrt{ d }
> \end{align}$$
> where
> - $(*)$ is from our [[Weyl's Theorem#special-case-for-random-matrix|special case of Weyl]]
> - $(**)$ is from the first of our two bounds above
>
> Then, for $\sigma_{d}(G)$, we have a similar bound.
>
> Note that if $m-64\sqrt{ dm } <0$, then $\sqrt{ m }-64\sqrt{ d }<0$ and the result follows trivially. So suppose $m-64\sqrt{ dm } \geq 0$.
>
> (2)
> $$\begin{align}
> \sigma_{d}(G) &= \sqrt{ \lambda_{d}(GG^T) } \\
> (*)&\geq \sqrt{ \lambda_{d}(mI_{d}) - \lvert \lvert \Delta \rvert \rvert } \\
> &\geq \sqrt{ m-64\sqrt{ dm } } \\
> &= \sqrt{ m }\sqrt{ 1-64\sqrt{ \frac{d}{m} } } \\
> (**)&\geq \sqrt{ m }\left( 1-64 \sqrt{ \frac{d}{m} } \right) \\
> &= \sqrt{ m } - 64\sqrt{ d }
> \end{align}$$
> where
> - $(*)$ is from our [[Weyl's Theorem#special-case-for-random-matrix|special case of Weyl]]
> - $(**)$ is from the second of our two bounds above
> $$\tag*{$\blacksquare$}$$

see [[high probability bound on singular values of gaussian random matrix]]
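An empirical look at this corollary (my own sketch, not from the notes): the constants $64$ and $32$ are very loose, and the observed singular values sit much closer to $\sqrt{ m } \pm \sqrt{ d }$:

```python
# Empirical check: singular values of a d x m Gaussian matrix land in sqrt(m) +/- O(sqrt(d)).
import numpy as np

rng = np.random.default_rng(2)
d, m = 50, 5000
G = rng.standard_normal((d, m))
s = np.linalg.svd(G, compute_uv=False)  # all d singular values

lo, hi = (s.min() - np.sqrt(m)) / np.sqrt(d), (s.max() - np.sqrt(m)) / np.sqrt(d)
print(f"sqrt(m) = {np.sqrt(m):.1f}, sigma_d = {s.min():.1f}, sigma_1 = {s.max():.1f}")
print(f"(sigma - sqrt(m)) / sqrt(d) in [{lo:.2f}, {hi:.2f}]")  # roughly [-1, 1]
```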
$G^TG \in \mathbb{R}^{m\times m}$: consider the eigenvalues of $\frac{1}{m}G^TG$ for $d\ll m$. We know
$$\lambda_{1}\geq\dots \geq \lambda_{d} \geq \lambda_{d+1} \geq\dots \geq \lambda_{m} \geq 0$$
with
$$\lambda_{1} \approx \dots \approx \lambda_{d} \approx 1 \qquad \lambda_{d+1} \approx\dots \approx\lambda_{m} \approx 0$$
(indeed $G^TG$ has rank at most $d$, so $\lambda_{d+1}=\dots=\lambda_{m}=0$ exactly), so $\frac{1}{m} G^TG$ is approximately a random projection matrix.

Next time:
- this method of proof
- singular vectors
- what is the fluctuation around the eigenvalues?

# Review #flashcards/math/rmt

## TODO

- [x] Clean up lecture ⏳ 2025-09-05 ✅ 2025-09-05
- [x] Finish linking ⏳ 2025-09-05 ✅ 2025-09-05
- [-] Add flashcards #class_notes/clean ⏳ 2025-09-06 ❌ 2025-12-08

```datacorejsx
const { dateTime } = await cJS()
return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```