Random Matrix Lecture 04

[[lecture-data]]

Class Notes

Notes pages 15-20 under Singular Values: Geometric Method

Summary

- Concentration of singular values for short/fat matrices
- Matrix concentration inequality
- Geometric method for random matrix analysis
- Epsilon nets
  - Discretizing matrix norms
  - Existence of good nets

2. Rectangular Matrices

2.3 Spectral Analysis of Rectangular Gaussian Matrices

2.3.1 Singular Values: Geometric Method

Last time: Spectrum of random matrix

$$G \sim N(0, 1)^{\otimes d \times m}, \quad d \leq m$$

and especially $d \ll m$

- $G^{T} G \approx d I_{m}$ (not actually true)
  - holds only for a few quadratic forms at a time, by the JL theorem
- $G G^{T} \approx m I_{d}$ by the LLN
  - this matrix is $d \times d$
  - this is actually true in the operator norm

$$\begin{aligned} \Delta &:= G G^{T} - m I_{d} \\ &= G G^{T} - \mathbb{E}[G G^{T}] \end{aligned}$$

But recall the [[operator norm]]:

$$\|\Delta\| = \sup_{x \in S^{d-1}} |x^{T} \Delta x| = \sup_{x \in B^{d}} |x^{T} \Delta x|$$

$$B^{d}(x, r) := \{y : \|x - y\| \leq r\} \subset \mathbb{R}^{d}$$

$$B^{d} := B^{d}(0, 1) \supset S^{d-1}$$

Restricted Inner Product

Let $X = \{x_{1}, \dots, x_{N}\} \subseteq \mathbb{R}^{d}$ and let $\Delta \in M_{d}$ be symmetric. Then we define

$$\|\Delta\|_{X} := \sup_{x \in X} |x^{T} \Delta x| = \sup_{x \in X} |\langle \Delta, x x^{T} \rangle|$$

see [[restricted inner product]]

This is a [[semi-norm]]. Whether or not it is a [[norm]] depends on the set $X$.

When is it a norm? The outer products $x x^{T}$ for $x \in X$ need to span the space of symmetric $d \times d$ matrices.

Proposition

Let $M \in R^{d \times d}$ be a symmetric [[random matrix]] and let $X \subset R^{d}$ be finite. Then

$$\mathbb{P}[\|M\|_{X} > t] \leq |X| \cdot \max_{x \in X} \mathbb{P}[|x^{T} M x| > t]$$

There is a tradeoff between the factor $|X|$ and the error in approximating $\|\Delta\|$ by $\|\Delta\|_{X}$: a larger $X$ approximates the operator norm better but makes the union bound weaker.

Thus the problem of accurate approximation becomes one of finding a small set $X$ that behaves like a "grid" approximating $S^{d-1}$ (or $B^{d}$). We call this approach the geometric method because it reduces the problem of understanding norms of random matrices to a geometric problem.

2.3.2 Nets, Coverings, and Packings

Discretizing the operator norm

We want to know if the [[restricted inner product]] approximates the [[operator norm]]

$\epsilon$-net

Let $X \subseteq Y \subseteq \mathbb{R}^{d}$. $X$ is an $\epsilon$-covering or an $\epsilon$-net of $Y$ if

$$\bigcup_{x \in X} B^{d}(x, \epsilon) \supseteq Y \iff \forall y \in Y \; \exists x \in X \text{ s.t. } \|y - x\| \leq \epsilon$$

see [[epsilon net]]

Lemma

If $X$ is an [[epsilon net]] of $S^{d-1}$ or $B^{d}$, then

$$\|\Delta\|_{X} \leq \|\Delta\| \leq \frac{1}{1 - 2\epsilon} \|\Delta\|_{X}$$

(if $\epsilon \in (0, \frac{1}{2})$ for the second inequality)

Proof

Let $x \in S^{d-1}$ be such that $|x^{T} \Delta x| = \|\Delta\|$ and let $x_{i} \in X$ be such that $\|x - x_{i}\| \leq \epsilon$. Then

$$\begin{aligned} \|\Delta\|_{X} &\geq |x_{i}^{T} \Delta x_{i}| \\ &= |(x + (x_{i} - x))^{T} \Delta (x + (x_{i} - x))| \\ &\geq \underbrace{|x^{T} \Delta x|}_{= \|\Delta\|} - \underbrace{|x^{T} \Delta (x_{i} - x)|}_{\leq \|x\| \cdot \|\Delta\| \cdot \|x_{i} - x\| \leq \epsilon \|\Delta\|} - \underbrace{|(x_{i} - x)^{T} \Delta x_{i}|}_{\leq \epsilon \|\Delta\|} \\ &\geq (1 - 2\epsilon) \|\Delta\| \end{aligned}$$

$\blacksquare$

Note

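Interpretation of the special $x$: since $\Delta$ is symmetric, $x$ is a unit eigenvector for the eigenvalue of $\Delta$ with the largest absolute value.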

see [[epsilon net restricted inner product bounds the operator norm]]
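The lemma can be sanity-checked numerically in the smallest interesting case. The following is a minimal sketch, assuming $d = 2$ so that the operator norm of a symmetric matrix has a closed form; the helper names (`quad_form`, `operator_norm`, `circle_net`, `restricted_norm`) and the test matrix are hypothetical choices for illustration, not part of the lecture.

```python
import math


def quad_form(delta, x):
    """Return x^T Δ x for a symmetric 2x2 matrix Δ = ((a, b), (b, c))."""
    (a, b), (_, c) = delta
    return a * x[0] ** 2 + 2 * b * x[0] * x[1] + c * x[1] ** 2


def operator_norm(delta):
    """Operator norm of a symmetric 2x2 matrix: the largest |eigenvalue|."""
    (a, b), (_, c) = delta
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return max(abs(mean + radius), abs(mean - radius))


def circle_net(eps):
    """Equally spaced points on S^1 that form an eps-net of S^1.

    With n equally spaced points, every unit vector is within angle pi/n of
    a net point, i.e. within chord distance 2*sin(pi/(2n)).
    """
    n = 1
    while 2 * math.sin(math.pi / (2 * n)) > eps:
        n += 1
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def restricted_norm(delta, net):
    """The semi-norm ||Δ||_X = max over the net of |x^T Δ x|."""
    return max(abs(quad_form(delta, x)) for x in net)


eps = 0.25
net = circle_net(eps)
delta = ((1.0, 2.0), (2.0, -3.0))  # arbitrary symmetric test matrix
norm_x = restricted_norm(delta, net)
norm = operator_norm(delta)

# Both inequalities of the lemma, with a small floating-point tolerance.
lemma_holds = norm_x <= norm + 1e-12 and norm <= norm_x / (1 - 2 * eps) + 1e-12
print(len(net), norm_x, norm, lemma_holds)
```

In this toy case the net is much finer than the lemma needs, so the factor $\frac{1}{1 - 2\epsilon} = 2$ is far from tight.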

We want to construct our epsilon nets so we can control their size. It is hard to manually describe nets in terms of their vectors, so we need some framework to define them otherwise.

Non-constructive existence of nets

$\epsilon$-packing

Let $X \subset Y$. $X$ is an $\epsilon$-packing of $Y$ if

$$B^{d}(x, \epsilon) \cap B^{d}(x', \epsilon) = \emptyset \quad \forall x \neq x' \in X$$

i.e., if any two of these balls are disjoint, which holds exactly when their centers are sufficiently far apart:

$$\|x - x'\| > 2\epsilon \quad \forall x \neq x' \in X$$

see [[epsilon packing]]

Lemma

If $X$ is a maximal $\epsilon$-packing of $Y$, then $X$ is a $2\epsilon$-net of $Y$.

Maximal

We say that $X$ is maximal if $X \cup \{x'\}$ is not a packing for all $x' \in Y \setminus X$.

Proof

Let $y \in Y \setminus X$. Maximality means $B^{d}(y, \epsilon) \cap B^{d}(x, \epsilon) \neq \emptyset$ for some $x \in X$. Let $z$ be in this intersection.

Then by the [[triangle inequality]] and by construction of our balls above, we get

$$\|y - x\| \leq \|y - z\| + \|z - x\| \leq 2\epsilon$$

$\blacksquare$

see [[maximal epsilon packing is also a net]]
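The lemma is non-constructive for $Y = B^{d}$, but for a finite $Y$ a maximal packing can be built greedily. The following is a minimal sketch, assuming $Y$ is a finite random sample of the unit disk $B^{2}$; the function names, the sample size, and $\epsilon = 0.1$ are hypothetical choices for illustration.

```python
import math
import random


def greedy_maximal_packing(points, eps):
    """Greedily build a maximal eps-packing of the finite set `points`."""
    packing = []
    for y in points:
        # The closed eps-balls around y and x are disjoint iff |y - x| > 2*eps.
        if all(math.dist(y, x) > 2 * eps for x in packing):
            packing.append(y)
    return packing


def covering_radius(points, net):
    """Largest distance from a point of `points` to its nearest net point."""
    return max(min(math.dist(y, x) for x in net) for y in points)


rng = random.Random(0)

# Y: finite sample of the unit disk, drawn by rejection sampling.
Y = []
while len(Y) < 2000:
    p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    if math.hypot(*p) <= 1:
        Y.append(p)

eps = 0.1
X = greedy_maximal_packing(Y, eps)
radius = covering_radius(Y, X)
print(len(X), radius)
```

Every point that the greedy pass rejects lies within $2\epsilon$ of a point already chosen, so the result is maximal within $Y$, and the computed covering radius is at most $2\epsilon$, matching the lemma.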

We want to show that small nets exist. We have a volumetric bound on the size of packings (including maximal ones), and as we saw above, this means there is a corresponding net. So we have a plan:

1. Show all epsilon packings are not "too large"
2. Apply this to a maximal [[epsilon packing]]
3. This means that there exists a $2\epsilon$-net that is not too large

Proposition

Suppose $X$ is an [[epsilon packing]] of $B^{d}$. Then $|X| \leq (1 + \frac{1}{\epsilon})^{d}$.

Note

The true value is $\exp(O(\frac{d}{\epsilon}))$

Proof

Since $X$ is a packing of $B^{d}$ , we know

$$\begin{aligned} \bigcup_{x \in X} B^{d}(x, \epsilon) &\subseteq B^{d}(0, 1 + \epsilon) \\ \implies |X| \cdot \operatorname{vol}(B^{d}(0, \epsilon)) &\leq \operatorname{vol}(B^{d}(0, 1 + \epsilon)) \\ \implies |X| &\leq \frac{\operatorname{vol}(B^{d}(0, 1 + \epsilon))}{\operatorname{vol}(B^{d}(0, \epsilon))} = \left(\frac{1 + \epsilon}{\epsilon}\right)^{d} = \left(1 + \frac{1}{\epsilon}\right)^{d} \end{aligned}$$

Note that in the first line the union on the LHS is disjoint, which is how we get the second line.

$\blacksquare$

see [[bound for the size of an epsilon packing]]

Corollary

For all $\epsilon > 0$, there exists an [[epsilon net]] $X$ of $B^{d}$ with $|X| \leq (1 + \frac{2}{\epsilon})^{d}$.

Proof

Define $X$ to be any maximal $\frac{\epsilon}{2}$-packing of $B^{d}$. The net property follows from [[maximal epsilon packing is also a net]], and the size bound follows immediately from the [[bound for the size of an epsilon packing]] applied with $\frac{\epsilon}{2}$ in place of $\epsilon$.

see [[we can find an epsilon net with the bound for an epsilon packing]]
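To get a feel for how fast these bounds grow, the two size bounds can be evaluated exactly. This is a small sketch using exact rational arithmetic; the function names `packing_bound` and `net_bound` are hypothetical.

```python
from fractions import Fraction


def packing_bound(eps, d):
    """Upper bound (1 + 1/eps)^d on the size of an eps-packing of B^d."""
    return (1 + 1 / Fraction(eps)) ** d


def net_bound(eps, d):
    """Size bound (1 + 2/eps)^d for an eps-net of B^d.

    The net is a maximal (eps/2)-packing, so the packing bound is applied
    with eps/2 in place of eps.
    """
    return packing_bound(Fraction(eps) / 2, d)


for d in (1, 2, 5, 10):
    print(d, net_bound(Fraction(1, 4), d))
```

With $\epsilon = \frac{1}{4}$ the net bound is exactly $9^{d}$: exponential in $d$, but with a base that does not depend on $d$.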

2.3.3 Coarse Non-Asymptotic Bound on Singular Values

Putting everything together, we can get a bound on $\|\Delta\|$.

- Using the existence of a not-too-big [[epsilon net]], we can approximate the norm by discretizing and testing vectors in the net.
- We can then use the first moment method (Taylor) to help get our bound.

Theorem

For $d < m$, let $G \sim N(0, 1)^{\otimes d \times m}$. Define $\Delta := G G^{T} - m I_{d}$. Then

$$\mathbb{P}[\|\Delta\| \geq 64 \sqrt{dm}] \leq 2 \exp(-d)$$

Note: $\|G\|^{2} = \|G G^{T}\| \leq m + O(\sqrt{dm})$

Proof

Let $\epsilon := \frac{1}{4}$. Then we have:

There exists an [[epsilon net]] $X$ of $B^{d}$ whose size is controlled via our [[bound for the size of an epsilon packing]]:

$$|X| \leq \left(1 + \frac{2}{\epsilon}\right)^{d} = \left(1 + \frac{2}{1/4}\right)^{d} = 9^{d}$$

We also have the bound (from [[epsilon net restricted inner product bounds the operator norm]])

$$\|\Delta\| \leq \frac{1}{1 - 2\epsilon} \|\Delta\|_{X} = \frac{1}{1 - 2 \cdot \frac{1}{4}} \|\Delta\|_{X} = 2 \|\Delta\|_{X}$$

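Therefore, with $\Delta = G G^{T} - m I_{d}$,

$$\begin{aligned} \mathbb{P}[\|\Delta\| \geq t] &\leq \mathbb{P}\left[\|\Delta\|_{X} \geq \frac{t}{2}\right] \\ &\leq \sum_{x \in X} \mathbb{P}\left[|x^{T} \Delta x| \geq \frac{t}{2}\right] \quad \text{(union bound)} \end{aligned}$$

Since $x \in X \subseteq B^{d}$, we know $\|x\| \leq 1$. So for $x \neq 0$ let $\hat{x} := \frac{x}{\|x\|} \in S^{d-1}$, so that $|x^{T} \Delta x| = \|x\|^{2} |\hat{x}^{T} \Delta \hat{x}| \leq |\hat{x}^{T} \Delta \hat{x}|$ (a term with $x = 0$ contributes probability zero for $t > 0$). Then

$$\begin{aligned} \mathbb{P}[\|\Delta\| \geq t] &\leq \sum_{x \in X} \mathbb{P}\left[|\hat{x}^{T} \Delta \hat{x}| \geq \frac{t}{2}\right] \\ &= \sum_{x \in X} \mathbb{P}_{G}\Big[\big| \| \underbrace{G^{T} \hat{x}}_{\in \mathbb{R}^{m},\ \mathrm{Law} = N(0, I_{m})} \|^{2} - m \big| \geq \frac{t}{2}\Big] \end{aligned}$$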

Now, let $g = G^{T} \hat{x}$. Every term in the sum is the same, so

$$\begin{aligned} \mathbb{P}[\|\Delta\| \geq t] &\leq |X| \cdot \mathbb{P}_{g \sim N(0, I_{m})}\left[\big| \|g\|^{2} - m \big| \geq \frac{t}{2}\right] \\ &\leq 9^{d} \cdot 2 \exp\left(-\frac{1}{8} \min\left\{\frac{t^{2}}{4m}, \frac{t}{2}\right\}\right) \end{aligned}$$

Setting $t := C \sqrt{dm}$ and noting that $9 \leq \exp(3)$, we get

$$\begin{aligned} \mathbb{P}[\|\Delta\| \geq C \sqrt{dm}] &\leq 2 \exp\left(3d - \frac{1}{8} \min\left\{\frac{C^{2}}{4} d, \underbrace{\frac{C}{2} \sqrt{dm}}_{d \leq m \implies \sqrt{dm} \geq d}\right\}\right) \\ &\leq 2 \exp\left(d \left[3 - \underbrace{\min\left\{\frac{C^{2}}{32}, \frac{C}{16}\right\}}_{C = 64:\ \min\{128, 4\} = 4}\right]\right) \\ &\leq 2 \exp(-d) \quad (*)\ \text{with } C := 64 \end{aligned}$$

For $(*)$, we want to pick $C$ big enough that the exponent is at most $-d$; with $C = 64$ the bracket is $3 - 4 = -1$.

$\blacksquare$

Union Bound

The union bound is countable subadditivity, i.e. the fact that

$$\mathbb{P}\left(\bigcup_{i} A_{i}\right) \leq \sum_{i} \mathbb{P}(A_{i})$$

see [[high probability bound for operator norm of difference for Gaussian covariance matrix]]
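The choice of constant in the proof can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the exponent takes the form $d \left[3 - \min\{\frac{C^{2}}{32}, \frac{C}{16}\}\right]$ as in the proof; the name `exponent_coefficient` and the restriction to integer $C$ are hypothetical choices for illustration.

```python
import math


def exponent_coefficient(C):
    """Coefficient of d in the exponent: 3 - min{C^2/32, C/16}."""
    return 3 - min(C ** 2 / 32, C / 16)


# The step 9^d <= exp(3d) relies on 9 <= e^3.
nine_below_e_cubed = 9 <= math.exp(3)

# Smallest integer C for which the bound becomes 2*exp(-d).
smallest_C = next(C for C in range(1, 1000) if exponent_coefficient(C) <= -1)

print(nine_below_e_cubed, smallest_C, exponent_coefficient(64))
```

Under these assumptions the binding term is $\frac{C}{16}$, which is why $C = 64$ appears. The step $9 \leq \exp(3)$ is loose (since $\ln 9 \approx 2.2$), so the constant is convenient rather than optimal.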

From [[Weyl's Theorem]], we can get the following result:

Corollary (special case)

Let $A, B \in M_{d}$ be symmetric. Then

$$\lambda_{d}(A) - \|B\| \leq \lambda_{d}(A + B) \leq \lambda_{1}(A + B) \leq \lambda_{1}(A) + \|B\|$$

where $\|\cdot\|$ is the [[operator norm]].

Proof

The proof is similar to the one proving Weyl, except we can use the variational description of the extreme eigenvalues instead of Courant-Fischer. For the upper bound, we have

$$\begin{aligned} \lambda_{1}(A + B) &= \sup_{x \in S^{d-1}} x^{T} (A + B) x \\ &\leq \sup_{x \in S^{d-1}} (x^{T} A x + \|B\|) \\ &= \sup_{x \in S^{d-1}} x^{T} A x + \|B\| \\ &= \lambda_{1}(A) + \|B\| \end{aligned}$$

(by just distributing the $\sup$ and by definition of the [[operator norm]]). And, similarly, for the lower bound we have

$$\begin{aligned} \lambda_{d}(A + B) &= \inf_{x \in S^{d-1}} x^{T} (A + B) x \\ &\geq \inf_{x \in S^{d-1}} (x^{T} A x - \|B\|) \\ &= \left(\inf_{x \in S^{d-1}} x^{T} A x\right) - \|B\| \\ &= \lambda_{d}(A) - \|B\| \end{aligned}$$

Corollary

Let $G \in \mathbb{R}^{d \times m}$ be a Gaussian [[random matrix]] with $d \leq m$. Then

$$\mathbb{P}[\sqrt{m} - 64 \sqrt{d} \leq \sigma_{d}(G) \leq \dots \leq \sigma_{1}(G) \leq \sqrt{m} + 32 \sqrt{d}] \geq 1 - 2 \exp(-d)$$

Important

i.e., all singular values are $\sqrt{m} \pm O(\sqrt{d})$

Note

This is a good event; we want this to happen. And luckily, we have high probability for this event.

Proof

Recall

First-order Taylor approximation (and concavity of $\sqrt{\cdot}$) yields

$$\begin{aligned} \sqrt{1 + x} &\leq 1 + \frac{x}{2} && \text{for } x \geq 0 \\ \sqrt{1 - x} &\geq 1 - x && \text{for } 0 \leq x \leq 1 \end{aligned}$$

If the high-probability event from our [[high probability bound for operator norm of difference for Gaussian covariance matrix]] happens, then we have

$$\|\Delta\| = \|G G^{T} - m I_{d}\| \leq 64 \sqrt{dm}$$

i.e., if the event happens, we have

(1)

$$\begin{aligned} \sigma_{1}(G) = \|G\| &= \sqrt{\|G G^{T}\|} \\ &\leq \sqrt{\|m I_{d}\| + \|\Delta\|} && (*) \\ &\leq \sqrt{m + 64 \sqrt{dm}} \\ &= \sqrt{m} \sqrt{1 + 64 \sqrt{\frac{d}{m}}} \\ &\leq \sqrt{m} \left(1 + \frac{64}{2} \sqrt{\frac{d}{m}}\right) && (**) \\ &= \sqrt{m} + 32 \sqrt{d} \end{aligned}$$

where

- $(*)$ is from our special case of Weyl
- $(**)$ is from the first of our two bounds above

Then, for $\sigma_{d}(G)$, we have a similar bound.

Note that if $m - 64 \sqrt{dm} < 0$, then $\sqrt{m} - 64 \sqrt{d} < 0 \leq \sigma_{d}(G)$ and the result follows. So suppose $m - 64 \sqrt{dm} \geq 0$.

(2)

$$\begin{aligned} \sigma_{d}(G) &= \sqrt{\lambda_{d}(G G^{T})} \\ &\geq \sqrt{\lambda_{d}(m I_{d}) - \|\Delta\|} && (*) \\ &\geq \sqrt{m - 64 \sqrt{dm}} \\ &= \sqrt{m} \sqrt{1 - 64 \sqrt{\frac{d}{m}}} \\ &\geq \sqrt{m} \left(1 - 64 \sqrt{\frac{d}{m}}\right) && (**) \\ &= \sqrt{m} - 64 \sqrt{d} \end{aligned}$$

where

- $(*)$ is from our special case of Weyl
- $(**)$ is from the second of our two bounds above

$\blacksquare$

see [[high probability bound on singular values of gaussian random matrix]]
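The corollary can be illustrated with a tiny simulation. The following is a minimal sketch, assuming $d = 2$ so that the singular values follow in closed form from the $2 \times 2$ Gram matrix $G G^{T}$; the seed, the value $m = 20000$, and the helper name `singular_values_2xm` are hypothetical choices for illustration.

```python
import math
import random


def singular_values_2xm(G):
    """Singular values of a 2 x m matrix via the eigenvalues of G G^T."""
    row0, row1 = G
    a = sum(v * v for v in row0)               # (G G^T)[0][0]
    c = sum(v * v for v in row1)               # (G G^T)[1][1]
    b = sum(u * v for u, v in zip(row0, row1))  # (G G^T)[0][1]
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    # Eigenvalues of the symmetric 2x2 matrix are mean +/- radius.
    return math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))


rng = random.Random(0)
d, m = 2, 20000
G = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(d)]

sigma_1, sigma_d = singular_values_2xm(G)
lower = math.sqrt(m) - 64 * math.sqrt(d)
upper = math.sqrt(m) + 32 * math.sqrt(d)
in_window = lower <= sigma_d <= sigma_1 <= upper
print(lower, sigma_d, sigma_1, upper, in_window)
```

In a run like this both singular values typically land far inside the window around $\sqrt{m}$, which reflects that the constants 64 and 32 come from a coarse argument rather than from the true size of the fluctuations.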

Consequence for $G^{T} G \in \mathbb{R}^{m \times m}$: for the eigenvalues of $\frac{1}{m} G^{T} G$ with $d \ll m$, we know

- $\lambda_{1} \geq \dots \geq \lambda_{d} \geq \lambda_{d+1} \geq \dots \geq \lambda_{m} \geq 0$
- $\lambda_{1} \approx \dots \approx \lambda_{d} \approx 1$
- $\lambda_{d+1} \approx \dots \approx \lambda_{m} \approx 0$

so $\frac{1}{m} G^{T} G$ is approximately a random projection matrix.

Next time:

- this method of proof
- singular vectors
- what is this fluctuation around the eigenvalues?

Review

#flashcards/math/rmt

TODO

Clean up lecture ⏳ 2025-09-05 ✅ 2025-09-05
Finish linking ⏳ 2025-09-05 ✅ 2025-09-05
Add flashcards #class_notes/clean ⏳ 2025-09-06
