Random Matrix Lecture 20

[[lecture-data]]

> [!info]
> ```dataviewjs
> let current = dv.current();
> let previous = current.previous_lecture;
>
> // get notes with this as the previous
> let pages = dv.pages("")
> 	.where(p => p.previous_lecture)
> 	.where(p => String(p.previous_lecture)
> 		.includes(dv.current().file.link));
>
> let output = `:LiArrowBigLeftDash: ${previous || "No Previous Lecture"} | `;
>
> if (pages.length > 0) {
> 	output += pages
> 		.map(p => p.file.link).join(", ");
> } else {
> 	output += 'No Next Lecture'
> }
>
> output += ' :LiArrowBigRightDash:'
>
> dv.el('div', output)
> ```
 

# Class Notes

## Summary

- Variance inequalities for random matrices
	- Efron-Stein
	- bounded differences
	- Poincaré inequality

# 7. Non-Asymptotic RMT

Previously, we saw asymptotic results:

- $\lambda_{1}(X^{(d)})/\sqrt{ d } \to 2$ as $d \to \infty$, i.e. statements that hold only in the limit
- and related limiting statements, without quantitative rates

Next,

- we want a concentration inequality that holds at finite $d$

> [!example]
>
> We want something of the form
> $$\mathbb{P}\left[\lvert \lvert X^{(d)} \rvert \rvert \geq (C+t)\sqrt{ d }\right] \leq e^{-cdt^{2}}$$

> [!note]
>
> This is bounded by the geometric / $\varepsilon$-net method. The true value of $\lvert \lvert X^{(d)} \rvert \rvert$ is $\approx 2\sqrt{ d }$, so the constant below is not sharp.

$$\begin{align}
\mathbb{P}\left[\lvert \lvert X \rvert \rvert \geq t\right] &= \mathbb{P}\left[\max_{\lvert \lvert v \rvert \rvert \leq 1} \left\lvert v^{\intercal}Xv \right\rvert \geq t\right] \\
&\leq \mathbb{P}\left[ \max_{v \in \underbrace{ {\cal X} }_{ \varepsilon\text{-net} }} \underbrace{ \left\lvert v^{\intercal}Xv \right\rvert }_{ {\cal N}(0,2) } \geq t(1-2\varepsilon) \right] \\
&\leq \underbrace{ \lvert {\cal X} \rvert }_{ \leq \left( 1+\frac{2}{\varepsilon} \right)^d } \,\, \mathbb{P}\left[\lvert {\cal N}(0,2) \rvert \geq t(1-2\varepsilon)\right] \\
\left( \varepsilon=\frac{1}{4} \right) \implies&\leq 2 \cdot 9^d \exp\left( -\frac{t^2}{16} \right) \\
&\lesssim \exp(Cd - ct^2) \\
\implies \mathbb{P}\left[ \lvert \lvert X^{(d)} \rvert \rvert \geq(10+t)\sqrt{ d }\right] &\leq \exp(-c'dt^2)
\end{align}$$

(page 111 of Last Year’s Notes)
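A quick numerical illustration of this concentration (a minimal sketch, not from the lecture: it assumes NumPy and a GOE-type normalization with off-diagonal variance $1$, consistent with the $v^{\intercal}Xv \sim {\cal N}(0,2)$ used above):

```python
import numpy as np

rng = np.random.default_rng(0)

def goe(d):
    """Sample a GOE-type Wigner matrix: off-diagonal entries N(0,1), diagonal N(0,2),
    so that v^T X v ~ N(0,2) for any fixed unit vector v."""
    A = rng.normal(size=(d, d))
    return (A + A.T) / np.sqrt(2)

d, trials = 200, 500
norms = np.array([np.linalg.norm(goe(d), 2) for _ in range(trials)])  # spectral norm

print("mean ||X|| / sqrt(d):", norms.mean() / np.sqrt(d))   # close to 2
print("std  ||X|| / sqrt(d):", norms.std() / np.sqrt(d))    # small: concentration
for t in (0.2, 0.5, 1.0):
    frac = np.mean(norms >= (2 + t) * np.sqrt(d))
    print(f"empirical P[||X|| >= (2+{t})*sqrt(d)] = {frac}")
```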

## Principles of Concentration Inequalities

First, we want it to be possible to bound the fluctuations of $f(X)$ without knowing $\mathbb{E}\left[f(X)\right]$ itself.

> [!example]
>
> $f(X) = \lambda_{1}(X)$ and/or $f(X) = \lvert \lvert X \rvert \rvert$, whose expectations we cannot compute exactly.

Second, we expect $f(x_{1},\dots,x_{N})$ to concentrate near $\mathbb{E}\left[f\right]$ if

1. the $x_{i}$ are weakly dependent (e.g. independent)
2. $f$ is weakly sensitive to each of its inputs (illustrated in the sketch after this list)
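A small simulation of this heuristic (a sketch under choices of our own, not from the lecture: NumPy, $x_{i} \sim \text{Unif}[0,1]$, $f$ the average of all coordinates, $g$ a function that depends on a single coordinate):

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 1000, 2000

# x_1, ..., x_N independent Uniform[0,1]
x = rng.random(size=(trials, N))

# f is weakly sensitive to each input: changing one x_i moves f by at most 1/N
f = x.mean(axis=1)

# g is strongly sensitive to a single input: it ignores all but x_1
g = x[:, 0]

print("std of f (weakly sensitive):  ", f.std())   # ~ 1/sqrt(12 N): concentrated
print("std of g (strongly sensitive):", g.std())   # ~ 1/sqrt(12): no concentration as N grows
```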

Our Goals:

- Prove bounds on the fluctuations of $f$ under the above conditions for concentration
- Consider “RMT choices” of $f$ (as in the example above)

## Variance bounds

$$\begin{align}
\text{Var}(f(X)) &= \mathbb{E}\left[\left(f(X) - \mathbb{E}\left[f(X)\right]\right) ^2\right] \\
&\leq C
\end{align}$$

By [[Chebyshev's Inequality]], a variance bound like this already gives concentration: $\mathbb{P}\left[\lvert f(X)-\mathbb{E}\left[f(X)\right] \rvert \geq t\right] \leq C/t^{2}$.

> [!example]
>
> Suppose $x_{1},\dots, x_{N}$ are independent with $f(x) = \sum_{i}x_{i}$. Then $\text{Var}(f) = \sum_{i=1}^N \text{Var}(x_{i})$.

> [!theorem] Efron-Stein Theorem
>
> Let $x_{1},\dots,x_{N}$ be independent [[random variable|random variables]] and $f$ any function of them. Then
> $$\text{Var}(f) \leq \sum_{i=1}^{N} \mathbb{E}_{x_{1},\dots,x_{i-1},x_{i+1},\dots, x_{N}}\left[ \,\text{Var}_{x_{i}}\left(f(x_{1},\dots,x_{i-1},x_{i}, x_{i+1}, \dots,x_{N})\right)\,\right]$$

(We will later want a bound that avoids taking this outer expectation.)

> [!proof]
>
> Denote $\mathbb{E}_{i}:=\mathbb{E}_{x_{i}}$ and $\mathbb{E}_{\leq i} := \mathbb{E}_{x_{1},\dots,x_{i}}$. Telescoping,
>
> $$\begin{align}
> \text{Var}(f) &= \mathbb{E}\left[\left(f-\mathbb{E}\left[f\right]\right)^2 \right] \\
> &= \mathbb{E}\left[ \left((f-\mathbb{E}_{\leq 1}\left[f\right] ) + \underbrace{ (\mathbb{E}_{\leq 1}\left[f\right] - \mathbb{E}_{\leq 2}\left[f\right] ) }_{ \mathbb{E}_{y_{1}}\left[f(y_{1},x_{2},\dots,x_{N})\right] -\mathbb{E}_{y_{1},y_{2}}\left[f(y_{1},y_{2},x_{3},\dots,x_{N})\right] }+\dots+(\mathbb{E}_{\leq N-1}\left[f\right] -\mathbb{E}_{\leq N}\left[f\right] )\right)^2\right]
> \end{align}$$
>
> Expanding the square, the cross terms vanish. For $i \leq j$,
> $$\begin{align}
> \mathbb{E}_{x_{1},\dots,x_{N}}&\left[\,\mathbb{E}_{y_{1},\dots,y_{i}}\left[f(y_{1},\dots,y_{i},x_{i+1},\dots,x_{j},\dots,x_{N})\right] \times\mathbb{E}_{z_{1},\dots,z_{j}}\left[f(z_{1},\dots,z_{j},x_{j+1},\dots,x_{N})\right] \,\right] \\
> &=\mathbb{E}\left[\,\left(\mathbb{E}_{\leq j}\left[f\right]\right) ^2\,\right]
> \end{align}$$
> Generally, $\mathbb{E}\left[\,\mathbb{E}_{\leq i}\left[f\right]\,\,\mathbb{E}_{\leq j}\left[f\right]\,\right]=\mathbb{E}\left[\left(\mathbb{E}_{\leq\max\{ i,j \}}\left[f\right]\right)^2\right]$, so for $i < j$
> $$\begin{align}
> &\mathbb{E}\left[\left(\mathbb{E}_{\leq i}\left[f\right] -\mathbb{E}_{\leq i+1}\left[f\right] \right)\times\left(\mathbb{E}_{\leq j}\left[f\right] -\mathbb{E}_{\leq j+1}\left[f\right] \right)\right] \\
> &= \mathbb{E}\left[\left(\mathbb{E}_{\leq j}\left[f\right]\right) ^2\right] - \mathbb{E}\left[\left(\mathbb{E}_{\leq j}\left[f\right]\right)^2 \right] +\mathbb{E}\left[\left(\mathbb{E}_{\leq j+1}\left[f\right] \right)^2\right] -\mathbb{E}\left[\left(\mathbb{E}_{\leq j+1}\left[f\right] \right)^2\right] \\
> &=0
> \end{align}$$
> So, since the cross terms vanish,
> $$\begin{align}
> \text{Var}(f) &= \sum_{i=1}^N \mathbb{E}\left[\left(\mathbb{E}_{\leq i-1}\left[f\right] -\mathbb{E}_{\leq i}\left[f\right] \right)^2\right] \\
> &= \sum_{i=1}^N \mathbb{E}\left[\left(\mathbb{E}_{y_{1},\dots,y_{i-1}}\left[f(y_{1},\dots,y_{i-1},x_{i},\dots,x_{N})\right] -\mathbb{E}_{y_{1},\dots,y_{i}}\left[f(y_{1},\dots,y_{i-1}, y_{i}, x_{i+1},\dots,x_{N})\right]\right)^2 \right] \\
> (*)&\leq \sum_{i=1}^N \mathbb{E}\left[\underbrace{ \mathbb{E}_{x_{i}}\left[\left(f(x_{1},\dots,x_{N}) -\mathbb{E}_{y_{i}}\left[f(x_{1},\dots,x_{i-1}, y_{i}, x_{i+1},\dots, x_{N})\right]\right)^2\right] }_{ \text{Var}_{x_{i}}f(x) }\right] \\
> &= \sum_{i=1}^N \mathbb{E}\left[\,\text{Var}_{x_{i}}(f(x))\,\right]
> \end{align}$$
> where at $(*)$ we apply [[Jensen's inequality]] to the inner expectation $\mathbb{E}_{y_{1},\dots,y_{i-1}}$, and note that the $y_{i}$ are iid copies of the $x_{i}$ (and therefore can be interchanged freely, as convenient).
> $$\tag*{$\blacksquare$}$$

see [[Efron-Stein theorem]]

> [!note] Remark
>
> The only assumptions made above are that
> $$f:\underbrace{ \Sigma_{1} }_{ x_{1}\,\in }\times\dots \times\underbrace{ \Sigma_{N} }_{ x_{N}\,\in } \to \mathbb{R}$$
> and that the $x_{i}$ are independent.

> [!theorem] Proposition
>
> If $X \in [A,B]$ [[almost everywhere|almost surely]], then
> $$\text{Var}(X)\leq \left( \frac{B-A}{2} \right)^2$$

> [!proof]
>
> $$\text{Var}(X) = \min_{c}\, \mathbb{E}\left[(X-c)^2\right] \leq \mathbb{E}\left[\left(X-\frac{A+B}{2}\right)^2\right] \leq \left( \frac{B-A}{2} \right)^2$$

see [[variance bound for bounded random variable]]

> [!theorem] Corollary (Bounded Differences for Variance)
>
> Let $x = (x_{1},\dots,x_{N})$ have independent entries and let $f: \Sigma_{1}\times\dots \times\Sigma_{N} \to \mathbb{R}$ be a function on the product measure space. Define
> $$\Delta_{i}f := \sup_{\substack{x_{1} \in \Sigma_{1}\\\vdots\\x_{N} \in \Sigma_{N}\\y_{i}\in\Sigma_{i}}} \left\{ f(x_{1},\dots,x_{i-1},x_{i},x_{i+1},\dots,x_{N}) - f(x_{1},\dots,x_{i-1},y_{i},x_{i+1},\dots, x_{N}) \right\} \geq 0$$
> Then, by [[Efron-Stein theorem|Efron-Stein]] and the proposition above,
> $$\text{Var}(f(x)) \leq \frac{1}{4} \sum_{i=1}^N (\Delta_{i}f)^2$$

(see page 115 of last year's notes)

See [[bounded difference for variance]]
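A numerical sanity check of the bounded-differences corollary (a minimal sketch with choices of our own, not from the lecture: NumPy, $x_{i} \sim \text{Unif}[0,1]$, and $f$ the empirical mean, so $\Delta_{i}f = 1/N$):

```python
import numpy as np

rng = np.random.default_rng(2)
N, trials = 50, 200_000

# x_1, ..., x_N independent Uniform[0,1]; f(x) = (1/N) * sum_i x_i
x = rng.random(size=(trials, N))
f = x.mean(axis=1)

# Changing a single coordinate within [0,1] moves f by at most 1/N,
# so Delta_i f = 1/N for every i and the corollary gives Var(f) <= (1/4) * N * (1/N)^2.
bound = 0.25 * N * (1.0 / N) ** 2

print("empirical Var(f):        ", f.var())   # ~ 1/(12 N)
print("bounded-difference bound:", bound)     # 1/(4 N), dominates the empirical variance
```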
## Application 1

Suppose $\mu$ is a [[probability measure]] supported on $[-k,k] = \Sigma_{i}$. Suppose $X \sim \text{Wig}(d,\mu)$ and $N = \binom{d}{2}$.
$$\begin{align}
f(X) &:= \lambda_{1}(X) = f(x_{1},\dots,x_{N}) \\
\Delta_{(i,j)}f &\leq 2k \\
\implies \text{Var}(\lambda_{1}(X)) & \lesssim k^2d^2 \\
\implies \lambda_{1}(X) &= 2\sqrt{ d }\pm {\cal O}(d)
\end{align}$$

We can sharpen this bound by defining
$$\begin{align}
f(X) &:= \tilde{f}(y_{1},\dots,y_{d-1}) \\
\Delta_{i}\tilde{f} &\leq 2k \\
\implies \text{Var}(f)=\text{Var}(\tilde{f}) &\lesssim k^2d \\
\implies \lambda_{1}(X) &= 2\sqrt{ d } + {\cal O}(\sqrt{ d })
\end{align}$$
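A quick empirical look at these two bounds (a sketch under assumed choices: NumPy, with entries drawn $\text{Unif}[-k,k]$, which is one admissible $\mu$ supported on $[-k,k]$):

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, trials = 100, 1.0, 300

def wigner_bounded(d, k):
    """Symmetric d x d matrix with independent Uniform[-k, k] entries (up to symmetry)."""
    A = rng.uniform(-k, k, size=(d, d))
    return np.triu(A) + np.triu(A, 1).T

# lambda_1 = largest eigenvalue; eigvalsh returns eigenvalues in ascending order
lam1 = np.array([np.linalg.eigvalsh(wigner_bounded(d, k))[-1] for _ in range(trials)])

print("empirical Var(lambda_1):", lam1.var())
print("entrywise bound k^2 d^2:", k**2 * d**2)  # from Delta_(i,j) f <= 2k over ~d^2 entries
print("sharper bound   k^2 d  :", k**2 * d)     # from the coarser decomposition above
```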
---

> [!theorem] Proposition (Fisher's Definition of Variance)
>
> If $X,Y$ are independent copies, then
> $$\text{Var}(X) = \frac{1}{2}\mathbb{E}\left[(X-Y)^2\right]$$

(Fisher's definition, the blue-collar working-class definition; see [[variance]])

In [[Efron-Stein theorem|ES]], the $x_{1},\dots,x_{N}$ have $y_{1},\dots,y_{N}$ as independent copies. So we can think of
$$x^{(i)} = (x_{1},\dots,x_{i-1},\underbrace{ y_{i} }_{ \text{"resampled"} },x_{i+1},\dots,x_{N})$$
as a "resampled" version of our original vector, one for each $i$.

We also have that
$$\begin{align}
\text{Var}(f) &\leq \frac{1}{2} \sum_{i=1}^N \mathbb{E}\left[\underbrace{ \left(\,f(x)-f(x^{(i)})\,\right)^2 }_{ (*) }\right] \\
&= \sum_{i=1}^N \mathbb{E}\left[\underbrace{ \max\{ 0,\, f(x)-f(x^{(i)})\}^2 }_{ \text{"one-sided" E.S.} }\right] \\
(^{\dagger})&\lesssim 16k^2 \sum_{i,j} \mathbb{E}\left[\lvert v_{1,i} \rvert^2 \lvert v_{1,j} \rvert ^2 \right] \\
&= 16k^2
\end{align}$$

$(^{\dagger})$ If $X,Y \sim \text{Wig}(d, \mu)$, then ++++ see notes

> [!theorem] Corollary
>
> Let $X \in \mathbb{R}^{d\times d}$ be [[hermitian|symmetric]] with *independent* entries in $[-k,k]$. Then
> $$\text{Var}(\lambda_{1}(X)\,) \leq 16k^2$$

see last year's notes page 116

```datacorejsx
const { dateTime } = await cJS()
return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```