Random Matrix Lecture 20

[[lecture-data]]

> [!info]
> ```dataviewjs
> let current = dv.current();
> let previous = current.previous_lecture;
>
> // get notes with this as the previous
> let pages = dv.pages("")
> 	.where(p => p.previous_lecture)
> 	.where(p => String(p.previous_lecture)
> 		.includes(dv.current().file.link));
>
> let output = `:LiArrowBigLeftDash: ${previous || "No Previous Lecture"} | `;
>
> if (pages.length > 0) {
> 	output += pages
> 		.map(p => p.file.link).join(", ");
> } else {
> 	output += 'No Next Lecture'
> }
>
> output += ' :LiArrowBigRightDash:'
>
> dv.el('div', output)
> ```
 

# Class Notes

## Summary

- Variance inequalities for random matrices
	- Efron-Stein
	- bounded differences
	- Poincaré inequality

# 7. Non-Asymptotic RMT

Previously, we saw asymptotic results:

- $\lambda_{1}(X^{(d)})/\sqrt{ d } \to 2$ as $d \to \infty$, i.e. statements that hold only in the limit
- and related limiting statements, without quantitative rates

Next,

- we want a concentration inequality that holds at finite $d$

> [!example]
>
> We want something of the form
> $$\mathbb{P}\left[\lvert \lvert X^{(d)} \rvert \rvert \geq (C+t)\sqrt{ d }\right] \leq e^{-cdt^{2}}$$

> [!note]
>
> This is bounded by the geometric / $\varepsilon$-net method. The true value of $\lvert \lvert X^{(d)} \rvert \rvert$ is $\approx 2\sqrt{ d }$, so the constant below is not sharp.

$$\begin{align}
\mathbb{P}\left[\lvert \lvert X \rvert \rvert \geq t\right] &= \mathbb{P}\left[\max_{\lvert \lvert v \rvert \rvert \leq 1} \left\lvert v^{\intercal}Xv \right\rvert \geq t\right] \\
&\leq \mathbb{P}\left[ \max_{v \in \underbrace{ {\cal X} }_{ \varepsilon\text{-net} }} \underbrace{ \left\lvert v^{\intercal}Xv \right\rvert }_{ {\cal N}(0,2) } \geq t(1-2\varepsilon) \right] \\
&\leq \underbrace{ \lvert {\cal X} \rvert }_{ \leq \left( 1+\frac{2}{\varepsilon} \right)^d } \,\, \mathbb{P}\left[\lvert {\cal N}(0,2) \rvert \geq t(1-2\varepsilon)\right] \\
\left( \varepsilon=\frac{1}{4} \right) \implies&\leq 2 \cdot 9^d \exp\left( -\frac{t^2}{16} \right) \\
&\lesssim \exp(Cd - ct^2) \\
\implies \mathbb{P}\left[ \lvert \lvert X^{(d)} \rvert \rvert \geq(10+t)\sqrt{ d }\right] &\leq \exp(-c'dt^2)
\end{align}$$

(page 111 of Last Year’s Notes)
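A quick numerical illustration of this concentration (a minimal sketch, not from the lecture: it assumes NumPy and a GOE-type normalization with off-diagonal variance $1$, consistent with the $v^{\intercal}Xv \sim {\cal N}(0,2)$ used above):

```python
import numpy as np

rng = np.random.default_rng(0)

def goe(d):
    """Sample a GOE-type Wigner matrix: off-diagonal entries N(0,1), diagonal N(0,2),
    so that v^T X v ~ N(0,2) for any fixed unit vector v."""
    A = rng.normal(size=(d, d))
    return (A + A.T) / np.sqrt(2)

d, trials = 200, 500
norms = np.array([np.linalg.norm(goe(d), 2) for _ in range(trials)])  # spectral norm

print("mean ||X|| / sqrt(d):", norms.mean() / np.sqrt(d))   # close to 2
print("std  ||X|| / sqrt(d):", norms.std() / np.sqrt(d))    # small: concentration
for t in (0.2, 0.5, 1.0):
    frac = np.mean(norms >= (2 + t) * np.sqrt(d))
    print(f"empirical P[||X|| >= (2+{t})*sqrt(d)] = {frac}")
```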

## Principles of Concentration Inequalities

First, we want it to be possible to bound the fluctuations of $f(X)$ without knowing $\mathbb{E}\left[f(X)\right]$ itself.

> [!example]
>
> $f(X) = \lambda_{1}(X)$ and/or $f(X) = \lvert \lvert X \rvert \rvert$, whose expectations we cannot compute exactly.

Second, we expect $f(x_{1},\dots,x_{N})$ to concentrate near $\mathbb{E}\left[f\right]$ if

1. the $x_{i}$ are weakly dependent (e.g. independent)
2. $f$ is weakly sensitive to each of its inputs (illustrated in the sketch after this list)
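A small simulation of this heuristic (a sketch under choices of our own, not from the lecture: NumPy, $x_{i} \sim \text{Unif}[0,1]$, $f$ the average of all coordinates, $g$ a function that depends on a single coordinate):

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 1000, 2000

# x_1, ..., x_N independent Uniform[0,1]
x = rng.random(size=(trials, N))

# f is weakly sensitive to each input: changing one x_i moves f by at most 1/N
f = x.mean(axis=1)

# g is strongly sensitive to a single input: it ignores all but x_1
g = x[:, 0]

print("std of f (weakly sensitive):  ", f.std())   # ~ 1/sqrt(12 N): concentrated
print("std of g (strongly sensitive):", g.std())   # ~ 1/sqrt(12): no concentration as N grows
```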

Our Goals:

- Prove bounds on the fluctuations of $f$ under the above conditions for concentration
- Consider “RMT choices” of $f$ (as in the example above)

## Variance bounds

$$\begin{align}
\text{Var}(f(X)) &= \mathbb{E}\left[\left(f(X) - \mathbb{E}\left[f(X)\right]\right) ^2\right] \\
&\leq C
\end{align}$$

By [[Chebyshev's Inequality]], a variance bound like this already gives concentration: $\mathbb{P}\left[\lvert f(X)-\mathbb{E}\left[f(X)\right] \rvert \geq t\right] \leq C/t^{2}$.

> [!example]
>
> Suppose $x_{1},\dots, x_{N}$ are independent with $f(x) = \sum_{i}x_{i}$. Then $\text{Var}(f) = \sum_{i=1}^N \text{Var}(x_{i})$.

> [!theorem] Efron-Stein Theorem
>
> Let $x_{1},\dots,x_{N}$ be independent [[random variable|random variables]] and $f$ any function of them. Then
> $$\text{Var}(f) \leq \sum_{i=1}^{N} \mathbb{E}_{x_{1},\dots,x_{i-1},x_{i+1},\dots, x_{N}}\left[ \,\text{Var}_{x_{i}}\left(f(x_{1},\dots,x_{i-1},x_{i}, x_{i+1}, \dots,x_{N})\right)\,\right]$$

(We will later want a bound that avoids taking this outer expectation.)

> [!proof]
>
> Denote $\mathbb{E}_{i}:=\mathbb{E}_{x_{i}}$ and $\mathbb{E}_{\leq i} := \mathbb{E}_{x_{1},\dots,x_{i}}$. Telescoping,
>
> $$\begin{align}
> \text{Var}(f) &= \mathbb{E}\left[\left(f-\mathbb{E}\left[f\right]\right)^2 \right] \\
> &= \mathbb{E}\left[ \left((f-\mathbb{E}_{\leq 1}\left[f\right] ) + \underbrace{ (\mathbb{E}_{\leq 1}\left[f\right] - \mathbb{E}_{\leq 2}\left[f\right] ) }_{ \mathbb{E}_{y_{1}}\left[f(y_{1},x_{2},\dots,x_{N})\right] -\mathbb{E}_{y_{1},y_{2}}\left[f(y_{1},y_{2},x_{3},\dots,x_{N})\right] }+\dots+(\mathbb{E}_{\leq N-1}\left[f\right] -\mathbb{E}_{\leq N}\left[f\right] )\right)^2\right]
> \end{align}$$
>
> Expanding the square, the cross terms vanish. For $i \leq j$,
> $$\begin{align}
> \mathbb{E}_{x_{1},\dots,x_{N}}&\left[\,\mathbb{E}_{y_{1},\dots,y_{i}}\left[f(y_{1},\dots,y_{i},x_{i+1},\dots,x_{j},\dots,x_{N})\right] \times\mathbb{E}_{z_{1},\dots,z_{j}}\left[f(z_{1},\dots,z_{j},x_{j+1},\dots,x_{N})\right] \,\right] \\
> &=\mathbb{E}\left[\,\left(\mathbb{E}_{\leq j}\left[f\right]\right) ^2\,\right]
> \end{align}$$
> Generally, $\mathbb{E}\left[\,\mathbb{E}_{\leq i}\left[f\right]\,\,\mathbb{E}_{\leq j}\left[f\right]\,\right]=\mathbb{E}\left[\left(\mathbb{E}_{\leq\max\{ i,j \}}\left[f\right]\right)^2\right]$, so for $i < j$
> $$\begin{align}
> &\mathbb{E}\left[\left(\mathbb{E}_{\leq i}\left[f\right] -\mathbb{E}_{\leq i+1}\left[f\right] \right)\times\left(\mathbb{E}_{\leq j}\left[f\right] -\mathbb{E}_{\leq j+1}\left[f\right] \right)\right] \\
> &= \mathbb{E}\left[\left(\mathbb{E}_{\leq j}\left[f\right]\right) ^2\right] - \mathbb{E}\left[\left(\mathbb{E}_{\leq j}\left[f\right]\right)^2 \right] +\mathbb{E}\left[\left(\mathbb{E}_{\leq j+1}\left[f\right] \right)^2\right] -\mathbb{E}\left[\left(\mathbb{E}_{\leq j+1}\left[f\right] \right)^2\right] \\
> &=0
> \end{align}$$
> So, since the cross terms vanish,
> $$\begin{align}
> \text{Var}(f) &= \sum_{i=1}^N \mathbb{E}\left[\left(\mathbb{E}_{\leq i-1}\left[f\right] -\mathbb{E}_{\leq i}\left[f\right] \right)^2\right] \\
> &= \sum_{i=1}^N \mathbb{E}\left[\left(\mathbb{E}_{y_{1},\dots,y_{i-1}}\left[f(y_{1},\dots,y_{i-1},x_{i},\dots,x_{N})\right] -\mathbb{E}_{y_{1},\dots,y_{i}}\left[f(y_{1},\dots,y_{i-1}, y_{i}, x_{i+1},\dots,x_{N})\right]\right)^2 \right] \\
> (*)&\leq \sum_{i=1}^N \mathbb{E}\left[\underbrace{ \mathbb{E}_{x_{i}}\left[\left(f(x_{1},\dots,x_{N}) -\mathbb{E}_{y_{i}}\left[f(x_{1},\dots,x_{i-1}, y_{i}, x_{i+1},\dots, x_{N})\right]\right)^2\right] }_{ \text{Var}_{x_{i}}f(x) }\right] \\
> &= \sum_{i=1}^N \mathbb{E}\left[\,\text{Var}_{x_{i}}(f(x))\,\right]
> \end{align}$$
> where at $(*)$ we apply [[Jensen's inequality]] to the inner expectation $\mathbb{E}_{y_{1},\dots,y_{i-1}}$, and note that the $y_{i}$ are iid copies of the $x_{i}$ (and therefore can be interchanged freely, as convenient).
> $$\tag*{$\blacksquare$}$$

see [[Efron-Stein theorem]]

> [!note] Remark
>
> The only assumptions made above are that
> $$f:\underbrace{ \Sigma_{1} }_{ x_{1}\,\in }\times\dots \times\underbrace{ \Sigma_{N} }_{ x_{N}\,\in } \to \mathbb{R}$$
> and that the $x_{i}$ are independent.

> [!theorem] Proposition
>
> If $X \in [A,B]$ [[almost everywhere|almost surely]], then
> $$\text{Var}(X)\leq \left( \frac{B-A}{2} \right)^2$$

> [!proof]
>
> $$\text{Var}(X) = \min_{c}\, \mathbb{E}\left[(X-c)^2\right] \leq \mathbb{E}\left[\left(X-\frac{A+B}{2}\right)^2\right] \leq \left( \frac{B-A}{2} \right)^2$$

see [[variance bound for bounded random variable]]

> [!theorem] Corollary (Bounded Differences for Variance)
>
> Let $x = (x_{1},\dots,x_{N})$ have independent entries and let $f: \Sigma_{1}\times\dots \times\Sigma_{N} \to \mathbb{R}$ be a function on the product measure space. Define
> $$\Delta_{i}f := \sup_{\substack{x_{1} \in \Sigma_{1}\\\vdots\\x_{N} \in \Sigma_{N}\\y_{i}\in\Sigma_{i}}} \left\{ f(x_{1},\dots,x_{i-1},x_{i},x_{i+1},\dots,x_{N}) - f(x_{1},\dots,x_{i-1},y_{i},x_{i+1},\dots, x_{N}) \right\} \geq 0$$
> Then, by [[Efron-Stein theorem|Efron-Stein]] and the proposition above,
> $$\text{Var}(f(x)) \leq \frac{1}{4} \sum_{i=1}^N (\Delta_{i}f)^2$$

(see page 115 of last year's notes)

See [[bounded difference for variance]]
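A numerical sanity check of the bounded-differences corollary (a minimal sketch with choices of our own, not from the lecture: NumPy, $x_{i} \sim \text{Unif}[0,1]$, and $f$ the empirical mean, so $\Delta_{i}f = 1/N$):

```python
import numpy as np

rng = np.random.default_rng(2)
N, trials = 50, 200_000

# x_1, ..., x_N independent Uniform[0,1]; f(x) = (1/N) * sum_i x_i
x = rng.random(size=(trials, N))
f = x.mean(axis=1)

# Changing a single coordinate within [0,1] moves f by at most 1/N,
# so Delta_i f = 1/N for every i and the corollary gives Var(f) <= (1/4) * N * (1/N)^2.
bound = 0.25 * N * (1.0 / N) ** 2

print("empirical Var(f):        ", f.var())   # ~ 1/(12 N)
print("bounded-difference bound:", bound)     # 1/(4 N), dominates the empirical variance
```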
## Application 1

Suppose $\mu$ is a [[probability measure]] supported on $[-k,k] = \Sigma_{i}$. Suppose $X \sim \text{Wig}(d,\mu)$ and $N = \binom{d}{2}$.
$$\begin{align}
f(X) &:= \lambda_{1}(X) = f(x_{1},\dots,x_{N}) \\
\Delta_{(i,j)}f &\leq 2k \\
\implies \text{Var}(\lambda_{1}(X)) & \lesssim k^2d^2 \\
\implies \lambda_{1}(X) &= 2\sqrt{ d }\pm {\cal O}(d)
\end{align}$$

We can sharpen this bound by defining
$$\begin{align}
f(X) &:= \tilde{f}(y_{1},\dots,y_{d-1}) \\
\Delta_{i}\tilde{f} &\leq 2k \\
\implies \text{Var}(f)=\text{Var}(\tilde{f}) &\lesssim k^2d \\
\implies \lambda_{1}(X) &= 2\sqrt{ d } + {\cal O}(\sqrt{ d })
\end{align}$$
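A quick empirical look at these two bounds (a sketch under assumed choices: NumPy, with entries drawn $\text{Unif}[-k,k]$, which is one admissible $\mu$ supported on $[-k,k]$):

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, trials = 100, 1.0, 300

def wigner_bounded(d, k):
    """Symmetric d x d matrix with independent Uniform[-k, k] entries (up to symmetry)."""
    A = rng.uniform(-k, k, size=(d, d))
    return np.triu(A) + np.triu(A, 1).T

# lambda_1 = largest eigenvalue; eigvalsh returns eigenvalues in ascending order
lam1 = np.array([np.linalg.eigvalsh(wigner_bounded(d, k))[-1] for _ in range(trials)])

print("empirical Var(lambda_1):", lam1.var())
print("entrywise bound k^2 d^2:", k**2 * d**2)  # from Delta_(i,j) f <= 2k over ~d^2 entries
print("sharper bound   k^2 d  :", k**2 * d)     # from the coarser decomposition above
```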
---

> [!theorem] Proposition (Fisher's Definition of Variance)
>
> If $X,Y$ are independent copies, then
> $$\text{Var}(X) = \frac{1}{2}\mathbb{E}\left[(X-Y)^2\right]$$

(Fisher's definition, the blue-collar working-class definition; see [[variance]])

In [[Efron-Stein theorem|ES]], the $x_{1},\dots,x_{N}$ have $y_{1},\dots,y_{N}$ as independent copies. So we can think of
$$x^{(i)} = (x_{1},\dots,x_{i-1},\underbrace{ y_{i} }_{ \text{"resampled"} },x_{i+1},\dots,x_{N})$$
as a "resampled" version of our original vector, one for each $i$.

We also have that
$$\begin{align}
\text{Var}(f) &\leq \frac{1}{2} \sum_{i=1}^N \mathbb{E}\left[\underbrace{ \left(\,f(x)-f(x^{(i)})\,\right)^2 }_{ (*) }\right] \\
&= \sum_{i=1}^N \mathbb{E}\left[\underbrace{ \max\{ 0,\, f(x)-f(x^{(i)})\}^2 }_{ \text{"one-sided" E.S.} }\right] \\
(^{\dagger})&\lesssim 16k^2 \sum_{i,j} \mathbb{E}\left[\lvert v_{1,i} \rvert^2 \lvert v_{1,j} \rvert ^2 \right] \\
&= 16k^2
\end{align}$$

$(^{\dagger})$ If $X,Y \sim \text{Wig}(d, \mu)$, then ++++ see notes

> [!theorem] Corollary
>
> Let $X \in \mathbb{R}^{d\times d}$ be [[hermitian|symmetric]] with *independent* entries in $[-k,k]$. Then
> $$\text{Var}(\lambda_{1}(X)\,) \leq 16k^2$$

see last year's notes page 116

```datacorejsx
const { dateTime } = await cJS()
return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```