[[lecture-data]]

```dataviewjs
let current = dv.current();
let previous = current.previous_lecture;

// get notes with this note as the previous lecture
let pages = dv.pages("")
    .where(p => p.previous_lecture)
    .where(p => String(p.previous_lecture).includes(dv.current().file.link));

let output = `:LiArrowBigLeftDash: ${previous || "No Previous Lecture"} | `;
if (pages.length > 0) {
    output += pages.map(p => p.file.link).join(", ");
} else {
    output += "No Next Lecture";
}
output += " :LiArrowBigRightDash:";

dv.el("div", output);
```
Class Notes
Summary
- Gaussian series models
- Concentration of norm
- Location of norm
- Non-commutative Khintchine inequality
Last time: the Gaussian Lipschitz concentration theorem.
Theorem (important!)
(Generalization of the GPI from a variance bound to subgaussian tails)
If $x \sim {\cal N}(0, I_{n})$ and $f: \mathbb{R}^{n}\to \mathbb{R}$ is $L$-[[Lipschitz continuous|Lipschitz]], then $f(x)$ is [[subgaussian]], i.e.
$$\begin{align} \mathbb{P}[\,\lvert f(x)-\mathbb{E}_{}\left[f(x)\right] \rvert \geq t] &\leq 2\exp\left( -\frac{t^2}{2L^2} \right) \\ \iff f(x)&= \mathbb{E}_{}\left[f(x)\right] \pm {\cal O}(L)\quad \text{w.h.p.} \end{align}$$
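As a quick numerical illustration (a minimal sketch, not from the lecture): $f(x)=\lvert \lvert x \rvert \rvert_2$ is $1$-[[Lipschitz continuous|Lipschitz]], so its fluctuations around $\mathbb{E}_{}\left[f(x)\right]$ stay ${\cal O}(1)$ even as the dimension grows, although the mean itself grows like $\sqrt{ n }$.

```python
# Sketch: Gaussian Lipschitz concentration for the 1-Lipschitz function f(x) = ||x||_2.
# The mean of f grows like sqrt(n), but the standard deviation stays O(1) (about 1/sqrt(2)).
import numpy as np

rng = np.random.default_rng(0)
for n in [10, 100, 1000, 10000]:
    samples = np.linalg.norm(rng.standard_normal((500, n)), axis=1)  # 500 draws of f(x)
    print(f"n={n:6d}   E[f(x)] ~ {samples.mean():8.2f}   std(f(x)) ~ {samples.std():.3f}")
```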
8. Matrix Concentrations
Gaussian Series
A symmetric [[random matrix]] $X$ is a Gaussian series if $$X = A_{0}+\sum_{i} g_{i}A_{i}$$ for deterministic symmetric matrices $A_{0}, A_{1},\dots$ and $g_{i} \overset{\text{iid}}\sim {\cal N}(0,1)$.
see [[Gaussian series]]
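For instance, a Gaussian Wigner matrix with independent ${\cal N}(0,1)$ entries on and above the diagonal (one common normalization; conventions for the diagonal vary) is the Gaussian series $$X = \sum_{i} g_{ii}\, E_{ii} + \sum_{i<j} g_{ij}\left( E_{ij} + E_{ji} \right),\qquad g_{ij} \overset{\text{iid}}\sim {\cal N}(0,1),$$ i.e. $A_{0}=0$ and the $A_{i}$ range over $E_{ii}$ and $E_{ij}+E_{ji}$.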
Example ("Patterned" Wigner matrices)
- Gaussian circulant (see the sketch below)
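A minimal sketch of the circulant case (assuming one natural symmetric convention where $X_{ij}$ depends only on the cyclic distance between $i$ and $j$; not necessarily the normalization used in lecture), written explicitly as a Gaussian series:

```python
# Sketch: a symmetric Gaussian circulant as a Gaussian series X = sum_k g_k A_k,
# where A_k has ones exactly on the entries with (j - i) = +/- k (mod d).
import numpy as np

rng = np.random.default_rng(0)
d = 6

def circulant_basis(d: int, k: int) -> np.ndarray:
    """Symmetric 0/1 coefficient matrix with ones where (j - i) = +/- k (mod d)."""
    i, j = np.indices((d, d))
    return (((j - i) % d == k) | ((i - j) % d == k)).astype(float)

A = [circulant_basis(d, k) for k in range(d // 2 + 1)]  # A_0 = I, then the cyclic shifts
g = rng.standard_normal(len(A))
X = sum(gk * Ak for gk, Ak in zip(g, A))                # a Gaussian series

assert np.allclose(X, X.T)                              # X is symmetric, "patterned" Wigner-type
print(np.round(X, 2))
```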
Def Let $\text{symvec}: \mathbb{S}^{d} \to \mathbb{R}^{d(d+1)/2}$ map a symmetric matrix to the vector of its entries on and above the diagonal, and let $\text{symmat} := \text{symvec}^{-1}$ be the inverse map.
Prop $X$ is a [[Gaussian series]] if and only if $\text{symvec}(X)$ is a Gaussian random vector.
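For example, in $d=2$ with one common (unnormalized) convention (conventions differ on whether off-diagonal entries carry a $\sqrt{ 2 }$ factor; the proof below only uses that $\text{symvec}$ is linear and invertible): $$\text{symvec}\begin{pmatrix} a & b \\ b & c \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \qquad \text{symmat}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$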
Proof
$(\implies)$ Suppose $X=A_{0}+\sum_{i}g_{i}A_{i}$ is a Gaussian series. Then $$\begin{align} \text{symvec}\left( A_{0}+\sum_{i}g_{i}A_{i} \right) &= \underbrace{ \text{symvec}(A_{0}) }_{ \mu } + \underbrace{ \sum_{i} g_{i}\text{symvec}(A_{i}) }_{ \text{Law}={\cal N}\left( 0, \sum_{i} \text{symvec}(A_{i})\text{symvec}(A_{i})^{\intercal} \right) } \end{align}$$ $(\impliedby)$ Suppose $x=\text{symvec}(X)\sim {\cal N}(\mu,\Sigma)$. Using the Cholesky / [[Gram matrix|Gram]] / [[diagonalizable|spectral decomposition]], we can write (because $\Sigma \succeq 0$) $$\begin{align} \Sigma &= \sum_{i}a_{i}a_{i}^{\intercal} \\ \implies x &\overset{\text{d}}= \mu + \sum_{i} g_{i}a_{i},\quad g_{i}\overset{\text{iid}}\sim {\cal N}(0,1) \\ \implies X=\text{symmat}(x)&\overset{\text{d}}= \underbrace{ \text{symmat}(\mu) }_{ A_{0} } + \sum_{i} g_{i} \underbrace{ \text{symmat}(a_{i}) }_{ A_{i} } \end{align}$$ $$\tag*{$\blacksquare$}$$

What is the concentration of $\lambda_{1}(X)$? Write $$\begin{align} X(g) &= A_{0} + \sum_{i}g_{i}A_{i},\quad g \in \mathbb{R}^D \\ f(g) &:= \lambda_{1}(X(g)) \end{align}$$ The same argument works with $\lambda_{1}$ replaced by $\lvert \lvert \cdot \rvert \rvert_{\text{op}}$ ([[operator norm]]). By the [[Gaussian Lipschitz concentration inequality]], it is enough to show that $f$ is [[Lipschitz continuous|Lipschitz]] to apply a bound. Note then that $$\begin{align} \lvert f(g) - f(h) \rvert & = \lvert \lambda_{1}(X(g)) - \lambda_{1}(X(h)) \rvert \\ &\leq \lvert \lvert X(g) - X(h) \rvert \rvert _{\text{op}} \\ &= \left\lvert \left\lvert \sum_{i}(g_{i}-h_{i})A_{i} \right\rvert \right\rvert_{\text{op}} \\ &= \max_{\lvert \lvert v \rvert \rvert =1} \left\lvert \sum_{i}(g_{i} - h_{i})\left( v^{\intercal}A_{i}v \right) \right\rvert \\ (*)&\leq \lvert \lvert g-h \rvert \rvert _{2} \underbrace{ \max_{\lvert \lvert v \rvert \rvert =1}\left( \sum_{i}\left( v^{\intercal}A_{i}v \right)^2 \right)^{1/2} }_{ \substack{\sigma_{*}(X)= \max_{\lvert \lvert v \rvert \rvert =1} \sqrt{ \text{Var}\left( v^{\intercal}Xv \right) } \\ \text{"weak variance" parameter}} } \end{align}$$ where $(*)$ is by [[cauchy-schwarz theorem|Cauchy-Schwarz]]. That is, $f$ is $\sigma_{*}(X)$-[[Lipschitz continuous|Lipschitz]], so $\lambda_{1}(X)$ and $\lvert \lvert X \rvert \rvert_{\text{op}}$ are $\sigma_{*}(X)^2$-[[subgaussian]]. Also, by the [[Gaussian Pointcare Inequality|Gaussian Poincaré inequality]], we have $$\text{Var}(\lambda_{1}(X)), \text{Var}(\lvert \lvert X \rvert \rvert _{\text{op}}) \leq \sigma_{*}(X)^2$$

Def Weak Variance Parameter
see [[weak matrix variance parameter]]
Note that $$\begin{align} \sum_{i} \left( v^{\intercal}A_{i}v \right)^2 &= \text{Var}\left( v^{\intercal}Xv \right) \end{align}$$ so $\sigma_{*}(X)^2$ is the amount of variance of $v^{\intercal}Xv$ in the direction $v$ that has the most variance.

---

def Let $\text{vec}: \mathbb{R}^{d\times d} \to \mathbb{R}^{d^2}$ be the entry-stacking map, and define $$\begin{align} \sigma_{\text{vec}}(X)^2 &:= \lvert \lvert \text{Cov}(\text{vec}(X)) \rvert \rvert \\ &= \left\lvert \left\lvert \sum_{i} \text{vec}(A_{i}) \text{vec}(A_{i})^{\intercal} \right\rvert \right\rvert \end{align}$$

Exercise If $\Sigma = \text{Cov}(\text{symvec}(X))$, then $$\lvert \lvert \Sigma \rvert \rvert _{\text{op}} \leq \sigma_{\text{vec}}(X)^2 \leq 2 \lvert \lvert \Sigma \rvert \rvert _{\text{op}}$$ (last year's notes prop 5.4.15 pg 126)

Prop $\sigma_{*}(X) \leq \sigma_{\text{vec}}(X)$
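A minimal numerical sanity check of this Prop (a sketch with arbitrary made-up coefficient matrices, not from the lecture): random search over unit vectors only lower-bounds $\sigma_{*}(X)$, while $\sigma_{\text{vec}}(X)$ is computed exactly from $\sum_{i}\text{vec}(A_{i})\text{vec}(A_{i})^{\intercal}$.

```python
# Sketch: check sigma_*(X) <= sigma_vec(X) for a Gaussian series X = sum_i g_i A_i
# with randomly chosen symmetric coefficient matrices A_i.
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 7  # matrix dimension, number of terms in the series

A = []
for _ in range(k):
    M = rng.standard_normal((d, d))
    A.append((M + M.T) / 2)  # symmetric coefficient matrix A_i

# sigma_vec(X)^2 = || sum_i vec(A_i) vec(A_i)^T ||_op
V = sum(np.outer(Ai.ravel(), Ai.ravel()) for Ai in A)
sigma_vec = np.sqrt(np.linalg.norm(V, ord=2))

# sigma_*(X)^2 = max_{||v||=1} sum_i (v^T A_i v)^2, lower-bounded by random search over v
best = 0.0
for _ in range(20000):
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    best = max(best, sum(float(v @ Ai @ v) ** 2 for Ai in A))
sigma_star_lb = np.sqrt(best)

print(f"sigma_* (random-search lower bound) ~ {sigma_star_lb:.3f}")
print(f"sigma_vec                           = {sigma_vec:.3f}")
assert sigma_star_lb <= sigma_vec + 1e-9
```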
Example Let $X$ be symmetric with $X_{ij}=X_{ji} \sim {\cal N}(\mu_{ij}, \sigma^2_{ij})$ independent for $i \leq j$. Then $$\lvert \lvert \text{Cov}(\text{symvec}(X)) \rvert \rvert = \lvert \lvert \text{diag}(\sigma^2_{1,1}, \sigma^2_{1,2},\dots,\sigma^2_{d,d}) \rvert \rvert = \max_{ij}\sigma^2_{ij}$$ and so $$\sigma_{\text{vec}}(X)^2 \leq 2 \max_{ij} \sigma^2_{ij}$$ (something we talked about last time).

Example Suppose $X=g 11^{\intercal}$ with $g \sim {\cal N}(0,1)$. Then $$\sigma_{\text{vec}}(X)^2 = \lvert \lvert \text{vec}\left( 11^{\intercal} \right) \rvert \rvert ^2 = d^2, \qquad \lvert \lvert X \rvert \rvert =\lvert g \rvert \,d$$ so $\sigma_{\text{vec}}(X)=d$ matches the scale of $\lvert \lvert X \rvert \rvert$.

Proof of Prop $$\begin{align} \sigma_{*}(X)^2 &= \max_{\lvert \lvert v \rvert \rvert =1} \sum_{i}\underbrace{ \left( v^{\intercal}A_{i}v \right)^2 }_{ \substack{= \left\langle A_{i}, vv^{\intercal} \right\rangle^2 \\= \left\langle \text{vec}(A_{i}), \text{vec}\left( vv^{\intercal} \right) \right\rangle^2} } \\ \left( v^{\intercal}A_{i}v \right)^2 &= \left\langle \text{vec} (A_{i}) \text{vec} (A_{i})^{\intercal}, \text{vec} \left( vv^{\intercal} \right)\text{vec} \left( vv^{\intercal} \right)^{\intercal} \right\rangle \\ \implies \sigma_{*}(X)^2 &= \max_{\lvert \lvert v \rvert \rvert =1} \left\langle \sum_{i} \text{vec} (A_{i}) \text{vec} (A_{i})^{\intercal}, \underbrace{ \text{vec} \left( vv^{\intercal} \right) }_{ w,\ \lvert \lvert w \rvert \rvert =1 }\underbrace{ \text{vec} \left( vv^{\intercal} \right) }_{ w }^{\intercal} \right\rangle \\ &\leq \left\lvert \left\lvert \sum_{i} \text{vec}(A_{i})\text{vec}(A_{i})^{\intercal} \right\rvert \right\rvert \\ &= \sigma_{\text{vec}}(X)^2 \end{align}$$ $$\tag*{$\blacksquare$}$$ Note that the third line is essentially taking a tensor norm (but don't worry about this too much).

>[!question] What can we say about $\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right]$?

def Let $X$ be a [[random matrix]]. Then its **(strong) variance parameter** (contrasted with the [[weak matrix variance parameter]]) is $$\sigma(X)^2 := \lvert \lvert \mathbb{E}_{}\left[(X - \mathbb{E}_{}\left[X\right])^2\right] \rvert \rvert $$ If $X$ is a [[Gaussian series]], then $$\sigma(X)^2 = \left\lvert \left\lvert \sum_{i}A_{i}^2 \right\rvert \right\rvert $$ see [[strong matrix variance parameter]]

Theorem (non-commutative Khintchine - NCK) Let $X$ be a [[Gaussian series]] with $A_{0}=0$ (i.e. a centered Gaussian [[random matrix]]). For absolute constants $c=\frac{1}{\sqrt{ 2 }}$ and $C=\sqrt{ 2e }$, we have $$c\,\sigma(X) \leq \mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] \leq C \sqrt{ \log d } \,\sigma(X)$$

Example Recall that the maximum of $d$ iid standard normal random variables is $\approx \sqrt{ \log d }$ in expectation. For $X$ diagonal with $X_{ii} \overset{\text{iid}}\sim {\cal N}(0, 1)$ and $X_{ij} = 0$ for $i \neq j$, we have $$\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] = \mathbb{E}_{}\left[ \max_{i} \lvert X_{ii} \rvert \right] \approx \sqrt{ \log d }$$ and $$\sigma(X)^2 = \lvert \lvert \mathbb{E}_{}\left[X^2\right] \rvert \rvert = \lvert \lvert I \rvert \rvert =1 $$

Example $X \sim GOE(d, 1) \implies \mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert\,\right] \approx 2\sqrt{ d }$, while $$\sigma(X)^2 = \lvert \lvert (d+1)I \rvert \rvert = d+1$$

The two examples show that we do indeed need both factors: sometimes the upper bound is tight, and sometimes the lower bound is tight. In general, we cannot do much better than this.
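A Monte Carlo sketch of the diagonal example (an illustration, not from the lecture), showing $\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right]$ landing between the two sides of NCK:

```python
# Sketch: for X = diag(g_1, ..., g_d) with g_i iid N(0,1), compare E||X|| against
# sigma(X) = 1 (lower-bound side of NCK) and sqrt(log d) * sigma(X) (upper-bound side).
import numpy as np

rng = np.random.default_rng(0)
d, trials = 1000, 200

# E||X|| = E max_i |g_i| for the diagonal Gaussian series
E_norm = np.mean([np.max(np.abs(rng.standard_normal(d))) for _ in range(trials)])

sigma = 1.0  # sigma(X)^2 = ||E[X^2]|| = ||I|| = 1 here

print(f"E||X||                 ~ {E_norm:.2f}")
print(f"sigma(X)               = {sigma:.2f}   (lower-bound side)")
print(f"sqrt(log d) * sigma(X) = {np.sqrt(np.log(d)) * sigma:.2f}   (upper-bound side, up to constants)")
```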
(page 129 in last year's notes)

Concentration: w.h.p. we have $$\lvert \lvert X \rvert \rvert =\mathbb{E}_{}\left[ \lvert \lvert X \rvert \rvert \right] \pm {\cal O}(\sigma_{*}(X))$$

Note $$\mathbb{E}_{g \sim {\cal N}(0, 1)}\left[\,\lvert g \rvert \,\right] = \sqrt{ \frac{2}{\pi} } \implies \mathbb{E}_{g \sim {\cal N}(0, \sigma^2)}\left[\,\lvert g \rvert \,\right] = \sqrt{ \frac{2}{\pi} } \left(\mathbb{E}_{}\left[g^2\right] \right)^{1/2}$$

Prop If $\mathbb{E}_{}\left[X\right]=0$, then $$\sigma_{*}(X) \leq C\, \mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right]$$ with $C=\sqrt{ \pi/2 }$.

Proof $$\begin{align} \sigma_{*}(X) &= \max_{\lvert \lvert v \rvert \rvert =1} \left(\mathbb{E}_{}\left[ ( \underbrace{ v^{\intercal}Xv }_{ \text{Law}= {\cal N}(0,\, \text{Var}(v^{\intercal}Xv)) } )^2\right] \right)^{1/2} \\ &= \sqrt{ \frac{\pi}{2} }\max_{\lvert \lvert v \rvert \rvert =1} \mathbb{E}_{}\left[\,\left\lvert v^{\intercal}Xv \right\rvert \, \right] \\ (*)&\leq \sqrt{ \frac{\pi}{2} } \mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] \end{align}$$ Where $(*)$ is by [[Jensen's inequality]] (exchanging the max and the expectation).

Proof (Lower Bound) Note that $$\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert^2\right]=(\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] )^2 + \text{Var}(\,\lvert \lvert X \rvert \rvert ) \overset{\text{GPI}} \leq (\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] )^2 + \sigma_{*}(X)^2 \leq K\cdot (\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] )^2$$ for some constant $K$, using the previous Prop. And also $$\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert^2\right]= \mathbb{E}_{}\left[\,\lvert \lvert X^2 \rvert \rvert \,\right] \overset{\text{Jensen}} \geq \lvert \lvert \mathbb{E}_{}\left[X^2\right] \rvert \rvert =\sigma(X)^2$$ Where we have used the [[Gaussian Pointcare Inequality|Gaussian Poincaré inequality]] and [[Jensen's inequality]] to achieve our bounds. Combining the two displays gives $\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] \geq \sigma(X)/\sqrt{ K }$, which is the lower bound in NCK.

```datacorejsx
const { dateTime } = await cJS()
return function View() {
    const file = dc.useCurrentFile();
    return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```