[[lecture-data]]

```dataviewjs
let current = dv.current();
let previous = current.previous_lecture;

// get notes with this note as the previous lecture
let pages = dv.pages("")
    .where(p => p.previous_lecture)
    .where(p => String(p.previous_lecture).includes(dv.current().file.link));

let output = `:LiArrowBigLeftDash: ${previous || "No Previous Lecture"} | `;
if (pages.length > 0) {
    output += pages.map(p => p.file.link).join(", ");
} else {
    output += "No Next Lecture";
}
output += " :LiArrowBigRightDash:";

dv.el("div", output);
```
Class Notes
Summary
- Gaussian series models
- Concentration of norm
- Location of norm
- Non-commutative Khintchine inequality
Last time: the Gaussian Lipschitz concentration theorem.
Theorem (important!)
(Generalization of the GPI from a variance bound to subgaussian tails)
If $x \sim {\cal N}(0, I_{n})$ and $f: \mathbb{R}^{n}\to \mathbb{R}$ is $L$-[[Lipschitz continuous|Lipschitz]], then $f(x)$ is [[subgaussian]], i.e.
$$\begin{align} \mathbb{P}[\,\lvert f(x)-\mathbb{E}_{}\left[f(x)\right] \rvert \geq t] &\leq 2\exp\left( -\frac{t^2}{2L^2} \right) \\ \iff f(x)&= \mathbb{E}_{}\left[f(x)\right] \pm {\cal O}(L)\quad \text{w.h.p.} \end{align}$$
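As a quick numerical illustration (a minimal sketch, not from the lecture): $f(x)=\lvert \lvert x \rvert \rvert_2$ is $1$-[[Lipschitz continuous|Lipschitz]], so its fluctuations around $\mathbb{E}_{}\left[f(x)\right]$ stay ${\cal O}(1)$ even as the dimension grows, although the mean itself grows like $\sqrt{ n }$.

```python
# Sketch: Gaussian Lipschitz concentration for the 1-Lipschitz function f(x) = ||x||_2.
# The mean of f grows like sqrt(n), but the standard deviation stays O(1) (about 1/sqrt(2)).
import numpy as np

rng = np.random.default_rng(0)
for n in [10, 100, 1000, 10000]:
    samples = np.linalg.norm(rng.standard_normal((500, n)), axis=1)  # 500 draws of f(x)
    print(f"n={n:6d}   E[f(x)] ~ {samples.mean():8.2f}   std(f(x)) ~ {samples.std():.3f}")
```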
8. Matrix Concentrations
Gaussian Series
A symmetric [[random matrix]] $X$ is a Gaussian series if $$X = A_{0}+\sum_{i} g_{i}A_{i}$$ for deterministic symmetric matrices $A_{0}, A_{1},\dots$ and $g_{i} \overset{\text{iid}}\sim {\cal N}(0,1)$.
see [[Gaussian series]]
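For instance, a Gaussian Wigner matrix with independent ${\cal N}(0,1)$ entries on and above the diagonal (one common normalization; conventions for the diagonal vary) is the Gaussian series $$X = \sum_{i} g_{ii}\, E_{ii} + \sum_{i<j} g_{ij}\left( E_{ij} + E_{ji} \right),\qquad g_{ij} \overset{\text{iid}}\sim {\cal N}(0,1),$$ i.e. $A_{0}=0$ and the $A_{i}$ range over $E_{ii}$ and $E_{ij}+E_{ji}$.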
Example ("Patterned" Wigner matrices)
- Gaussian circulant (see the sketch below)
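A minimal sketch of the circulant case (assuming one natural symmetric convention where $X_{ij}$ depends only on the cyclic distance between $i$ and $j$; not necessarily the normalization used in lecture), written explicitly as a Gaussian series:

```python
# Sketch: a symmetric Gaussian circulant as a Gaussian series X = sum_k g_k A_k,
# where A_k has ones exactly on the entries with (j - i) = +/- k (mod d).
import numpy as np

rng = np.random.default_rng(0)
d = 6

def circulant_basis(d: int, k: int) -> np.ndarray:
    """Symmetric 0/1 coefficient matrix with ones where (j - i) = +/- k (mod d)."""
    i, j = np.indices((d, d))
    return (((j - i) % d == k) | ((i - j) % d == k)).astype(float)

A = [circulant_basis(d, k) for k in range(d // 2 + 1)]  # A_0 = I, then the cyclic shifts
g = rng.standard_normal(len(A))
X = sum(gk * Ak for gk, Ak in zip(g, A))                # a Gaussian series

assert np.allclose(X, X.T)                              # X is symmetric, "patterned" Wigner-type
print(np.round(X, 2))
```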
Def Let $\text{symvec}: \mathbb{S}^{d} \to \mathbb{R}^{d(d+1)/2}$ map a symmetric matrix to the vector of its entries on and above the diagonal, and let $\text{symmat} := \text{symvec}^{-1}$ be the inverse map.
Prop $X$ is a [[Gaussian series]] if and only if $\text{symvec}(X)$ is a Gaussian random vector.
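For example, in $d=2$ with one common (unnormalized) convention (conventions differ on whether off-diagonal entries carry a $\sqrt{ 2 }$ factor; the proof below only uses that $\text{symvec}$ is linear and invertible): $$\text{symvec}\begin{pmatrix} a & b \\ b & c \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \qquad \text{symmat}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$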
Proof
$(\implies)$ Suppose $X=A_{0}+\sum_{i}g_{i}A_{i}$ is a Gaussian series. Then $$\begin{align} \text{symvec}\left( A_{0}+\sum_{i}g_{i}A_{i} \right) &= \underbrace{ \text{symvec}(A_{0}) }_{ \mu } + \underbrace{ \sum_{i} g_{i}\text{symvec}(A_{i}) }_{ \text{Law}={\cal N}\left( 0, \sum_{i} \text{symvec}(A_{i})\text{symvec}(A_{i})^{\intercal} \right) } \end{align}$$ $(\impliedby)$ Suppose $x=\text{symvec}(X)\sim {\cal N}(\mu,\Sigma)$. Using the Cholesky / [[Gram matrix|Gram]] / [[diagonalizable|spectral decomposition]], we can write (because $\Sigma \succeq 0$) $$\begin{align} \Sigma &= \sum_{i}a_{i}a_{i}^{\intercal} \\ \implies x &\overset{\text{d}}= \mu + \sum_{i} g_{i}a_{i},\quad g_{i}\overset{\text{iid}}\sim {\cal N}(0,1) \\ \implies X=\text{symmat}(x)&\overset{\text{d}}= \underbrace{ \text{symmat}(\mu) }_{ A_{0} } + \sum_{i} g_{i} \underbrace{ \text{symmat}(a_{i}) }_{ A_{i} } \end{align}$$ $$\tag*{$\blacksquare$}$$

What is the concentration of $\lambda_{1}(X)$? Write $$\begin{align} X(g) &= A_{0} + \sum_{i}g_{i}A_{i},\quad g \in \mathbb{R}^D \\ f(g) &:= \lambda_{1}(X(g)) \end{align}$$ The same argument works with $\lambda_{1}$ replaced by $\lvert \lvert \cdot \rvert \rvert_{\text{op}}$ ([[operator norm]]). By the [[Gaussian Lipschitz concentration inequality]], it is enough to show that $f$ is [[Lipschitz continuous|Lipschitz]] to apply a bound. Note then that $$\begin{align} \lvert f(g) - f(h) \rvert & = \lvert \lambda_{1}(X(g)) - \lambda_{1}(X(h)) \rvert \\ &\leq \lvert \lvert X(g) - X(h) \rvert \rvert _{\text{op}} \\ &= \left\lvert \left\lvert \sum_{i}(g_{i}-h_{i})A_{i} \right\rvert \right\rvert_{\text{op}} \\ &= \max_{\lvert \lvert v \rvert \rvert =1} \left\lvert \sum_{i}(g_{i} - h_{i})\left( v^{\intercal}A_{i}v \right) \right\rvert \\ (*)&\leq \lvert \lvert g-h \rvert \rvert _{2} \underbrace{ \max_{\lvert \lvert v \rvert \rvert =1}\left( \sum_{i}\left( v^{\intercal}A_{i}v \right)^2 \right)^{1/2} }_{ \substack{\sigma_{*}(X)= \max_{\lvert \lvert v \rvert \rvert =1} \sqrt{ \text{Var}\left( v^{\intercal}Xv \right) } \\ \text{"weak variance" parameter}} } \end{align}$$ where $(*)$ is by [[cauchy-schwarz theorem|Cauchy-Schwarz]]. That is, $f$ is $\sigma_{*}(X)$-[[Lipschitz continuous|Lipschitz]], so $\lambda_{1}(X)$ and $\lvert \lvert X \rvert \rvert_{\text{op}}$ are $\sigma_{*}(X)^2$-[[subgaussian]]. Also, by the [[Gaussian Pointcare Inequality|Gaussian Poincaré inequality]], we have $$\text{Var}(\lambda_{1}(X)), \text{Var}(\lvert \lvert X \rvert \rvert _{\text{op}}) \leq \sigma_{*}(X)^2$$

Def Weak Variance Parameter
see [[weak matrix variance parameter]]
Note that $$\begin{align} \sum_{i} \left( v^{\intercal}A_{i}v \right)^2 &= \text{Var}\left( v^{\intercal}Xv \right) \end{align}$$ so $\sigma_{*}(X)^2$ is the amount of variance of $v^{\intercal}Xv$ in the direction $v$ that has the most variance.

---

def Let $\text{vec}: \mathbb{R}^{d\times d} \to \mathbb{R}^{d^2}$ be the entry-stacking map, and define $$\begin{align} \sigma_{\text{vec}}(X)^2 &:= \lvert \lvert \text{Cov}(\text{vec}(X)) \rvert \rvert \\ &= \left\lvert \left\lvert \sum_{i} \text{vec}(A_{i}) \text{vec}(A_{i})^{\intercal} \right\rvert \right\rvert \end{align}$$

Exercise If $\Sigma = \text{Cov}(\text{symvec}(X))$, then $$\lvert \lvert \Sigma \rvert \rvert _{\text{op}} \leq \sigma_{\text{vec}}(X)^2 \leq 2 \lvert \lvert \Sigma \rvert \rvert _{\text{op}}$$ (last year's notes prop 5.4.15 pg 126)

Prop $\sigma_{*}(X) \leq \sigma_{\text{vec}}(X)$
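A minimal numerical sanity check of this Prop (a sketch with arbitrary made-up coefficient matrices, not from the lecture): random search over unit vectors only lower-bounds $\sigma_{*}(X)$, while $\sigma_{\text{vec}}(X)$ is computed exactly from $\sum_{i}\text{vec}(A_{i})\text{vec}(A_{i})^{\intercal}$.

```python
# Sketch: check sigma_*(X) <= sigma_vec(X) for a Gaussian series X = sum_i g_i A_i
# with randomly chosen symmetric coefficient matrices A_i.
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 7  # matrix dimension, number of terms in the series

A = []
for _ in range(k):
    M = rng.standard_normal((d, d))
    A.append((M + M.T) / 2)  # symmetric coefficient matrix A_i

# sigma_vec(X)^2 = || sum_i vec(A_i) vec(A_i)^T ||_op
V = sum(np.outer(Ai.ravel(), Ai.ravel()) for Ai in A)
sigma_vec = np.sqrt(np.linalg.norm(V, ord=2))

# sigma_*(X)^2 = max_{||v||=1} sum_i (v^T A_i v)^2, lower-bounded by random search over v
best = 0.0
for _ in range(20000):
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    best = max(best, sum(float(v @ Ai @ v) ** 2 for Ai in A))
sigma_star_lb = np.sqrt(best)

print(f"sigma_* (random-search lower bound) ~ {sigma_star_lb:.3f}")
print(f"sigma_vec                           = {sigma_vec:.3f}")
assert sigma_star_lb <= sigma_vec + 1e-9
```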
Example Let $X$ be symmetric with $X_{ij}=X_{ji} \sim {\cal N}(\mu_{ij}, \sigma^2_{ij})$ independent for $i \leq j$. Then $$\lvert \lvert \text{Cov}(\text{symvec}(X)) \rvert \rvert = \lvert \lvert \text{diag}(\sigma^2_{1,1}, \sigma^2_{1,2},\dots,\sigma^2_{d,d}) \rvert \rvert = \max_{ij}\sigma^2_{ij}$$ and so $$\sigma_{\text{vec}}(X)^2 \leq 2 \max_{ij} \sigma^2_{ij}$$ (something we talked about last time).

Example Suppose $X=g 11^{\intercal}$ with $g \sim {\cal N}(0,1)$. Then $$\sigma_{\text{vec}}(X)^2 = \lvert \lvert \text{vec}\left( 11^{\intercal} \right) \rvert \rvert ^2 = d^2, \qquad \lvert \lvert X \rvert \rvert =\lvert g \rvert \,d$$ so $\sigma_{\text{vec}}(X)=d$ matches the scale of $\lvert \lvert X \rvert \rvert$.

Proof of Prop $$\begin{align} \sigma_{*}(X)^2 &= \max_{\lvert \lvert v \rvert \rvert =1} \sum_{i}\underbrace{ \left( v^{\intercal}A_{i}v \right)^2 }_{ \substack{= \left\langle A_{i}, vv^{\intercal} \right\rangle^2 \\= \left\langle \text{vec}(A_{i}), \text{vec}\left( vv^{\intercal} \right) \right\rangle^2} } \\ \left( v^{\intercal}A_{i}v \right)^2 &= \left\langle \text{vec} (A_{i}) \text{vec} (A_{i})^{\intercal}, \text{vec} \left( vv^{\intercal} \right)\text{vec} \left( vv^{\intercal} \right)^{\intercal} \right\rangle \\ \implies \sigma_{*}(X)^2 &= \max_{\lvert \lvert v \rvert \rvert =1} \left\langle \sum_{i} \text{vec} (A_{i}) \text{vec} (A_{i})^{\intercal}, \underbrace{ \text{vec} \left( vv^{\intercal} \right) }_{ w,\ \lvert \lvert w \rvert \rvert =1 }\underbrace{ \text{vec} \left( vv^{\intercal} \right) }_{ w }^{\intercal} \right\rangle \\ &\leq \left\lvert \left\lvert \sum_{i} \text{vec}(A_{i})\text{vec}(A_{i})^{\intercal} \right\rvert \right\rvert \\ &= \sigma_{\text{vec}}(X)^2 \end{align}$$ $$\tag*{$\blacksquare$}$$ Note that the third line is essentially taking a tensor norm (but don't worry about this too much).

>[!question] What can we say about $\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right]$?

def Let $X$ be a [[random matrix]]. Then its **(strong) variance parameter** (contrasted with the [[weak matrix variance parameter]]) is $$\sigma(X)^2 := \lvert \lvert \mathbb{E}_{}\left[(X - \mathbb{E}_{}\left[X\right])^2\right] \rvert \rvert $$ If $X$ is a [[Gaussian series]], then $$\sigma(X)^2 = \left\lvert \left\lvert \sum_{i}A_{i}^2 \right\rvert \right\rvert $$ see [[strong matrix variance parameter]]

Theorem (non-commutative Khintchine - NCK) Let $X$ be a [[Gaussian series]] with $A_{0}=0$ (i.e. a centered Gaussian [[random matrix]]). For absolute constants $c=\frac{1}{\sqrt{ 2 }}$ and $C=\sqrt{ 2e }$, we have $$c\,\sigma(X) \leq \mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] \leq C \sqrt{ \log d } \,\sigma(X)$$

Example Recall that the maximum of $d$ iid standard normal random variables is $\approx \sqrt{ \log d }$ in expectation. For $X$ diagonal with $X_{ii} \overset{\text{iid}}\sim {\cal N}(0, 1)$ and $X_{ij} = 0$ for $i \neq j$, we have $$\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] = \mathbb{E}_{}\left[ \max_{i} \lvert X_{ii} \rvert \right] \approx \sqrt{ \log d }$$ and $$\sigma(X)^2 = \lvert \lvert \mathbb{E}_{}\left[X^2\right] \rvert \rvert = \lvert \lvert I \rvert \rvert =1 $$

Example $X \sim GOE(d, 1) \implies \mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert\,\right] \approx 2\sqrt{ d }$, while $$\sigma(X)^2 = \lvert \lvert (d+1)I \rvert \rvert = d+1$$

The two examples show that we do indeed need both factors: sometimes the upper bound is tight, and sometimes the lower bound is tight. In general, we cannot do much better than this.
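A Monte Carlo sketch of the diagonal example (an illustration, not from the lecture), showing $\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right]$ landing between the two sides of NCK:

```python
# Sketch: for X = diag(g_1, ..., g_d) with g_i iid N(0,1), compare E||X|| against
# sigma(X) = 1 (lower-bound side of NCK) and sqrt(log d) * sigma(X) (upper-bound side).
import numpy as np

rng = np.random.default_rng(0)
d, trials = 1000, 200

# E||X|| = E max_i |g_i| for the diagonal Gaussian series
E_norm = np.mean([np.max(np.abs(rng.standard_normal(d))) for _ in range(trials)])

sigma = 1.0  # sigma(X)^2 = ||E[X^2]|| = ||I|| = 1 here

print(f"E||X||                 ~ {E_norm:.2f}")
print(f"sigma(X)               = {sigma:.2f}   (lower-bound side)")
print(f"sqrt(log d) * sigma(X) = {np.sqrt(np.log(d)) * sigma:.2f}   (upper-bound side, up to constants)")
```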
(page 129 in last year's notes)

Concentration: w.h.p. we have $$\lvert \lvert X \rvert \rvert =\mathbb{E}_{}\left[ \lvert \lvert X \rvert \rvert \right] \pm {\cal O}(\sigma_{*}(X))$$

Note $$\mathbb{E}_{g \sim {\cal N}(0, 1)}\left[\,\lvert g \rvert \,\right] = \sqrt{ \frac{2}{\pi} } \implies \mathbb{E}_{g \sim {\cal N}(0, \sigma^2)}\left[\,\lvert g \rvert \,\right] = \sqrt{ \frac{2}{\pi} } \left(\mathbb{E}_{}\left[g^2\right] \right)^{1/2}$$

Prop If $\mathbb{E}_{}\left[X\right]=0$, then $$\sigma_{*}(X) \leq C\, \mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right]$$ with $C=\sqrt{ \pi/2 }$.

Proof $$\begin{align} \sigma_{*}(X) &= \max_{\lvert \lvert v \rvert \rvert =1} \left(\mathbb{E}_{}\left[ ( \underbrace{ v^{\intercal}Xv }_{ \text{Law}= {\cal N}(0,\, \text{Var}(v^{\intercal}Xv)) } )^2\right] \right)^{1/2} \\ &= \sqrt{ \frac{\pi}{2} }\max_{\lvert \lvert v \rvert \rvert =1} \mathbb{E}_{}\left[\,\left\lvert v^{\intercal}Xv \right\rvert \, \right] \\ (*)&\leq \sqrt{ \frac{\pi}{2} } \mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] \end{align}$$ Where $(*)$ is by [[Jensen's inequality]] (exchanging the max and the expectation).

Proof (Lower Bound) Note that $$\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert^2\right]=(\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] )^2 + \text{Var}(\,\lvert \lvert X \rvert \rvert ) \overset{\text{GPI}} \leq (\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] )^2 + \sigma_{*}(X)^2 \leq K\cdot (\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] )^2$$ for some constant $K$, using the previous Prop. And also $$\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert^2\right]= \mathbb{E}_{}\left[\,\lvert \lvert X^2 \rvert \rvert \,\right] \overset{\text{Jensen}} \geq \lvert \lvert \mathbb{E}_{}\left[X^2\right] \rvert \rvert =\sigma(X)^2$$ Where we have used the [[Gaussian Pointcare Inequality|Gaussian Poincaré inequality]] and [[Jensen's inequality]] to achieve our bounds. Combining the two displays gives $\mathbb{E}_{}\left[\,\lvert \lvert X \rvert \rvert \,\right] \geq \sigma(X)/\sqrt{ K }$, which is the lower bound in NCK.

```datacorejsx
const { dateTime } = await cJS()
return function View() {
    const file = dc.useCurrentFile();
    return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```