Class Notes
Summary
- Subgaussian concentration inequalities for random matrices
- McDiarmid Inequality
- log-Sobolev inequalities
- Gaussian-Lipschitz concentration
- Applications of Gaussian-Lipschitz
7. Non-Asymptotic RMT
Last time, we saw
Corollary: for $X$ symmetric with independent entries in $[-1, 1]$, $\lambda_{1}(X)$ concentrates around its mean.
What if we don't assume
- i.i.d. entries
- centered entries ($\mathbb{E}[X_{ij}] = 0$)?
Example
Any adjacency matrix of an (unweighted) random graph with independent edges (Erdős-Rényi, SBM)
Remark
- The bounded differences inequality still applies (it only needs independent, bounded entries)
- The Efron-Stein theorem still applies, giving a variance bound
Question: what about, for example, when the entries are unbounded, e.g. $X_{ij} \sim {\cal N}(0, 1)$?
Poincaré Inequality
Let $\mu$ be a probability measure on $\mathbb{R}^N$. We say that $\mu$ satisfies the Poincaré inequality with constant $C$, denoted $PI(C)$, if for all differentiable $f: \mathbb{R}^N \to \mathbb{R}$,
$$\text{Var}_{x \sim \mu}(f(x)) \leq C\, \mathbb{E}_{x \sim \mu}\left[ \lVert \nabla f(x) \rVert ^2 \right]$$
Notes
- $\mathbb{E}\left[ \lVert \nabla f(x) \rVert ^2 \right]$ is an expression of the function's sensitivity to each coordinate
- Sometimes, $C$ serves as the "other" notion of "$\sigma^2$"
Theorem
If $\mu$ satisfies $PI(C)$ and $f$ is $L$-Lipschitz, then
$$\text{Var}_{x \sim \mu}(f(x)) \leq C L^2$$
Proof
Exercise
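A one-line sketch of the exercise, in case it's useful: an $L$-Lipschitz $f$ is differentiable a.e. (Rademacher's theorem) with $\lVert \nabla f(x) \rVert \leq L$, so
$$\text{Var}_{x\sim\mu}(f(x)) \overset{PI(C)}{\leq} C\,\mathbb{E}\left[\lVert \nabla f(x) \rVert ^2\right] \leq CL^2$$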
Gaussian Poincaré Inequality Theorem (GPI)
${\cal N}(0, I_{N})$ satisfies the Poincaré inequality with $C = 1$ (independent of $N$!)
See Gaussian Poincaré Inequality Theorem
Corollary
For $X \sim \mathrm{GOE}(d)$, $\text{Var}(\lambda_{1}(X)) \leq 2$, independent of $d$.
Proof
We can view $X \sim \mathrm{GOE}(d)$ as a function of $N = \frac{d(d+1)}{2}$ i.i.d. standard Gaussians $g$ and construct $X(g)$, where
$$X(g) = \begin{bmatrix} \sqrt{ 2 }\,g_{1,1} & \dots & & \\ \vdots & \ddots & g_{i,j} & \\ & g_{i,j} & \ddots & \\ & & & \sqrt{ 2 }\,g_{\frac{d(d+1)}{2}} \end{bmatrix}$$
(one Gaussian per entry on or above the diagonal, with diagonal entries scaled by $\sqrt{ 2 }$). Then define $f(g)=\lambda_{1}(X(g))$. This is Lipschitz with $L=\sqrt{ 2 }$, since
$$\begin{align} \lvert f(g)-f(h) \rvert &= \lvert \lambda_{1}(X(g)) - \lambda_{1}(X(h)) \rvert \\ &\leq \lVert X(g) - X(h) \rVert _{\text{op}} \\ &\leq \lVert X(g) - X(h) \rVert _{F} \\ &\leq \sqrt{ 2 }\, \lVert g-h \rVert \end{align}$$
and therefore, by the GPI,
$$\text{Var}_{X \sim \mathrm{GOE}}(\lambda_{1}(X)) = \text{Var}_{g \sim {\cal N}(0, I_{N})}(f(g)) \leq 1 \cdot (\sqrt{ 2 })^2 = 2$$
$$\tag*{$\blacksquare$}$$
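Not from the lecture, but a quick Monte Carlo sanity check of this corollary (a sketch, assuming the GOE normalization above: ${\cal N}(0,1)$ off-diagonal entries, ${\cal N}(0,2)$ on the diagonal). The empirical variance of $\lambda_{1}$ should sit below the dimension-free bound of 2:

```python
# Monte Carlo check that Var(lambda_1(GOE(d))) <= 2, independent of d.
import numpy as np

rng = np.random.default_rng(0)

def sample_goe(d):
    # Symmetrize a Gaussian matrix: off-diagonal entries ~ N(0,1),
    # diagonal entries ~ N(0,2), matching the X(g) construction above.
    A = rng.normal(size=(d, d))
    return (A + A.T) / np.sqrt(2)

for d in [10, 50, 200]:
    top = [np.linalg.eigvalsh(sample_goe(d))[-1] for _ in range(500)]
    print(f"d={d:3d}: Var(lambda_1) ~ {np.var(top):.3f} (bound: 2)")
```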
Lemma ("Tensorization" of PI)
Suppose $\mu_{1}, \dots, \mu_{N}$ are probability measures over $\mathbb{R}$, each satisfying $PI(C)$. Then the product measure $\mu = \mu_{1} \otimes \dots \otimes \mu_{N}$ also satisfies $PI(C)$.
Proof
Using the Efron-Stein theorem, we see
$$\begin{align} \text{Var}_{x\sim \mu}(f(x)) &\leq \sum_{i=1}^N \mathbb{E}_{\text{all except }i}\left[ \overbrace{ \text{Var}_{x_{i}\sim \mu_{i}}(f(x)) }^{PI \text{ of }\mu_{i}}\right] \\ &\leq \sum_{i=1}^N \mathbb{E}_{\text{all except }i}\left[C\,\mathbb{E}_{x_{i}\sim \mu_{i}}\left[(\partial_{i} f(x))^2\right] \right] \\ &\leq C\, \mathbb{E}\left[\lVert \nabla f(x) \rVert ^2\right] \end{align}$$
$$\tag*{$\blacksquare$}$$
Proof of GPI
Proof
By the tensorization lemma, it suffices to prove the case $N = 1$. We will use the central limit theorem and Efron-Stein.
Introduce $z_{1}, \dots, z_{k}$ i.i.d. uniform on $\{ \pm 1 \}$ (Rademacher), so that $\frac{1}{\sqrt{ k }}\sum_{i=1}^{k} z_{i} \approx {\cal N}(0,1)$ for large $k$. Then we have
$$\begin{align} \text{Var}_{g\sim {\cal N}(0, 1)}(f(g)) &\overset{\text{(CLT)}}{\approx} \text{Var}\left( f\left( \frac{1}{\sqrt{ k }}\sum_{i=1}^k z_{i} \right) \right) \\ &\overset{\text{(ES)}}{\leq} \frac{1}{2} \sum_{i=1}^k \mathbb{E}\left[\left( f\left( \underbrace{ \frac{1}{\sqrt{ k }} \sum_{j\neq i}z_{j} }_{ S_{i} } + \frac{z_{i}}{\sqrt{ k }} \right) -f\left( S_{i} + \frac{z_{i}'}{\sqrt{ k }} \right) \right)^2\right] \\ &= \frac{1}{4} \sum_{i=1}^k \mathbb{E}\left[\left( f\left( S_{i}+\frac{1}{\sqrt{ k }} \right)-f\left( S_{i}-\frac{1}{\sqrt{ k }} \right) \right)^2\right] \\ &\approx \frac{1}{4} \sum_{i=1}^k \mathbb{E}\left[\left( f'(S_{i})\, \frac{2}{\sqrt{ k }} \right)^2\right] \\ &= \mathbb{E}\left[(f'(S_{1}))^2\right] \\ &\approx \mathbb{E}_{g \sim {\cal N}(0,1)}\left[(f'(g))^2\right] \end{align}$$
where $z_{i}'$ is an independent copy of $z_{i}$ (the factor $\frac{1}{4}$ comes from $z_{i} \neq z_{i}'$ with probability $\frac{1}{2}$), and the last line uses the CLT again for $S_{1}$.
$$\tag*{$\blacksquare$}$$
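A small numerical illustration of this argument (my sketch; $f=\tanh$ is just an arbitrary smooth test function, not from the lecture). The variance of $f$ applied to a normalized Rademacher sum is close to $\text{Var}_{g}(f(g))$, and both sit below the GPI bound $\mathbb{E}[f'(g)^2]$:

```python
# Illustrate the CLT + Efron-Stein route to the Gaussian Poincare inequality.
import numpy as np

rng = np.random.default_rng(1)
f = np.tanh
f_prime = lambda t: 1.0 / np.cosh(t) ** 2    # derivative of tanh

k, n = 200, 20_000
z = rng.choice([-1.0, 1.0], size=(n, k))     # i.i.d. Rademacher signs
s = z.sum(axis=1) / np.sqrt(k)               # approx N(0,1) by the CLT
g = rng.normal(size=n)                       # exact N(0,1) samples

print("Var(f(Rademacher sum)):", np.var(f(s)))
print("Var(f(g)), g ~ N(0,1) :", np.var(f(g)))
print("E[f'(g)^2] (GPI bound):", np.mean(f_prime(g) ** 2))
```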
Example
$f(x) = \lVert x \rVert$ where $x \sim {\cal N}(0, I_{d})$. Then $\nabla f(x) = \frac{x}{\lVert x \rVert}$ and $f$ is $1$-Lipschitz. Thus
$$\text{Var}(\lVert x \rVert) \leq 1, \quad \text{independent of } d \text{ (while } \mathbb{E}\lVert x \rVert \approx \sqrt{ d }\text{)}$$
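A three-line empirical check of this dimension-free bound (under my reconstruction of the example above as $f(x) = \lVert x \rVert$):

```python
# Var(||x||) stays <= 1 for x ~ N(0, I_d), even as E[||x||] ~ sqrt(d) grows.
import numpy as np

rng = np.random.default_rng(2)
for d in [2, 100, 10_000]:
    norms = np.linalg.norm(rng.normal(size=(2_000, d)), axis=1)
    print(f"d={d:5d}: E||x|| ~ {norms.mean():7.2f}, Var(||x||) ~ {norms.var():.3f}")
```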
Subgaussian Concentration
Our bounded difference inequality gives us
Theorem (McDiarmid / Azuma-Hoeffding)
If $f$ has the bounded differences property with constants $c_{1}, \dots, c_{N}$ (changing the $i$-th coordinate changes $f$ by at most $c_{i}$) and $x$ has independent coordinates, then $f(x)$ is $\frac{1}{4}\sum_{i=1}^N c_{i}^2$-subgaussian. And from homework 1, subgaussianity gives the tail bound $\mathbb{P}\left[\lvert f(x)-\mathbb{E}\left[f(x)\right] \rvert \geq t\right] \leq 2\exp\left( -\frac{2t^2}{\sum_{i}c_{i}^2} \right)$.
Proof: like Efron-Stein (to get a coordinate-by-coordinate decomposition), using a Doob martingale, then apply Hoeffding's lemma to each increment.
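The martingale mentioned here, spelled out (standard, filling in the sketch): decompose $f(x) - \mathbb{E}[f(x)]$ into Doob increments
$$D_{i} = \mathbb{E}\left[f(x) \mid x_{1},\dots,x_{i}\right] - \mathbb{E}\left[f(x) \mid x_{1},\dots,x_{i-1}\right], \qquad f(x)-\mathbb{E}\left[f(x)\right] = \sum_{i=1}^N D_{i}$$
Bounded differences force each $D_{i}$ to lie in a (conditional) interval of length $c_{i}$, so Hoeffding's lemma bounds each conditional MGF by $e^{\lambda^2 c_{i}^2/8}$, and multiplying the $N$ bounds gives the $\frac{1}{4}\sum_{i}c_{i}^2$ variance proxy.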
Theorem (important!)
(Gaussian Lipschitz concentration; a generalization of GPI for subgaussians)
If $x \sim {\cal N}(0, I_{N})$ and $f$ is $L$-Lipschitz, then $f(x)$ is $L^2$-subgaussian, i.e.
$$\begin{align} \mathbb{P}\left[\,\lvert f(x)-\mathbb{E}\left[f(x)\right] \rvert \geq t\,\right] &\leq 2\exp\left( -\frac{t^2}{2L^2} \right) \\ \iff f(x)&= \mathbb{E}\left[f(x)\right] \pm {\cal O}(L)\quad \text{w.h.p.} \end{align}$$
See Gaussian Lipschitz concentration inequality (HD Stat, Theorem 2.26)
"Very Gaussian" proof
Exercise
(homework 4)
Corollaries
If $x \sim {\cal N}(0, I_{d})$ and $X \sim \mathrm{GOE}(d)$, then
- $\lVert x \rVert$ is 1-subgaussian
- $\lambda_{1}(X)$ is 2-subgaussian
- is subgaussian
- is 2-subgaussian
- If , then with high probability, is 1-subgaussian (compare to the proof for Johnson-Lindenstrauss)
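Not from the notes: a quick empirical tail check of the theorem for one concrete 1-Lipschitz function, $f(x) = \max_{i} x_{i}$ (my choice of test function). The empirical tail should sit below $2\exp(-t^2/2)$:

```python
# Empirical tails of f(x) = max_i x_i (1-Lipschitz) vs the subgaussian bound.
import numpy as np

rng = np.random.default_rng(3)
m = rng.normal(size=(200_000, 50)).max(axis=1)   # f(x) for x ~ N(0, I_50)
dev = np.abs(m - m.mean())
for t in [0.5, 1.0, 2.0]:
    print(f"t={t}: P(|f - Ef| >= t) ~ {(dev >= t).mean():.4f}, "
          f"bound 2exp(-t^2/2) = {2 * np.exp(-t**2 / 2):.4f}")
```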
Log-Sobolev Inequality
For $X \geq 0$,
$$\begin{align} \text{Ent}[X] &= \mathbb{E}\left[X\log X\right] -\mathbb{E}\left[X\right] \log(\mathbb{E}\left[X\right] ) \\ &= \mathbb{E}\left[\phi(X)\right] - \phi(\mathbb{E}\left[X\right] ) \end{align}$$
where $\phi(t) = t\log t$. We can think of this as a "generalized variance".
Lemma 1 ([[Efron-Stein theorem|Efron-Stein]] for Entropy)
$$\text{Ent}[f(x)] \leq \sum_{i=1}^N \mathbb{E}_{\text{all except }i}\left[\text{Ent}_{x_{i}}[f(x)]\right]$$
where $\text{Ent}_{x_{i}}$ denotes entropy over the $i$-th coordinate alone.
Lemma (Herbst argument)
Suppose for all $\lambda$, $\frac{\text{Ent}\left[ {e^{\lambda X}} \right]}{\mathbb{E}_{}\left[e^{\lambda X}\right]} \leq \frac{\sigma^2}{2}\lambda^2$. Then $X$ is $\sigma^2$-[[subgaussian]].
$\frac{\text{Ent}\left[ {e^{\lambda X}} \right]}{\mathbb{E}_{}\left[e^{\lambda X}\right]}$ is like $\psi_{X}(\lambda)$ but with tensorization! Here $\psi_{X}(\lambda)=\log(\mathbb{E}_{}\left[e^{\lambda X}\right])$ is the cumulant generating function (the log of the moment generating function).
Def (Modified Log-Sobolev Inequality)
We say $\mu$ satisfies **MLSI**($C$) if
$$\text{Ent}(e^{f(x)}) \leq C\,\mathbb{E}_{x \sim \mu}\left[\lVert \nabla f(x) \rVert^2 e^{f(x)}\right]$$
see [[MLSI]]
Lemma
MLSI tensorizes (like [[pointcare inequality|PI]])
Theorem
If $\mu$ has [[MLSI]]$(C)$ and $f$ is $L$-[[Lipschitz continuous|Lipschitz]], and $x \sim \mu$, then $f(x)$ is $2CL^2$-[[subgaussian]]
Theorem
${\cal N}(0, 1)$ has [[MLSI]]$\left( \frac{1}{2} \right)$
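A sketch of the calculus behind the Herbst argument above (standard, not spelled out in the notes): writing $\psi_{X}(\lambda) = \log \mathbb{E}\left[e^{\lambda X}\right]$, a direct computation gives $\frac{\text{Ent}\left[ e^{\lambda X} \right]}{\mathbb{E}\left[e^{\lambda X}\right]} = \lambda \psi_{X}'(\lambda) - \psi_{X}(\lambda)$, so the hypothesis says
$$\frac{d}{d\lambda}\left( \frac{\psi_{X}(\lambda)}{\lambda} \right) = \frac{\lambda \psi_{X}'(\lambda) - \psi_{X}(\lambda)}{\lambda^2} \leq \frac{\sigma^2}{2}$$
Integrating from $0$ (where $\frac{\psi_{X}(\lambda)}{\lambda} \to \mathbb{E}[X]$) gives $\psi_{X}(\lambda) \leq \lambda\,\mathbb{E}[X] + \frac{\sigma^2}{2}\lambda^2$ for all $\lambda$, which is exactly $\sigma^2$-subgaussianity.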