[[lecture-data]]:LiArrowBigLeftDash: = choice(this["previous_lecture"], this["previous_lecture"], "No Previous Lecture") | = choice(this["next_lecture"], this["next_lecture"], "No Next Lecture") :LiArrowBigRightDash:
# Class Notes
pg 7 - 12 (1.4 - 2.1)

## Summary
- Proof of concentration of Gaussian random vector norms (see end of Random Matrix Lecture 01 / concentration inequality for magnitude of standard gaussian random vector)
- how to think about high-dimensional geometry
- multiplying by a Gaussian matrix
- Gaussian process viewpoint
# 1. Random Vector Theory
## 1.4 Consequences for High-Dimensional Geometry
Last time, we saw the statement and proof of the [[concentration inequality for magnitude of standard gaussian random vector]]. This is one of the fundamental facts of high-dimensional geometry.
This tells us that, with high probability, we have
$$\begin{align}
\lvert \lvert x \rvert \rvert ^2 &= d + {\cal O}(\sqrt{ d }) \\
\implies \lvert \lvert x \rvert \rvert &= \sqrt{ d + {\cal O}(\sqrt{ d }) } \\
&= \sqrt{ d } \cdot \sqrt{ 1+ {\cal O}\left( \frac{1}{\sqrt{ d }} \right)} \\
&= \sqrt{ d }\left( 1+{\cal O}\left( \frac{1}{\sqrt{ d }} \right) \right) \\
&= \sqrt{ d } + {\cal O}(1)
\end{align}$$
ie, the standard [[gaussian random vector]] falls within a spherical shell of width ${\cal O}(1)$ around the sphere of radius $\sqrt{ d }$.

> [!important]
>
> The standard [[gaussian random vector]] tends towards {==as||the surface of the sphere of radius $\sqrt{ d }$||"typical set" shape==} and falls within an "error region" surrounding that surface that has a width of {==ha||${\cal O}(1)$==}
> <!--SR:!2026-02-09,27,210!2026-03-12,49,190-->

> [!NOTE]
>
> This means a high-dimensional Gaussian has a {==1||*non-convex*==} typical set - it is the shape of {==2||a hollow spherical shell||shape==}!
> - (Sometimes these vectors are called *thin-shell* vectors in convex geometry and probability)
> - Contrast this with how we normally think of a Gaussian high-probability/"typical" set in 1- or 2-D. This is usually some sort of solid blob around the origin.
> <!--SR:!2026-08-01,200,270!2026-05-04,143,250-->

There are some other high-dimensional behaviors of [[random variable|random variables]] that can be explained in the same way.

> [!example]
>
> Consider $x \sim \text{Uniform}([-1, +1])^{\otimes d}$. A simple calculation yields
> $$\mathbb{E}[\lvert \lvert x \rvert \rvert ^2] = d \mathbb{E}[x_{1}^2] = \frac{d}{3}$$
> And a similar calculation to the [[concentration inequality for magnitude of standard gaussian random vector|concentration inequality in the gaussian case]] yields, with high probability, that
> $$\lvert \lvert x \rvert \rvert =\sqrt{ \frac{d}{3} } + {\cal O}(1)$$
> This means almost all the mass of this distribution is at the *corners* of this $d$-dimensional box instead of in the inscribed solid unit ball $B^d:=\{ x \in \mathbb{R}^d:\lvert \lvert x \rvert \rvert\leq 1 \}$. The concentration inequality in this case also implies that the volume of the ball relative to the box is bounded by
> $$\begin{align}
> \frac{\text{Vol}(B^d)}{\text{Vol}([-1, +1]^d)} &= \mathbb{P}_{x \sim \text{Unif}([-1, +1]^d)}\,[\lvert \lvert x \rvert \rvert \leq 1] \\
> &\leq \mathbb{P}_{x} [\,\lvert \,\lvert \lvert x \rvert \rvert ^2-d \,\rvert \geq d-1] \\
> &= \exp(-{\cal O}(d))
> \end{align}$$
> ie, the volume of the unit ball is exponentially smaller than the volume of the box.

Our [[concentration inequality for magnitude of standard gaussian random vector]] also gives us an approximation

> [!important] Take Away
>
> $${\cal N}(0, I_{d}) \approx \text{Unif}(\mathbb{S}^{d-1} (\sqrt{ d }))$$
>
> > [!caution]
> > This is an **informal** approximation!

We get this by noting
1. When $x \sim {\cal N}(0, I_{d})$, we have $\text{Law}\left( \frac{x}{\lvert \lvert x \rvert \rvert} \right) = \text{Unif}(\mathbb{S}^{d-1})$ (which we get from [[normalized standard gaussian random vectors have the orthogonally invariant distribution on the unit sphere]])
2. $\lvert \lvert x \rvert \rvert \approx \sqrt{ d }$, which follows from the [[concentration inequality for magnitude of standard gaussian random vector]]

This idea is formalized in [[Borel's central limit theorem]] (see [[gaussian random vectors are approximately uniform on the hollow sphere]])

> [!theorem]
>
> Let $x \sim \text{Unif}(\mathbb{S}^{d-1})$. Then $x_{1}\cdot \sqrt{ d } \to {\cal N}(0, 1)$ in distribution as $d \to \infty$.
> - Further, for any fixed $k \geq 1$, we have $\sqrt{ d }\cdot(x_{1},\dots,x_{k}) \to {\cal N}(0, I_{k})$ in distribution as $d \to \infty$
>
> (without proof, for now)

See [[Borel's central limit theorem]]
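As an informal numerical sanity check of the thin-shell picture and of the Borel-type behavior above, here is a minimal numpy sketch (the dimension, sample count, and seed are my own arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10_000, 1_000  # arbitrary choices for this sketch

# Thin-shell behavior: ||x|| = sqrt(d) + O(1) for x ~ N(0, I_d)
x = rng.standard_normal((n, d))
norms = np.linalg.norm(x, axis=1)
print(np.sqrt(d), norms.mean(), norms.std())  # ~100.0, ~100.0, ~0.7 (an O(1) width)

# Borel-type behavior: for x ~ Unif(S^{d-1}), sqrt(d) * x_1 is approximately N(0, 1)
u = x / norms[:, None]                        # normalizing a Gaussian gives Unif(S^{d-1})
first_coord = np.sqrt(d) * u[:, 0]
print(first_coord.mean(), first_coord.std())  # ~0 and ~1
```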
---
# 2. Rectangular Matrices

We begin by looking at asymmetric, rectangular matrices with iid Gaussian entries. We are particularly interested in the $d\times m$ matrix
$$G \sim {\cal N}(0, 1)^{\otimes d\times m}$$

## 2.1 Gaussian Random Field Viewpoint

- [?] What does $G$ *do* to some $y \in \mathbb{R}^m$?
- [?] What about several $y_{1},\dots,y_{n}\in \mathbb{R}^m$?

We can use our results about [[gaussian random vector|gaussian random vectors]] to characterize these images.

> [!theorem] Proposition
>
> Let $G \sim {\cal N}(0, 1)^{\otimes d\times m}$ be a [[random matrix]] and $y \in \mathbb{R}^m$. Then
> $$\text{Law}(Gy) = {\cal N}(0, \lvert \lvert y \rvert \rvert ^2 I_{d})$$

> [!proof]
>
> $Gy$ is [[linear function|linear]] in $G$. The entries of $G$ form a [[gaussian random vector]], and so $Gy$ must be Gaussian as well. Thus, it suffices to calculate the mean and covariance to find its law.
>
> Since the entries of $G$, call them $G_{i\alpha}$, are iid ${\cal N}(0, 1)$, we get
> $$\begin{align}
> \mathbb{E}(Gy)_{i} &= \mathbb{E}\left[ \sum_{\alpha=1}^m G_{i\alpha} y_{\alpha} \right] \\
> &=0
> \end{align}$$
> And for the covariance
> $$\begin{align}
> \text{Cov}(\,(Gy)_{i}, (Gy)_{j}\,) &= \mathbb{E}[\,(Gy)_{i}(Gy)_{j}\,] \\
> &= \mathbb{E}\left( \left( \sum_{\alpha=1}^m G_{i\alpha} y_{\alpha} \right) \left( \sum_{\beta=1}^m G_{j\beta}y_{\beta} \right) \right) \\
> &= \sum_{\alpha,\beta=1}^m y_{\alpha}y_{\beta} \cdot \mathbb{E} [G_{i\alpha} G_{j\beta}] \\
> &= \begin{cases}
> 0 & i \neq j \\
> \lvert \lvert y \rvert \rvert ^2 & i=j
> \end{cases} \\
> \implies \text{Cov}(Gy,Gy) &= \lvert \lvert y \rvert \rvert ^2 I_{d}
> \end{align}$$

Above, we found the expectation and covariance in terms of the entry-wise products and sums. We can also do the calculation in terms of matrices.

> [!proof] Proof (matrices)
>
> **Expectation**
> If there is only one random matrix involved, we can write
> $$\mathbb{E}[Gy] = \mathbb{E}[G]y\quad(=0)^{(*)}$$
> $^{(*)}$ In terms of our same setting above. This is a simple result of [linearity of expectation](https://en.wikipedia.org/wiki/Expected_value).
>
> **Covariance**
> If $g_{1},\dots,g_{m}\in \mathbb{R}^d$ are the columns of $G$ so that each $g_{\alpha} \sim {\cal N}(0, I_{d})$ iid, then we see that
> $$\begin{align}
> \text{Cov}(Gy) &= \mathbb{E}\left[ (Gy)(Gy)^{\intercal} \right] \\
> &= \mathbb{E}\left[ \left( \sum_{\alpha=1}^m y_{\alpha} g_{\alpha} \right)\left( \sum_{\beta=1}^m y_{\beta} g_{\beta}^{\intercal} \right) \right] \\
> &= \sum_{\alpha,\beta=1}^m y_{\alpha} y_{\beta} \cdot \mathbb{E}\left[ g_{\alpha}g_{\beta}^{\intercal} \right] \\
> (*) &= \sum_{\alpha=1}^m y_{\alpha}^2 \cdot I_{d} \\
> &= \lvert \lvert y \rvert \rvert ^2 I_{d}
> \end{align}$$
> Where again $(*)$ is in terms of the setting above. In this line we use the law of the $g_{\alpha}$.
> $$\tag*{$\blacksquare$}$$

see [[gaussian random matrix transforms vectors into gaussian random vectors]]
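As a quick Monte Carlo sketch of this proposition (numpy; the sizes, seed, and number of trials are arbitrary assumptions of mine, not from the lecture), we can fix $y$, draw many independent copies of $G$, and check that the sample mean of $Gy$ is near $0$ while its sample covariance is near $\lvert \lvert y \rvert \rvert^2 I_d$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, trials = 5, 8, 200_000               # arbitrary sizes for this sketch

y = rng.standard_normal(m)                 # a fixed vector in R^m
G = rng.standard_normal((trials, d, m))    # independent copies of G ~ N(0,1)^{d x m}
Gy = G @ y                                 # each row is one sample of Gy in R^d

print(Gy.mean(axis=0))                     # entries ~ 0
emp_cov = (Gy.T @ Gy) / trials             # empirical covariance (the mean is ~0)
print(np.round(emp_cov, 2))                # ~ ||y||^2 * I_d
print(np.dot(y, y))                        # ||y||^2, for comparison
```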
Using the same techniques, we can easily find the joint law of $Gy_{1},\dots,Gy_{n}$ for $n < \infty$.

> [!theorem] Proposition
>
> Let $y_{1},\dots,y_{n}\in \mathbb{R}^m$ and $G \sim {\cal N}(0,1)^{\otimes d\times m}$. Then
> $$\text{Law}\left(\begin{bmatrix}
> Gy_{1} \\
> \vdots \\
> Gy_{n}
> \end{bmatrix}\right) = {\cal N}\left(0,\,
> \begin{bmatrix}
> \lvert \lvert y_{1} \rvert \rvert ^2 I_{d} & \langle y_{1},y_{2} \rangle I_{d} & \dots & \langle y_{1},y_{n} \rangle I_{d} \\
> \langle y_{2} , y_{1} \rangle I_{d} & \lvert \lvert y_{2} \rvert \rvert ^2 I_{d} & \dots & \langle y_{2}, y_{n} \rangle I_{d} \\
> \vdots & \vdots & \ddots & \vdots \\
> \langle y_{n}, y_{1} \rangle I_{d} & \langle y_{n} , y_{2} \rangle I_{d} & \dots & \lvert \lvert y_{n} \rvert \rvert ^2 I_{d}
> \end{bmatrix}\right)$$
> Or, if we define $Y$ such that the $i$th column is $y_{i}$, then the covariance matrix is given by
> $$\text{Cov}\left(\begin{bmatrix}
> Gy_{1} \\
> \vdots \\
> Gy_{n}
> \end{bmatrix}\right) = Y^{\intercal}Y \otimes I_{d}$$

> [!proof]
>
> This follows immediately from [[gaussian random matrix transforms vectors into gaussian random vectors]] by applying the calculation to each block of the covariance matrix. (The expectation is identical.)

see [[joint of gaussian random transform of finite vectors]]

This [[joint of gaussian random transform of finite vectors]] is the characterization of a **Gaussian random field**, which is a term for a high(er)-dimensional *Gaussian Process*.

> [!quote]+ Characterization
>
> > [!theorem] Proposition
> >
> > Let $y_{1},\dots,y_{n}\in \mathbb{R}^m$ and $G \sim {\cal N}(0,1)^{\otimes d\times m}$. Then
> > $$\text{Law}\left(\begin{bmatrix}
> > Gy_{1} \\
> > \vdots \\
> > Gy_{n}
> > \end{bmatrix}\right) = {\cal N}\left(0,\,
> > \begin{bmatrix}
> > \lvert \lvert y_{1} \rvert \rvert ^2 I_{d} & \langle y_{1},y_{2} \rangle I_{d} & \dots & \langle y_{1},y_{n} \rangle I_{d} \\
> > \langle y_{2} , y_{1} \rangle I_{d} & \lvert \lvert y_{2} \rvert \rvert ^2 I_{d} & \dots & \langle y_{2}, y_{n} \rangle I_{d} \\
> > \vdots & \vdots & \ddots & \vdots \\
> > \langle y_{n}, y_{1} \rangle I_{d} & \langle y_{n} , y_{2} \rangle I_{d} & \dots & \lvert \lvert y_{n} \rvert \rvert ^2 I_{d}
> > \end{bmatrix}\right)$$
> > Or, if we define $Y$ such that the $i$th column is $y_{i}$, then the covariance matrix is given by
> > $$\text{Cov}\left(\begin{bmatrix}
> > Gy_{1} \\
> > \vdots \\
> > Gy_{n}
> > \end{bmatrix}\right) = Y^{\intercal}Y \otimes I_{d}$$
>
> In particular, this characterization associates the [[random vector]] $Gy \in \mathbb{R}^d$ to each fixed vector $y \in \mathbb{R}^m$.

(see [[Gaussian Process|gaussian random field]])

### Geometric Interpretations of Multiplication

From the [[joint of gaussian random transform of finite vectors|characterization]], we can better understand what $G$ is doing when it transforms vectors.

> [!equation] Expected Magnitude
>
> $$\begin{align}
> \mathbb{E}[\lvert \lvert Gy \rvert \rvert ^2] &= \mathrm{Tr}(\text{Cov}(Gy) ) \\
> &= d \lvert \lvert y \rvert \rvert ^2
> \end{align}$$

> [!equation] Expected Direction
>
> $$\begin{align}
> \mathbb{E}[\langle Gy_{1}, Gy_{2} \rangle] &= \mathrm{Tr}(\text{Cov}(Gy_{1}, Gy_{2}) ) \\
> &= d\langle y_{1},y_{2} \rangle
> \end{align}$$

> [!note]-
>
> The expected direction follows directly from the polarization identity
> $$\langle y_{1},y_{2} \rangle = \frac{1}{4}\lvert \lvert y_{1}+y_{2} \rvert \rvert ^2 - \frac{1}{4} \lvert \lvert y_{1}-y_{2} \rvert \rvert ^2$$
>
> > [!exercise]
> > Verify
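A quick numerical sketch of these two identities (numpy; the sizes, seed, and trial count are arbitrary assumptions of mine): averaging $\lvert \lvert Gy_{1} \rvert \rvert^2$ and $\langle Gy_{1}, Gy_{2} \rangle$ over many draws of $G$ should recover $d\lvert \lvert y_{1} \rvert \rvert^2$ and $d\langle y_{1}, y_{2} \rangle$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, trials = 20, 6, 50_000   # arbitrary sizes for this sketch

y1, y2 = rng.standard_normal(m), rng.standard_normal(m)
G = rng.standard_normal((trials, d, m))
Gy1, Gy2 = G @ y1, G @ y2

# Expected magnitude: E||G y1||^2 = d ||y1||^2
print((Gy1 ** 2).sum(axis=1).mean(), d * np.dot(y1, y1))

# Expected direction: E<G y1, G y2> = d <y1, y2>
print((Gy1 * Gy2).sum(axis=1).mean(), d * np.dot(y1, y2))
```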
In fact, $G$ acts in this way on the entire [[Gram matrix]] for any finite collection of vectors.

> [!def] Recall
>
> Let $v_{1},\dots,v_{n} \in \mathbb{F}^k$ be a (finite) collection of vectors in $\mathbb{F}^k$, which has [[inner product]] $\langle \cdot,\cdot' \rangle$. The **Gram Matrix** $G$ of the $v_{i}$ is given by
> $$G_{ij} = \langle v_{i}, v_{j} \rangle ,\quad G = V^*V$$
> Where $V$ is the matrix with $v_{i}$ as its $i$th column

We can see then that if $G \sim {\cal N}(0, 1)^{\otimes d\times m}$ and $Y \in \mathbb{R}^{m\times n}$, then

> [!equation] Expected Gram Matrix
>
> $$\mathbb{E}\left[\, (GY)^{\intercal}(GY) \,\right] =dY^{\intercal}Y$$

Notes on the Gram Matrix
- The squared lengths $\lvert \lvert v_{i} \rvert \rvert^2$ of the $v_{i}$ are the entries along the diagonal
- The inner products $\langle v_{i}, v_{j} \rangle$ between each pair (and hence, together with the lengths, the angles) are the off-diagonal entries

This information describes the entire geometry of the set of vectors; if any other set of vectors $u_{1},\dots,u_{n}$ has the same Gram matrix, then there must exist some $Q \in {\cal O}(k)$ such that $Qv_{i}=u_{i}$ for all $i$.

> [!important]
>
> *In this case, it is possible to get from one point cloud to the other via some orthogonal transformation* - this is an if and only if!

From the [[joint of gaussian random transform of finite vectors]], we can also consider
$$\hat{G}:= \frac{1}{\sqrt{ d }} G,\quad \text{Law}(\hat{G}) = {\cal N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}$$
Then we can see that
$$\begin{align}
\mathbb{E}[\,\lvert \lvert \hat{G}y \rvert \rvert ^2\,] &= \lvert \lvert y \rvert \rvert ^2 \\
\mathbb{E}[\,\langle \hat{G}y_{1}, \hat{G}y_{2} \rangle \,] &= \langle y_{1}, y_{2} \rangle \\
\mathbb{E}\left[ \,(\hat{G}Y)^{\intercal}(\hat{G}Y)\, \right] &= Y^{\intercal}Y
\end{align}$$
ie, in expectation, $\hat{G}$ preserves the lengths, angles, and [[Gram matrix|Gram matrices]]. That is, it preserves all the geometry of $\mathbb{R}^m$.

see [[normalized gaussian random matrix preserves geometry in expectation]]

> [!NOTE]
>
> Note that this holds for *any* $d$, including $d \ll m$ and $d=1$. This is because we are taking expectations.

If $y$ is selected first, *before* we realize $\hat{G}$, then we hope that $\lvert \lvert \hat{G}y \rvert \rvert$ will concentrate about $\lvert \lvert y \rvert \rvert$. Our calculations for the [[joint of gaussian random transform of finite vectors|joint]] show that
$$\text{Law}(\hat{G}y) = {\cal N}\left( 0, \frac{1}{d}\lvert \lvert y \rvert \rvert ^2 I_{d} \right)$$
So (with rescaling, obviously), our [[concentration inequality for magnitude of standard gaussian random vector]] holds and we should see the desired behavior.

> [!caution]
>
> Assuming we ensure $\hat{G} \perp y$, the probability of actually [[normalized gaussian random matrix preserves geometry in expectation|preserving geometry]] depends on $d$.

> [!example]
>
> If $d=1$, then $\hat{G} = g^{\intercal}$ for some $g \sim {\cal N}(0, I_{m})$. We still have
> $$\mathbb{E}[\lvert \lvert \hat{G}y_{i} \rvert \rvert ^2] = \mathbb{E}[\langle g,y_{i} \rangle ^2] =\lvert \lvert y_{i} \rvert \rvert ^2$$
> But clearly $\langle g,y_{i} \rangle^2 \approx \lvert \lvert y_{i} \rvert \rvert^2$ cannot hold simultaneously for many $y_{i}$, regardless of whether we choose them before realizing $\hat{G}$.
>
> > [!exercise]
> > Construct some examples of this case

> [!caution]
>
> If $d \ll m$, then once $\hat{G}$ is realized, we can pick $y \in \text{Null}(\hat{G})$ so that $0=\lvert \lvert \hat{G}y \rvert \rvert\ll \lvert \lvert y \rvert \rvert$.
> - In this case, $\hat{G}$ will not preserve the geometry of arbitrary vectors $y$.
> - In particular, we can *find* vectors $y$ that *depend* on $\hat{G}$ such that the geometry is badly distorted.
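To see this contrast concretely, here is a small numpy sketch (the dimensions and seed are my own illustrative assumptions): a vector $y$ fixed *before* drawing $\hat{G}$ has $\lvert \lvert \hat{G}y \rvert \rvert$ close to $\lvert \lvert y \rvert \rvert$, while a vector chosen from $\text{Null}(\hat{G})$ *after* the draw is collapsed to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 50, 500  # arbitrary choices with d << m

G_hat = rng.standard_normal((d, m)) / np.sqrt(d)

# A vector fixed *before* drawing G_hat: its norm is approximately preserved
y = rng.standard_normal(m)
print(np.linalg.norm(G_hat @ y), np.linalg.norm(y))   # close to each other

# An adversarial vector chosen *after* seeing G_hat: take y in Null(G_hat)
_, _, Vt = np.linalg.svd(G_hat)       # rows d..m-1 of Vt span the null space
y_bad = Vt[-1] * np.linalg.norm(y)    # same length as y, but annihilated by G_hat
print(np.linalg.norm(G_hat @ y_bad), np.linalg.norm(y_bad))  # ~0 vs ~||y||
```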
Next time: As long as
1. $d$ is not too small
2. The point cloud $y_{1},\dots,y_{n}$ is fixed before drawing $\hat{G}$,

$\hat{G}$ approximately preserves the geometry of the $y_{i}$, acting as an approximate [[isometric|isometry]] even for $d \ll m$.

# Review
#flashcards/math

```datacorejsx
const { dateTime } = await cJS()

return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```