- Proof of concentration of Gaussian random vector norms (see end of [[Random Matrix Lecture 01]] / [[concentration inequality for magnitude of standard gaussian random vector]])
- How to think about high-dimensional geometry
- Multiplying by a Gaussian matrix
- Gaussian process viewpoint
1. Random Vector Theory
1.4 Consequences for High-Dimensional Geometry
Last time, we saw the statement and proof for the [[concentration inequality for magnitude of standard gaussian random vector]]. This is one of the fundamental facts of high-dimensional geometry.
This tells us that, with high probability, we have
$$\lvert \lvert g \rvert \rvert = \sqrt{n} + O(1)$$
ie, the standard [[gaussian random vector]] $g \sim \mathcal{N}(0, I_{n})$ falls close to the spherical shell of width $O(1)$ around the sphere of radius $\sqrt{n}$.
Important
The standard [[gaussian random vector]] tends towards the surface of the sphere of radius $\sqrt{n}$ and falls within an "error region" surrounding that surface that has a width of $O(1)$.
Note
This means a high-dimensional Gaussian has a non-convex typical set - it is the shape of a hollow spherical shell!
(Sometimes these vectors are called thin-shell vectors in convex geometry and probability)
Contrast this with how we normally think of a Gaussian high-probability/"typical" set in 1 or 2 dimensions. This is usually some sort of solid blob around the origin.
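A quick numerical sketch of the thin-shell effect (assuming `numpy`; the dimensions and sample count are arbitrary choices):
```python
import numpy as np

rng = np.random.default_rng(0)
for n in [10, 100, 1000, 10000]:
    g = rng.standard_normal((1000, n))   # 1000 iid draws of N(0, I_n)
    norms = np.linalg.norm(g, axis=1)    # the magnitudes ||g||
    print(f"n={n:6d}  mean={norms.mean():8.2f}  sqrt(n)={np.sqrt(n):8.2f}  std={norms.std():.3f}")
# the mean tracks sqrt(n) while the std stays O(1) (about 1/sqrt(2))
```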
There are some other high-dimensional behaviors of random variables that can be explained in the same way. For example, let $x$ be uniform on the box $[-1,1]^{n}$; then $\mathbb{E}[\lvert \lvert x \rvert \rvert ^{2}] = n/3$, so $\lvert \lvert x \rvert \rvert$ concentrates about $\sqrt{n/3}$.
Which means almost all the mass of this distribution is at the corners of this $n$-dimensional box instead of the inscribed solid unit ball $B_{2}^{n}$. The concentration inequality in this case also implies that the ball relative to the box has volume bound
$$\frac{\mathrm{vol}(B_{2}^{n})}{\mathrm{vol}([-1,1]^{n})} = \mathbb{P}(\lvert \lvert x \rvert \rvert \leq 1) \leq e^{-cn}$$
ie, the volume of the unit ball is exponentially smaller than the volume of the box.
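A sketch of the volume comparison (assuming `numpy`/`scipy`; it uses the standard formula $\mathrm{vol}(B_{2}^{n}) = \pi^{n/2}/\Gamma(n/2+1)$):
```python
import numpy as np
from scipy.special import gammaln

for n in [2, 10, 50, 100]:
    log_ball = (n / 2) * np.log(np.pi) - gammaln(n / 2 + 1)  # log vol(B_2^n)
    log_box = n * np.log(2.0)                                # log vol([-1, 1]^n)
    print(f"n={n:3d}  log(vol ratio) = {log_ball - log_box:9.2f}")
# the log-ratio decreases roughly like -(n/2) log n
```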
Our [[concentration inequality for magnitude of standard gaussian random vector]] also gives us an approximation
Take Away
$$\mathcal{N}(0, I_{n}) \approx \mathrm{Unif}\left( \sqrt{n} \cdot S^{n-1} \right)$$
Caution
This is an informal approximation!
We get this by noting
- When $g \sim \mathcal{N}(0, I_{n})$, we have $\frac{g}{\lvert \lvert g \rvert \rvert} \sim \mathrm{Unif}(S^{n-1})$ ( which we get from [[normalized standard gaussian random vectors have the orthogonally invariant distribution on the unit sphere]] )
- $\lvert \lvert g \rvert \rvert \approx \sqrt{n}$ with high probability, which follows from the [[concentration inequality for magnitude of standard gaussian random vector]]
This idea is formalized in [[Borel's central limit theorem]]
(see [[gaussian random vectors are approximately uniform on the hollow sphere]])
Theorem
Let $\theta^{(n)} \sim \mathrm{Unif}(\sqrt{n} \cdot S^{n-1})$. Then $\theta^{(n)}_{1} \to \mathcal{N}(0,1)$ in distribution as $n \to \infty$.
Further, for any fixed $k$, we have $\left( \theta^{(n)}_{1}, \dots, \theta^{(n)}_{k} \right) \to \mathcal{N}(0, I_{k})$ in distribution as $n \to \infty$
(without proof, for now)
See [[Borel's central limit theorem]]
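A sketch of the theorem's content (assuming `numpy`), using the fact above that a normalized Gaussian is uniform on the sphere:
```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 500, 20000
g = rng.standard_normal((samples, n))
theta = np.sqrt(n) * g / np.linalg.norm(g, axis=1, keepdims=True)  # Unif(sqrt(n) * S^{n-1})
x1 = theta[:, 0]                                    # first coordinate of each sample
print(f"mean={x1.mean():.3f}  var={x1.var():.3f}")  # should be ~0 and ~1
```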
2. Rectangular Matrices
We begin by looking at asymmetric, rectangular matrices $G \in \mathbb{R}^{d \times n}$ with iid Gaussian entries $G_{ij} \sim \mathcal{N}(0,1)$.
$Gy$ is linear in $G$. The entries of $G$ form a [[gaussian random vector]], and so $Gy$ must be Gaussian as well. Thus, it suffices to calculate the mean and covariance to find its law.
Since each entry of $G$, call them $G_{ij}$, is iid $\mathcal{N}(0,1)$, we get
$$\mathbb{E}[(Gy)_{i}] = \sum_{j} \mathbb{E}[G_{ij}]\, y_{j} = 0$$
And for the covariance
$$\mathrm{Cov}\left( (Gy)_{i}, (Gy)_{k} \right) = \sum_{j,l} \mathbb{E}[G_{ij} G_{kl}]\, y_{j} y_{l} = \delta_{ik} \sum_{j} y_{j}^{2} = \delta_{ik} \lvert \lvert y \rvert \rvert ^{2}$$
so that $Gy \sim \mathcal{N}(0, \lvert \lvert y \rvert \rvert ^{2} I_{d})$.
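An empirical check of this law (assuming `numpy`; the sizes are arbitrary):
```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 5, 20, 50000
y = rng.standard_normal(n)                  # a fixed y, chosen before G
G = rng.standard_normal((trials, d, n))     # iid N(0,1) matrices
Gy = G @ y                                  # shape (trials, d)
print(np.round(Gy.mean(axis=0), 3))         # mean vector, ~0
emp_cov = np.cov(Gy.T)                      # empirical d x d covariance
print(np.round(emp_cov / np.dot(y, y), 2))  # ~ I_d after dividing by ||y||^2
```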
Above, we found the expectation and covariance in terms of the entry-wise products and sums. We can also do the calculation in terms of matrices.
Proof (matrices)
Expectation
$$\mathbb{E}[Gy] = \mathbb{E}[G]\, y = 0$$
Covariance
$$\mathrm{Cov}(Gy) = \mathbb{E}\left[ (Gy)(Gy)^{\intercal} \right] = \mathbb{E}\left[ G\, y y^{\intercal} G^{\intercal} \right] = \lvert \lvert y \rvert \rvert ^{2} I_{d}$$
If there is only one random matrix involved, we can write the joint law of $(Gy_{1}, \dots, Gy_{k})$ for several fixed vectors: it is jointly Gaussian with mean zero and block covariances
$$\mathrm{Cov}(Gy_{a}, Gy_{b}) = \langle y_{a}, y_{b} \rangle I_{d}$$
Or, if we define $Y$ such that the $i$th column is $y_{i}$, then the covariance matrix is given by
$$\mathrm{Cov}(\mathrm{vec}(GY)) = (Y^{\intercal} Y) \otimes I_{d}$$
Proof
This follows immediately from [[gaussian random matrix transforms vectors into gaussian random vectors]] by applying the calculation to each block of the covariance matrix. (the expectation is identical)
see [[joint of gaussian random transform of finite vectors]]
This [[joint of gaussian random transform of finite vectors]] is the characterization of a Gaussian random field, which is a term for a high(er)-dimensional Gaussian Process.
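As an illustrative sketch of the Gaussian-process viewpoint (assuming `numpy`; the feature curve $y(t)$ below is a hypothetical choice of Gaussian bumps): sweeping $y$ along a curve makes $t \mapsto Gy(t)$ a Gaussian process with covariance $\langle y(s), y(t) \rangle$.
```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1, 50
G = rng.standard_normal((d, n))
ts = np.linspace(0, 1, 200)                       # parameter grid
centers = np.linspace(0, 1, n)
Y = np.exp(-0.5 * ((ts[None, :] - centers[:, None]) / 0.1) ** 2)  # columns y(t)
path = (G @ Y).ravel()                            # one sample path t -> G y(t)
print(path[:5])                                   # a smooth random function of t
```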
Characterization
Proposition
Let $G \in \mathbb{R}^{d \times n}$ have iid $\mathcal{N}(0,1)$ entries and let $y_{1}, \dots, y_{k} \in \mathbb{R}^{n}$. Then $(Gy_{1}, \dots, Gy_{k})$ is jointly Gaussian with mean zero and
$$\mathrm{Cov}(Gy_{a}, Gy_{b}) = \langle y_{a}, y_{b} \rangle I_{d}$$
Or, if we define $Y \in \mathbb{R}^{n \times k}$ such that the $i$th column is $y_{i}$, then the covariance matrix is given by
$$\mathrm{Cov}(\mathrm{vec}(GY)) = (Y^{\intercal} Y) \otimes I_{d}$$
From the characterization, we can better understand what $G$ is doing when it transforms vectors.
Expected Magnitude
$$\begin{align} \mathbb{E}[\lvert \lvert Gy \rvert \rvert ^{2} ] &= \mathrm{Tr}(\mathrm{Cov}(Gy) ) \\ &= d \lvert \lvert y \rvert \rvert ^{2} \end{align}$$
Expected Direction
Note
The expected direction follows directly from the polarization identity
$$\langle y_{1},y_{2} \rangle = \frac{1}{4}\left( \lvert \lvert y_{1}+y_{2} \rvert \rvert ^{2} - \lvert \lvert y_{1}-y_{2} \rvert \rvert ^{2} \right)$$
Combined with the expected magnitude above, this gives
$$\mathbb{E}[\langle Gy_{1}, Gy_{2} \rangle ] = d\, \langle y_{1},y_{2} \rangle$$
Expected Gram Matrix
We can see then that if $y_{1}, \dots, y_{k} \in \mathbb{R}^{n}$ and $Y$ is the matrix with $y_{i}$ as its $i$th column, then
$$\mathbb{E}\left[ (GY)^{\intercal}(GY) \right] = d\, Y^{\intercal} Y$$
Notes on the Gram Matrix
- The squared lengths $\lvert \lvert y_{i} \rvert \rvert ^{2}$ of the $y_{i}$ are the entries along the diagonal
- The inner products $\langle y_{i}, y_{j} \rangle$, and hence the angles between each pair, are on the off-diagonal
- This information describes the entire geometry of the set of vectors; if any other set of vectors $z_{1}, \dots, z_{k}$ has the same Gram matrix, then there must exist some orthogonal $U$ such that $z_{i} = U y_{i}$ for all $i$.
Important
In this case, it is possible to get from one point cloud to the other via some orthogonal transformation - this is an if and only if!
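A sketch of this fact (assuming `numpy`/`scipy`): build two point clouds with the same Gram matrix and recover the orthogonal map via the orthogonal Procrustes problem.
```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
n, k = 5, 3
Y = rng.standard_normal((n, k))                    # columns y_1, ..., y_k
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal U
Z = U @ Y                                          # point cloud z_i = U y_i
print(np.allclose(Z.T @ Z, Y.T @ Y))               # same Gram matrix -> True
R, _ = orthogonal_procrustes(Y.T, Z.T)             # orthogonal R with Y^T R ~ Z^T
print(np.allclose(R.T @ Y, Z))                     # recovers the map -> True
```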
From the [[joint of gaussian random transform of finite vectors]], we can also consider the normalized matrix $\hat{G} = \frac{1}{\sqrt{d}} G$.
Then we can see that
$$\begin{align} \mathbb{E}[\,\lvert \lvert \hat{G}y \rvert \rvert ^{2} \,] &= \lvert \lvert y \rvert \rvert ^{2} \\ \mathbb{E}[\,\langle \hat{G}y_{1}, \hat{G}y_{2} \rangle \,] &= \langle y_{1}, y_{2} \rangle \\ \mathbb{E}\left[ \,(\hat{G}Y)^{\intercal}(\hat{G}Y)\, \right] &= Y^{\intercal}Y \end{align}$$
ie, in expectation, $\hat{G}$ preserves the lengths, angles, and Gram matrices. That is, it preserves all the geometry of $y_{1}, \dots, y_{k}$.
see [[normalized gaussian random matrix preserves geometry in expectation]]
Note
Note that this holds for any $d$ and $n$, including $d \ll n$ and even $d = 1$. This is because we are taking expectations.
If $y$ is selected first, before we realize $G$, then we hope that $\lvert \lvert \hat{G}y \rvert \rvert$ will concentrate about $\lvert \lvert y \rvert \rvert$. Our calculations for the joint show that
$$\text{Law}(\hat{G}y) = \mathcal{N}\left( 0, \frac{1}{d}\lvert \lvert y \rvert \rvert ^{2} I_{d} \right)$$
So (with rescaling, obviously), our [[concentration inequality for magnitude of standard gaussian random vector]] holds and we should see the desired behavior.
Caution
Assuming we ensure $y$ is fixed before realizing $G$, the probability of actually preserving geometry depends on $d$.
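A sketch of this dependence on $d$ (assuming `numpy`): the ratio $\lvert \lvert \hat{G}y \rvert \rvert ^{2} / \lvert \lvert y \rvert \rvert ^{2}$ is a $\chi^{2}_{d}/d$ variable, so its standard deviation shrinks like $\sqrt{2/d}$.
```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 5000
y = rng.standard_normal(n)                  # fixed before G
for d in [1, 10, 100]:
    G_hat = rng.standard_normal((trials, d, n)) / np.sqrt(d)
    ratios = np.sum((G_hat @ y) ** 2, axis=1) / np.dot(y, y)  # ||G_hat y||^2 / ||y||^2
    print(f"d={d:4d}  mean={ratios.mean():.3f}  std={ratios.std():.3f}  sqrt(2/d)={np.sqrt(2 / d):.3f}")
```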
Example
If $d = 1$, then $\hat{G} = g^{\intercal}$ for some $g \sim \mathcal{N}(0, I_{n})$. We still have
$$\mathbb{E}[\lvert \lvert \hat{G}y_{i} \rvert \rvert ^{2} ] = \mathbb{E}[\langle g,y_{i} \rangle ^{2} ] = \lvert \lvert y_{i} \rvert \rvert ^{2}$$
But clearly $\lvert \lvert \hat{G}y_{i} \rvert \rvert ^{2} \approx \lvert \lvert y_{i} \rvert \rvert ^{2}$ cannot hold simultaneously for many $y_{i}$, regardless of if we choose them before realizing $G$.
Exercise
Construct some examples of this case
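One possible construction, sketched numerically (assuming `numpy`): take the standard basis vectors $y_{i} = e_{i}$, so $\lvert \lvert \hat{G}e_{i} \rvert \rvert ^{2} = g_{i}^{2}$, which is correct on average but badly distorted for many $i$.
```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
g = rng.standard_normal(n)                # G_hat = g^T when d = 1
sq = g**2                                 # ||G_hat e_i||^2 for the basis vectors e_i
print(f"mean of g_i^2 = {sq.mean():.3f}")                      # ~1, as expectation predicts
print(f"fraction with g_i^2 < 1/4: {(sq < 0.25).mean():.2f}")  # ~0.38
print(f"fraction with g_i^2 > 4:   {(sq > 4.0).mean():.2f}")   # ~0.05
```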
Caution
If $d < n$, then once $G$ is realized, we can pick $y \neq 0$ so that $\hat{G}y = 0$.
In this case, $\hat{G}$ will not preserve the geometry of arbitrary vectors $y$.
In particular, we can find vectors that depend on $G$ such that the geometry is badly distorted.
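A sketch of the adversarial case (assuming `numpy`/`scipy`): with $d < n$, any unit vector in the kernel of $G$ has its length completely destroyed.
```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
d, n = 10, 50
G = rng.standard_normal((d, n))
y = null_space(G)[:, 0]                    # a unit vector with G y = 0
print(np.linalg.norm(y))                   # ||y|| = 1
print(np.linalg.norm(G @ y) / np.sqrt(d))  # ||G_hat y|| ~ 0 (up to float error)
```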
Next time:
As long as
- $d$ is not too small
- the point cloud $y_{1}, \dots, y_{k}$ is fixed before drawing $G$

$\hat{G}$ approximately preserves the geometry of the $y_{i}$.