[[lecture-data]]:LiArrowBigLeftDash: `= choice(this["previous_lecture"], this["previous_lecture"], "No Previous Lecture")` | `= choice(this["next_lecture"], this["next_lecture"], "No Next Lecture")` :LiArrowBigRightDash:
> [!quote]
>
> Class notes pgs 2-7 (1.1 - 1.3)
> [!summary]
>
> - Random vector theory
> - Properties of Gaussian random vectors
> - Concentration inequalities
# 1. Random Vector Theory

## 1.1 Natural Random Models and Orthogonal Invariance
Interpretations of vectors
- List of numbers
- Magnitude and direction (with respect to a basis of the vector space they belong to)
Corresponding interpretations for random vectors $x \in \mathbb{R}^d$:
- **Entries** (each of the numbers) are as independent as possible
    - ie the entries are iid from some distribution $\mu$ on $\mathbb{R}$: $x_{1}, \dots, x_{d} \overset{\text{iid}}{\sim} \mu$
- **Magnitude and direction** are as independent as possible
    - Take $\lvert \lvert x \rvert \rvert$ and $\frac{x}{\lvert \lvert x \rvert \rvert}$ to be independent
    - The magnitude $\lvert \lvert x \rvert \rvert$ is then any random non-negative scalar and the direction $\frac{x}{\lvert \lvert x \rvert \rvert}$ is uniform
        - ie a random vector drawn uniformly from the unit sphere $\mathbb{S}^{d-1}$

See [[random vector]]. A numerical sketch of both constructions follows below.
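A minimal numerical sketch of the two constructions (not from the class notes; it assumes `numpy`, and the standard-normal entries and exponential magnitude are purely illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Construction 1: iid entries (illustrative choice: standard normal entries)
x_iid = rng.standard_normal(d)

# Construction 2: independent magnitude and direction
# direction: uniform on the unit sphere S^{d-1}, sampled by normalizing a Gaussian
# (this sampling fact is proved in Section 1.2)
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)
# magnitude: any non-negative random scalar, drawn independently of the direction
magnitude = rng.exponential()  # illustrative choice
x_md = magnitude * direction

print("iid model:", x_iid)
print("magnitude/direction model:", x_md, "with norm", np.linalg.norm(x_md))
```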
> [!definition] Orthogonal Invariance
>
> Let $x \in \mathbb{R}^d$ be a random vector. We say that $x$ (or its law $\text{Law}(x)$) is **orthogonally invariant** if for each (deterministic) orthogonal matrix $Q \in \mathbb{R}^{d \times d}$ we have
> $$Qx \overset{d}{=} x$$

see also [[invariant]]
> [!NOTE]
>
> The law is the distribution of the random vector. We sometimes use “model” instead of “law” or “distribution” to remind us that our choice includes assumptions and judgements.
> - We are modelling a situation we might encounter in applications
> [!proposition]
>
> There exists a unique probability measure supported on $\mathbb{S}^{d-1}$ that is orthogonally invariant. We denote this $\text{Unif}(\mathbb{S}^{d-1})$.

see [[orthogonally invariant distribution on the unit sphere]]
> [!proposition]
>
> Suppose that $x \in \mathbb{R}^d$ is orthogonally invariant. Then
> - $\lvert \lvert x \rvert \rvert$ is independent of $\frac{x}{\lvert \lvert x \rvert \rvert}$
> - $\frac{x}{\lvert \lvert x \rvert \rvert} \sim \text{Unif}(\mathbb{S}^{d-1})$.
- [p] Entry-wise interpretation of random vectors is described in these distribution definitions
- [p] magnitude/direction interpretation is also described
- [c] It is hard to swap between the two interpretations
> [!question]
>
> - What do the entries look like for a vector $x \sim \text{Unif}(\mathbb{S}^{d-1})$?
> - What are the magnitude and direction of an iid random vector $x$?
>
> (both questions are explored numerically below)
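A quick numerical look at both questions (a sketch, not from the notes; it assumes `numpy`, uses the normalize-a-Gaussian sampler from Section 1.2 for the sphere, and takes iid $\text{Unif}[-1,1]$ entries as an illustrative iid model):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 5000

# Q1: entries of x ~ Unif(S^{d-1}), sampled by normalizing standard Gaussians (Section 1.2)
g = rng.standard_normal((n, d))
sphere = g / np.linalg.norm(g, axis=1, keepdims=True)
print("first entry: mean", sphere[:, 0].mean(), "std", sphere[:, 0].std(), "vs 1/sqrt(d)", d ** -0.5)
# each entry has standard deviation ~ 1/sqrt(d); rescaled by sqrt(d) they look roughly Gaussian for large d

# Q2: magnitude and direction of an iid random vector (illustrative choice: entries iid Unif[-1, 1])
x = rng.uniform(-1.0, 1.0, size=(n, d))
norms = np.linalg.norm(x, axis=1)
print("magnitude: mean", norms.mean(), "std", norms.std())  # concentrates near sqrt(d/3)
# the direction x/||x|| is close to, but not exactly, uniform on the sphere for non-Gaussian entries
```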
## 1.2 Gaussian Random Vectors
The most natural of the iid random vector models is the multivariate Gaussian.
> [!definition] Multivariate Normal
>
> The **multivariate normal** or **Gaussian** ${\cal N}(\mu, \Sigma)$ for parameters $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ (positive semidefinite) is the probability measure with density
> $$\frac{1}{\sqrt{ (2\pi)^{\text{rank}(\Sigma)} \det{}^+(\Sigma) }} \exp\left( -\frac{1}{2}(x-\mu)^\top \Sigma^+ (x-\mu) \right)$$
> with respect to the Lebesgue measure on (the translate through $\mu$ of) the row space of $\Sigma$, and where
> - $\Sigma^+$ is the Moore-Penrose inverse
> - $\det{}^+(\Sigma)$ is the product of all non-zero eigenvalues of $\Sigma$ (the pseudo-determinant)
> [!NOTE]
>
> When $\Sigma$ is invertible,
> - $\det{}^+(\Sigma) = \det(\Sigma)$ is the ordinary determinant (and $\Sigma^+ = \Sigma^{-1}$)
> - the Lebesgue measure is on all of $\mathbb{R}^d$
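A hedged numerical sketch of this density (not from the notes; it assumes `numpy`, and the helper `gaussian_density` plus the example $\Sigma$ are made up for illustration):

```python
import numpy as np

def gaussian_density(x, mu, Sigma, tol=1e-10):
    """Density of N(mu, Sigma) w.r.t. Lebesgue measure on the translate of rowspace(Sigma) through mu.

    Hypothetical helper: uses the Moore-Penrose inverse and the pseudo-determinant
    (product of the non-zero eigenvalues), matching the definition above.
    """
    eigvals = np.linalg.eigvalsh(Sigma)
    nonzero = eigvals[eigvals > tol]
    rank = nonzero.size
    pdet = np.prod(nonzero)                 # pseudo-determinant det^+(Sigma)
    Sigma_pinv = np.linalg.pinv(Sigma)      # Moore-Penrose inverse Sigma^+
    diff = np.asarray(x) - np.asarray(mu)
    quad = diff @ Sigma_pinv @ diff
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** rank * pdet)

# Example with a rank-deficient covariance, where det(Sigma) = 0 and Sigma^{-1} does not exist
mu = np.zeros(3)
Sigma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0]])
x = np.array([0.5, -0.2, 0.0])   # a point in the support mu + rowspace(Sigma)
print(gaussian_density(x, mu, Sigma))

# Optional cross-check (scipy supports singular covariances):
# from scipy.stats import multivariate_normal
# print(multivariate_normal(mean=mu, cov=Sigma, allow_singular=True).pdf(x))
```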
> [!definition] Gaussian Random Vector
>
> If a random vector $x \sim {\cal N}(\mu, \Sigma)$, we call $x$ a **Gaussian random vector**. If $x \sim {\cal N}(0, I_{d})$, then we call $x$ a **standard Gaussian random vector**.
> [!proposition]
>
> Let $x \sim {\cal N}(\mu, \Sigma)$ be a gaussian random vector. Then $\mu$ is the mean vector and $\Sigma$ is the covariance matrix of $x$, and
> $$\mathbb{E}[x] = \mu, \qquad \mathbb{E}[(x-\mu)(x-\mu)^\top] = \Sigma$$
>
> ie, the law of a gaussian random vector is determined by its mean and covariance (or its linear and quadratic moments)
> [!theorem]
>
> Let $x \sim {\cal N}(\mu, \Sigma)$ be a gaussian random vector, $A \in \mathbb{R}^{k \times d}$, and $b \in \mathbb{R}^k$. Then $b + Ax$ is also a gaussian random vector with
> $$b + Ax \sim {\cal N}(b + A\mu, A\Sigma A^\top)$$
> ie, gaussian random vectors are closed under linear transformations.

see [[gaussian random vectors are closed under linear transformations]]
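A minimal empirical check of this closure rule (a sketch, not from the notes; it assumes `numpy`, and the particular $\mu$, $\Sigma$, $A$, $b$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 200_000, 3, 2

# Arbitrary illustrative parameters
mu = np.array([1.0, -2.0, 0.5])
L = rng.standard_normal((d, d))
Sigma = L @ L.T                       # a generic positive semidefinite covariance
A = rng.standard_normal((k, d))
b = np.array([3.0, -1.0])

# Sample x ~ N(mu, Sigma) and apply the affine map y = b + A x
x = rng.multivariate_normal(mu, Sigma, size=n)
y = b + x @ A.T

# Empirical moments should match b + A mu and A Sigma A^T (prints True up to sampling error)
print(np.allclose(y.mean(axis=0), b + A @ mu, atol=0.05))
print(np.allclose(np.cov(y, rowvar=False), A @ Sigma @ A.T, rtol=0.05, atol=0.05))
```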
The two facts above mean that a standard gaussian random vector is both an iid random vector and an orthogonally invariant random vector!
> [!theorem]
>
> Suppose $x \sim {\cal N}(0, I_{d})$ is a standard gaussian random vector. For any $u \in \mathbb{S}^{d-1}$, we have
> $$\langle u, x \rangle \sim {\cal N}(0, 1)$$
> In particular, this does not depend on $u$, and $x$ is orthogonally invariant.

> [!proof]
>
> For any orthogonal $Q \in \mathbb{R}^{d \times d}$, we have (from [[gaussian random vectors are closed under linear transformations]]) that
> $$Qx \sim {\cal N}(Q0, QI_{d}Q^\top) = {\cal N}(0, I_{d})$$
> so $x$ is orthogonally invariant. Now, consider a special case where $u$ is the first row of $Q$ (and we can find an orthogonal $Q$ with this first row via Gram-Schmidt). Then $\langle u, x \rangle = (Qx)_{1} \sim {\cal N}(0,1)$, so we see that the law is independent of $u$, as desired.

see [[standard gaussian random vectors are orthogonally invariant]]
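A small empirical sanity check of this invariance (a sketch, not from the notes; it assumes `numpy` and `scipy`, and the random unit vector $u$ and QR-generated $Q$ are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
d, n = 10, 100_000

x = rng.standard_normal((n, d))       # n standard gaussian vectors in R^d

# An arbitrary unit vector u
u = rng.standard_normal(d)
u /= np.linalg.norm(u)

# <u, x> should be N(0, 1) regardless of which u we picked
proj = x @ u
print("mean", proj.mean(), "std", proj.std())
print("KS test vs N(0,1):", stats.kstest(proj, "norm").pvalue)   # should typically not be tiny

# Qx should have the same law as x for any orthogonal Q
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # orthogonal (not exactly Haar, but that is not needed here)
qx = x @ Q.T
print("KS test on a coordinate of Qx:", stats.kstest(qx[:, 0], "norm").pvalue)
```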
> [!theorem]
>
> If $x \sim {\cal N}(0, I_{d})$, then
> $$\frac{x}{\lvert \lvert x \rvert \rvert} \sim \text{Unif}(\mathbb{S}^{d-1})$$

> [!proof]
>
> The result follows immediately from [[standard gaussian random vectors are orthogonally invariant]] and the independence proposition for the [[orthogonally invariant distribution on the unit sphere]].
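A brief numerical check of this theorem and of the independence proposition it relies on (a sketch, not from the notes; it assumes `numpy`):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 50, 100_000

x = rng.standard_normal((n, d))
norms = np.linalg.norm(x, axis=1)
dirs = x / norms[:, None]             # x / ||x||, which the theorem says is Unif(S^{d-1})

# Direction marginals match Unif(S^{d-1}): each coordinate has mean 0 and variance 1/d
print(dirs[:, 0].mean(), dirs[:, 0].var(), "vs", 1 / d)

# Magnitude and direction look independent (zero correlation is only a necessary check)
print(np.corrcoef(norms, dirs[:, 0])[0, 1])
```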
## 1.3 Concentration of Gaussian Vector Norms

> [!important]
>
> The direction of a gaussian random vector is uniformly distributed on $\mathbb{S}^{d-1}$.
> - [?] So we’ve addressed the direction. What about the magnitudes?
If $x \sim {\cal N}(0, I_{d})$ is a standard [[gaussian random vector]], then
$$\mathbb{E}[\,\lvert \lvert x \rvert \rvert ^2] = \sum_{i=1}^d \mathbb{E}[x_{i}^2] = d$$
Since $\lvert \lvert x \rvert \rvert ^2$ is the sum of $d$ iid random variables, then by the Law of Large Numbers and the central limit theorem, we expect that $\lvert \lvert x \rvert \rvert ^2 \approx d \pm {\cal O}(\sqrt{ d })$ with high probability.

By using [[Markov's inequality]] and [[Chebyshev's inequality]], we can get something close to this.

By Markov, we have
$$\mathbb{P}[\lvert \lvert x \rvert \rvert ^2 \geq t] \leq \frac{\mathbb{E}[\,\lvert \lvert x \rvert \rvert ^2]}{t} = \frac{d}{t}$$
And by Chebyshev we get
$$\begin{align}
\mathbb{P}[\lvert \lvert x \rvert \rvert ^2 \geq d + t] &= \mathbb{P}[\lvert \lvert x \rvert \rvert ^2 \geq \mathbb{E}[\,\lvert \lvert x \rvert \rvert ^2] + t] \\
& \leq \frac{\text{Var}(\lvert \lvert x \rvert \rvert ^2)}{t^2} \\
&= \frac{2d}{t^2}
\end{align}$$
Where $\text{Var}[\lvert \lvert x \rvert \rvert^2] = d \text{Var}[x_{1}^2]=2d$. So this gives $\lvert \lvert x \rvert \rvert^2 = d + {\cal O}(\sqrt{ d })$ with high probability (taking $t = {\cal O}(\sqrt{ d })$).

This isn't *quite* what we want since both results above depend a lot on the value of $t$.
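A brief numerical illustration of how loose these bounds are (a sketch, not from the notes; it assumes `numpy`, and $d$, $n$, and the grid of $t$ values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 100, 1_000_000

# ||x||^2 for x ~ N(0, I_d) has a chi-squared law with d degrees of freedom
sq_norms = rng.chisquare(d, size=n)

for t in [30, 50, 100]:
    empirical = (sq_norms >= d + t).mean()
    markov = d / (d + t)        # Markov bound on P[||x||^2 >= d + t]
    chebyshev = 2 * d / t**2    # Chebyshev bound on the same event
    print(f"t={t}: empirical {empirical:.1e}, Markov {markov:.2f}, Chebyshev {chebyshev:.3f}")
# the empirical tail decays far faster than either bound, which motivates the theorem below
```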
> [!theorem]
>
> Let $x \sim {\cal N}(0, I_{d})$ be a standard [[gaussian random vector]] and $t \geq 0$. Then
> $$\begin{align}
> \mathbb{P}[\left\lvert \,\lvert \lvert x \rvert \rvert^2 -d \, \right\rvert \geq t] &= \mathbb{P}\left[ \left\lvert \sum_{i=1}^d x_{i}^2 - d \right\rvert \geq t \right] \\
> &\leq 2 \begin{cases}
> \exp\left( -\frac{t^2}{8d} \right)&\text{if } t \leq d \\
> \exp\left( -\frac{t}{8} \right) & \text{if }t \geq d
> \end{cases} \\
> &= 2\exp\left( -\frac{1}{8} \min \left\{ \frac{t^2}{d}, t \right\}\right)
> \end{align}$$

> [!NOTE]
>
> This proof uses the *Chernoff Method*, which is used to prove concentration inequalities like the [[Chebyshev's inequality|Chebyshev]], [Hoeffding](https://en.wikipedia.org/wiki/Hoeffding%27s_inequality), [Bernstein](https://en.wikipedia.org/wiki/Bernstein_inequalities_(probability_theory)), [Azuma-Hoeffding](https://en.wikipedia.org/wiki/Azuma%27s_inequality), and [McDiarmid](https://en.wikipedia.org/wiki/McDiarmid%27s_inequality) inequalities. It is a more general form of a "nonlinear [[Markov's inequality|Markov]]" inequality.

> [!proof]
>
> We deal with only one side of the distribution; the desired inequality is achieved by simply multiplying by 2 to account for the other tail.
>
> Note that $\mathbb{E}[x_{i}^2] = 1$, so define $s_{i} = x_{i}^2-1$. Then, for any $\lambda > 0$,
> $$\begin{align}
> \mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t\right] &= \mathbb{P}\left[ \sum_{i=1}^d s_{i} \geq t \right] \\
> &= \mathbb{P}\left[ \exp\left( \lambda \sum_{i=1}^d s_{i} \right) \geq \exp(\lambda t) \right] \\
> (*)& \leq \frac{\mathbb{E}\left[ \exp\left( \lambda \sum s_{i} \right) \right]}{\exp(\lambda t)} \\
> (**)& = \frac{\mathbb{E}[\exp(\lambda s_{1})]^d}{\exp(\lambda t)} \\
> &= \exp(-\lambda t + d \log(\mathbb{E}[\exp(\lambda s_{1})]))
> \end{align}$$
> Where
> - $(*)$ is from [[Markov's Inequality]]
> - $(* *)$ is because the $x_{i}$ are independent
>
> Now, define
> $$\psi(\lambda) := \log(\mathbb{E}[\exp(\lambda s_{1})]) = \log(\mathbb{E}[\exp(\lambda(x_{1}^2 - 1))])$$
> as the *cumulant generating function* ([wikipedia](https://en.wikipedia.org/wiki/Cumulant)) of $x_{1}^2-1$ (which has expectation 0). Note that $\mathbb{E}[\exp(\lambda x_{1}^2)]$ is finite if and only if $\lambda < \frac{1}{2}$ and in this case
> $$\begin{align}
> \mathbb{E}[\exp(\lambda x_{1}^2)] &= \frac{1}{\sqrt{ 1-2\lambda }} \\
> \implies \psi(\lambda) &= -\lambda + \frac{1}{2}\log\left( \frac{1}{1-2\lambda} \right) \\
> ({\dagger})&= \lambda^2+{\cal O}(\lambda^3)
> \end{align}$$
> Where we get $({\dagger})$ via Taylor expansion.
> This yields the bound $\psi(\lambda) \leq 2\lambda^2$ for $\lambda \leq \frac{1}{4}$.
>
> > [!exercise]- Exercise (tedious) - Verify the bound
> > - Can plot $\psi(\lambda)$ and the bound $2\lambda^2$ (see notes page 6; figure 1.1)
> > - Verify via algebra that $\psi(\lambda) \leq 2\lambda^2$ for all $\lambda \leq \frac{1}{4}$
>
> Assuming $\lambda \leq \frac{1}{4}$, we have
> $$\begin{align}
> \mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t \right] &\leq \exp(-\lambda t + d \log(\mathbb{E}[\exp(\lambda s_{1})])) \\
> &=\exp(-\lambda t + d\psi(\lambda)) \\
> &\leq \exp(-\lambda t + 2d\lambda^2)
> \end{align}$$
> Now, $2d\lambda^2 -\lambda t$ is a convex quadratic function of $\lambda$, and is thus minimized at $\lambda^* = \frac{t}{4d}$. We want to set $\lambda=\lambda^*$, but we must ensure that $\lambda \leq \frac{1}{4}$ to maintain our desired bound. Thus, we have 2 cases:
> - If $t \leq d$, then $\lambda^* \leq \frac{1}{4}$ and we can safely set $\lambda = \lambda^*$.
> $$\mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t \right] \leq \exp(-\lambda t + 2d\lambda^2)=\exp\left( -\frac{t^2}{8d} \right)$$
> - If $t \geq d$, then $\frac{t}{4d} \not\leq \frac{1}{4}$, so we instead set $\lambda=\frac{1}{4}$.
> $$\mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t \right] \leq \exp(-\lambda t + 2d\lambda^2)=\exp\left( -\frac{t}{4}+\frac{d}{8} \right) \leq \exp\left( -\frac{t}{8} \right)$$
> Which is the desired result
> $$\tag*{$\blacksquare$}$$

see [[concentration inequality for magnitude of standard gaussian random vector]]

> [!NOTE]
>
> The important part of the proof is finding some $\psi(\lambda) = {\cal O}(\lambda^2)$ for a bounded $\lambda$. This means that the random variable is *subexponential* (a weaker version of being *subgaussian*).
>
> This result is characteristic of sums of $d$ iid subexponential random variables. Sums of this type have "Gaussian tails" scaled by $\exp\left( -\frac{ct^2}{d} \right)$ up to a cutoff of $t \sim d$, after which they have "exponential tails" scaled by $\exp(-ct)$
> - [Bernstein's inequality](https://en.wikipedia.org/wiki/Bernstein_inequality) is a general tool for expressing this type of behavior (see notes pg. 7 for more + reference)

# Review
#flashcards/math/rmt

The $\text{Law}$ is the {==*distribution*==} of the random vector.
<!--SR:!2026-08-28,221,270-->

What does $\mathbb{S}^{d-1}(r)$ denote?
-?-
The surface of the sphere of radius $r$ in $d$ dimensions.
<!--SR:!2026-04-11,127,250-->

> [!important]
>
> The direction of a {==2||[[gaussian random vector]]==} is {==1||uniformly distributed on $\mathbb{S}^{d-1}$||law==}
<!--SR:!2026-02-19,22,210!2026-05-11,146,250-->

With high probability, the {==1||magnitude==} of a gaussian random vector is {==1||$\sqrt{ d }$==}
<!--SR:!2026-02-17,74,210-->

# TODO

- [x] Add flashcards ⏳ 2025-09-11 ✅ 2025-09-11

```datacorejsx
const { dateTime } = await cJS()

return function View() {
    const file = dc.useCurrentFile();
    return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```