[[lecture-data]]:LiArrowBigLeftDash: `= choice(this["previous_lecture"], this["previous_lecture"], "No Previous Lecture")` | `= choice(this["next_lecture"], this["next_lecture"], "No Next Lecture")` :LiArrowBigRightDash:
> [!quote]
>
> Class notes pgs 2-7 (1.1 - 1.3)
> [!summary]
>
> - Random vector theory
> - Properties of Gaussian random vectors
> - Concentration inequalities
# 1. Random Vector Theory

## 1.1 Natural Random Models and Orthogonal Invariance
Interpretations of vectors
- List of numbers
- Magnitude and direction (with respect to a basis of the vector space they belong to)
Corresponding interpretations for random vectors $x \in \mathbb{R}^d$:
- **Entries** (each of the numbers) are as independent as possible
    - ie the entries are iid from some distribution $\mu$ on $\mathbb{R}$: $x_{1}, \dots, x_{d} \overset{\text{iid}}{\sim} \mu$
- **Magnitude and direction** are as independent as possible
    - Take $\lvert \lvert x \rvert \rvert$ and $\frac{x}{\lvert \lvert x \rvert \rvert}$ to be independent
    - The magnitude $\lvert \lvert x \rvert \rvert$ is then any random non-negative scalar and the direction $\frac{x}{\lvert \lvert x \rvert \rvert}$ is uniform
        - ie a random vector drawn uniformly from the unit sphere $\mathbb{S}^{d-1}$

See [[random vector]]. A numerical sketch of both constructions follows below.
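A minimal numerical sketch of the two constructions (not from the class notes; it assumes `numpy`, and the standard-normal entries and exponential magnitude are purely illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Construction 1: iid entries (illustrative choice: standard normal entries)
x_iid = rng.standard_normal(d)

# Construction 2: independent magnitude and direction
# direction: uniform on the unit sphere S^{d-1}, sampled by normalizing a Gaussian
# (this sampling fact is proved in Section 1.2)
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)
# magnitude: any non-negative random scalar, drawn independently of the direction
magnitude = rng.exponential()  # illustrative choice
x_md = magnitude * direction

print("iid model:", x_iid)
print("magnitude/direction model:", x_md, "with norm", np.linalg.norm(x_md))
```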
> [!definition] Orthogonal Invariance
>
> Let $x \in \mathbb{R}^d$ be a random vector. We say that $x$ (or its law $\text{Law}(x)$) is **orthogonally invariant** if for each (deterministic) orthogonal matrix $Q \in \mathbb{R}^{d \times d}$ we have
> $$Qx \overset{d}{=} x$$

see also [[invariant]]
> [!NOTE]
>
> The law is the distribution of the random vector. We sometimes use “model” instead of “law” or “distribution” to remind us that our choice includes assumptions and judgements.
> - We are modelling a situation we might encounter in applications
> [!proposition]
>
> There exists a unique probability measure supported on $\mathbb{S}^{d-1}$ that is orthogonally invariant. We denote this $\text{Unif}(\mathbb{S}^{d-1})$.

see [[orthogonally invariant distribution on the unit sphere]]
> [!proposition]
>
> Suppose that $x \in \mathbb{R}^d$ is orthogonally invariant. Then
> - $\lvert \lvert x \rvert \rvert$ is independent of $\frac{x}{\lvert \lvert x \rvert \rvert}$
> - $\frac{x}{\lvert \lvert x \rvert \rvert} \sim \text{Unif}(\mathbb{S}^{d-1})$.
- [p] Entry-wise interpretation of random vectors is described in these distribution definitions
- [p] magnitude/direction interpretation is also described
- [c] It is hard to swap between the two interpretations
> [!question]
>
> - What do the entries look like for a vector $x \sim \text{Unif}(\mathbb{S}^{d-1})$?
> - What are the magnitude and direction of an iid random vector $x$?
>
> (both questions are explored numerically below)
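A quick numerical look at both questions (a sketch, not from the notes; it assumes `numpy`, uses the normalize-a-Gaussian sampler from Section 1.2 for the sphere, and takes iid $\text{Unif}[-1,1]$ entries as an illustrative iid model):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 5000

# Q1: entries of x ~ Unif(S^{d-1}), sampled by normalizing standard Gaussians (Section 1.2)
g = rng.standard_normal((n, d))
sphere = g / np.linalg.norm(g, axis=1, keepdims=True)
print("first entry: mean", sphere[:, 0].mean(), "std", sphere[:, 0].std(), "vs 1/sqrt(d)", d ** -0.5)
# each entry has standard deviation ~ 1/sqrt(d); rescaled by sqrt(d) they look roughly Gaussian for large d

# Q2: magnitude and direction of an iid random vector (illustrative choice: entries iid Unif[-1, 1])
x = rng.uniform(-1.0, 1.0, size=(n, d))
norms = np.linalg.norm(x, axis=1)
print("magnitude: mean", norms.mean(), "std", norms.std())  # concentrates near sqrt(d/3)
# the direction x/||x|| is close to, but not exactly, uniform on the sphere for non-Gaussian entries
```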
## 1.2 Gaussian Random Vectors
The most natural of the iid random vector models is the multivariate Gaussian.
> [!definition] Multivariate Normal
>
> The **multivariate normal** or **Gaussian** ${\cal N}(\mu, \Sigma)$ for parameters $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ (positive semidefinite) is the probability measure with density
> $$\frac{1}{\sqrt{ (2\pi)^{\text{rank}(\Sigma)} \det{}^+(\Sigma) }} \exp\left( -\frac{1}{2}(x-\mu)^\top \Sigma^+ (x-\mu) \right)$$
> with respect to the Lebesgue measure on (the translate through $\mu$ of) the row space of $\Sigma$, and where
> - $\Sigma^+$ is the Moore-Penrose inverse
> - $\det{}^+(\Sigma)$ is the product of all non-zero eigenvalues of $\Sigma$ (the pseudo-determinant)
> [!NOTE]
>
> When $\Sigma$ is invertible,
> - $\det{}^+(\Sigma) = \det(\Sigma)$ is the ordinary determinant (and $\Sigma^+ = \Sigma^{-1}$)
> - the Lebesgue measure is on all of $\mathbb{R}^d$
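A hedged numerical sketch of this density (not from the notes; it assumes `numpy`, and the helper `gaussian_density` plus the example $\Sigma$ are made up for illustration):

```python
import numpy as np

def gaussian_density(x, mu, Sigma, tol=1e-10):
    """Density of N(mu, Sigma) w.r.t. Lebesgue measure on the translate of rowspace(Sigma) through mu.

    Hypothetical helper: uses the Moore-Penrose inverse and the pseudo-determinant
    (product of the non-zero eigenvalues), matching the definition above.
    """
    eigvals = np.linalg.eigvalsh(Sigma)
    nonzero = eigvals[eigvals > tol]
    rank = nonzero.size
    pdet = np.prod(nonzero)                 # pseudo-determinant det^+(Sigma)
    Sigma_pinv = np.linalg.pinv(Sigma)      # Moore-Penrose inverse Sigma^+
    diff = np.asarray(x) - np.asarray(mu)
    quad = diff @ Sigma_pinv @ diff
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** rank * pdet)

# Example with a rank-deficient covariance, where det(Sigma) = 0 and Sigma^{-1} does not exist
mu = np.zeros(3)
Sigma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0]])
x = np.array([0.5, -0.2, 0.0])   # a point in the support mu + rowspace(Sigma)
print(gaussian_density(x, mu, Sigma))

# Optional cross-check (scipy supports singular covariances):
# from scipy.stats import multivariate_normal
# print(multivariate_normal(mean=mu, cov=Sigma, allow_singular=True).pdf(x))
```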
> [!definition] Gaussian Random Vector
>
> If a random vector $x \sim {\cal N}(\mu, \Sigma)$, we call $x$ a **Gaussian random vector**. If $x \sim {\cal N}(0, I_{d})$, then we call $x$ a **standard Gaussian random vector**.
> [!proposition]
>
> Let $x \sim {\cal N}(\mu, \Sigma)$ be a gaussian random vector. Then $\mu$ is the mean vector and $\Sigma$ is the covariance matrix of $x$, and
> $$\mathbb{E}[x] = \mu, \qquad \mathbb{E}[(x-\mu)(x-\mu)^\top] = \Sigma$$
>
> ie, the law of a gaussian random vector is determined by its mean and covariance (or its linear and quadratic moments)
> [!theorem]
>
> Let $x \sim {\cal N}(\mu, \Sigma)$ be a gaussian random vector, $A \in \mathbb{R}^{k \times d}$, and $b \in \mathbb{R}^k$. Then $b + Ax$ is also a gaussian random vector with
> $$b + Ax \sim {\cal N}(b + A\mu, A\Sigma A^\top)$$
> ie, gaussian random vectors are closed under linear transformations.

see [[gaussian random vectors are closed under linear transformations]]
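A minimal empirical check of this closure rule (a sketch, not from the notes; it assumes `numpy`, and the particular $\mu$, $\Sigma$, $A$, $b$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 200_000, 3, 2

# Arbitrary illustrative parameters
mu = np.array([1.0, -2.0, 0.5])
L = rng.standard_normal((d, d))
Sigma = L @ L.T                       # a generic positive semidefinite covariance
A = rng.standard_normal((k, d))
b = np.array([3.0, -1.0])

# Sample x ~ N(mu, Sigma) and apply the affine map y = b + A x
x = rng.multivariate_normal(mu, Sigma, size=n)
y = b + x @ A.T

# Empirical moments should match b + A mu and A Sigma A^T (prints True up to sampling error)
print(np.allclose(y.mean(axis=0), b + A @ mu, atol=0.05))
print(np.allclose(np.cov(y, rowvar=False), A @ Sigma @ A.T, rtol=0.05, atol=0.05))
```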
The two facts above mean that a standard gaussian random vector is both an iid random vector and an orthogonally invariant random vector!
> [!theorem]
>
> Suppose $x \sim {\cal N}(0, I_{d})$ is a standard gaussian random vector. For any $u \in \mathbb{S}^{d-1}$, we have
> $$\langle u, x \rangle \sim {\cal N}(0, 1)$$
> In particular, this does not depend on $u$, and $x$ is orthogonally invariant.

> [!proof]
>
> For any orthogonal $Q \in \mathbb{R}^{d \times d}$, we have (from [[gaussian random vectors are closed under linear transformations]]) that
> $$Qx \sim {\cal N}(Q0, QI_{d}Q^\top) = {\cal N}(0, I_{d})$$
> so $x$ is orthogonally invariant. Now, consider a special case where $u$ is the first row of $Q$ (and we can find an orthogonal $Q$ with this first row via Gram-Schmidt). Then $\langle u, x \rangle = (Qx)_{1} \sim {\cal N}(0,1)$, so we see that the law is independent of $u$, as desired.

see [[standard gaussian random vectors are orthogonally invariant]]
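A small empirical sanity check of this invariance (a sketch, not from the notes; it assumes `numpy` and `scipy`, and the random unit vector $u$ and QR-generated $Q$ are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
d, n = 10, 100_000

x = rng.standard_normal((n, d))       # n standard gaussian vectors in R^d

# An arbitrary unit vector u
u = rng.standard_normal(d)
u /= np.linalg.norm(u)

# <u, x> should be N(0, 1) regardless of which u we picked
proj = x @ u
print("mean", proj.mean(), "std", proj.std())
print("KS test vs N(0,1):", stats.kstest(proj, "norm").pvalue)   # should typically not be tiny

# Qx should have the same law as x for any orthogonal Q
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # orthogonal (not exactly Haar, but that is not needed here)
qx = x @ Q.T
print("KS test on a coordinate of Qx:", stats.kstest(qx[:, 0], "norm").pvalue)
```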
> [!theorem]
>
> If $x \sim {\cal N}(0, I_{d})$, then
> $$\frac{x}{\lvert \lvert x \rvert \rvert} \sim \text{Unif}(\mathbb{S}^{d-1})$$

> [!proof]
>
> The result follows immediately from [[standard gaussian random vectors are orthogonally invariant]] and the independence proposition for the [[orthogonally invariant distribution on the unit sphere]].
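A brief numerical check of this theorem and of the independence proposition it relies on (a sketch, not from the notes; it assumes `numpy`):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 50, 100_000

x = rng.standard_normal((n, d))
norms = np.linalg.norm(x, axis=1)
dirs = x / norms[:, None]             # x / ||x||, which the theorem says is Unif(S^{d-1})

# Direction marginals match Unif(S^{d-1}): each coordinate has mean 0 and variance 1/d
print(dirs[:, 0].mean(), dirs[:, 0].var(), "vs", 1 / d)

# Magnitude and direction look independent (zero correlation is only a necessary check)
print(np.corrcoef(norms, dirs[:, 0])[0, 1])
```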
## 1.3 Concentration of Gaussian Vector Norms

> [!important]
>
> The direction of a gaussian random vector is uniformly distributed on $\mathbb{S}^{d-1}$.
> - [?] So we’ve addressed the direction. What about the magnitudes?
If $x \sim {\cal N}(0, I_{d})$ is a standard [[gaussian random vector]], then
$$\mathbb{E}[\,\lvert \lvert x \rvert \rvert ^2] = \sum_{i=1}^d \mathbb{E}[x_{i}^2] = d$$
Since $\lvert \lvert x \rvert \rvert ^2$ is the sum of $d$ iid random variables, then by the Law of Large Numbers and the central limit theorem, we expect that $\lvert \lvert x \rvert \rvert ^2 \approx d \pm {\cal O}(\sqrt{ d })$ with high probability.

By using [[Markov's inequality]] and [[Chebyshev's inequality]], we can get something close to this.

By Markov, we have
$$\mathbb{P}[\lvert \lvert x \rvert \rvert ^2 \geq t] \leq \frac{\mathbb{E}[\,\lvert \lvert x \rvert \rvert ^2]}{t} = \frac{d}{t}$$
And by Chebyshev we get
$$\begin{align}
\mathbb{P}[\lvert \lvert x \rvert \rvert ^2 \geq d + t] &= \mathbb{P}[\lvert \lvert x \rvert \rvert ^2 \geq \mathbb{E}[\,\lvert \lvert x \rvert \rvert ^2] + t] \\
& \leq \frac{\text{Var}(\lvert \lvert x \rvert \rvert ^2)}{t^2} \\
&= \frac{2d}{t^2}
\end{align}$$
Where $\text{Var}[\lvert \lvert x \rvert \rvert^2] = d \text{Var}[x_{1}^2]=2d$. So this gives $\lvert \lvert x \rvert \rvert^2 = d + {\cal O}(\sqrt{ d })$ with high probability (taking $t = {\cal O}(\sqrt{ d })$).

This isn't *quite* what we want since both results above depend a lot on the value of $t$.
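A brief numerical illustration of how loose these bounds are (a sketch, not from the notes; it assumes `numpy`, and $d$, $n$, and the grid of $t$ values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 100, 1_000_000

# ||x||^2 for x ~ N(0, I_d) has a chi-squared law with d degrees of freedom
sq_norms = rng.chisquare(d, size=n)

for t in [30, 50, 100]:
    empirical = (sq_norms >= d + t).mean()
    markov = d / (d + t)        # Markov bound on P[||x||^2 >= d + t]
    chebyshev = 2 * d / t**2    # Chebyshev bound on the same event
    print(f"t={t}: empirical {empirical:.1e}, Markov {markov:.2f}, Chebyshev {chebyshev:.3f}")
# the empirical tail decays far faster than either bound, which motivates the theorem below
```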
> [!theorem]
>
> Let $x \sim {\cal N}(0, I_{d})$ be a standard [[gaussian random vector]] and $t \geq 0$. Then
> $$\begin{align}
> \mathbb{P}[\left\lvert \,\lvert \lvert x \rvert \rvert^2 -d \, \right\rvert \geq t] &= \mathbb{P}\left[ \left\lvert \sum_{i=1}^d x_{i}^2 - d \right\rvert \geq t \right] \\
> &\leq 2 \begin{cases}
> \exp\left( -\frac{t^2}{8d} \right)&\text{if } t \leq d \\
> \exp\left( -\frac{t}{8} \right) & \text{if }t \geq d
> \end{cases} \\
> &= 2\exp\left( -\frac{1}{8} \min \left\{ \frac{t^2}{d}, t \right\}\right)
> \end{align}$$

> [!NOTE]
>
> This proof uses the *Chernoff Method*, which is used to prove concentration inequalities like the [[Chebyshev's inequality|Chebyshev]], [Hoeffding](https://en.wikipedia.org/wiki/Hoeffding%27s_inequality), [Bernstein](https://en.wikipedia.org/wiki/Bernstein_inequalities_(probability_theory)), [Azuma-Hoeffding](https://en.wikipedia.org/wiki/Azuma%27s_inequality), and [McDiarmid](https://en.wikipedia.org/wiki/McDiarmid%27s_inequality) inequalities. It is a more general form of a "nonlinear [[Markov's inequality|Markov]]" inequality.

> [!proof]
>
> We deal with only one side of the distribution; the desired inequality is achieved by simply multiplying by 2 to account for the other tail.
>
> Note that $\mathbb{E}[x_{i}^2] = 1$, so define $s_{i} = x_{i}^2-1$. Then, for any $\lambda > 0$,
> $$\begin{align}
> \mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t\right] &= \mathbb{P}\left[ \sum_{i=1}^d s_{i} \geq t \right] \\
> &= \mathbb{P}\left[ \exp\left( \lambda \sum_{i=1}^d s_{i} \right) \geq \exp(\lambda t) \right] \\
> (*)& \leq \frac{\mathbb{E}\left[ \exp\left( \lambda \sum s_{i} \right) \right]}{\exp(\lambda t)} \\
> (**)& = \frac{\mathbb{E}[\exp(\lambda s_{1})]^d}{\exp(\lambda t)} \\
> &= \exp(-\lambda t + d \log(\mathbb{E}[\exp(\lambda s_{1})]))
> \end{align}$$
> Where
> - $(*)$ is from [[Markov's Inequality]]
> - $(* *)$ is because the $x_{i}$ are independent
>
> Now, define
> $$\psi(\lambda) := \log(\mathbb{E}[\exp(\lambda s_{1})]) = \log(\mathbb{E}[\exp(\lambda(x_{1}^2 - 1))])$$
> as the *cumulant generating function* ([wikipedia](https://en.wikipedia.org/wiki/Cumulant)) of $x_{1}^2-1$ (which has expectation 0). Note that $\mathbb{E}[\exp(\lambda x_{1}^2)]$ is finite if and only if $\lambda < \frac{1}{2}$ and in this case
> $$\begin{align}
> \mathbb{E}[\exp(\lambda x_{1}^2)] &= \frac{1}{\sqrt{ 1-2\lambda }} \\
> \implies \psi(\lambda) &= -\lambda + \frac{1}{2}\log\left( \frac{1}{1-2\lambda} \right) \\
> ({\dagger})&= \lambda^2+{\cal O}(\lambda^3)
> \end{align}$$
> Where we get $({\dagger})$ via Taylor expansion.
> This yields the bound $\psi(\lambda) \leq 2\lambda^2$ for $\lambda \leq \frac{1}{4}$.
>
> > [!exercise]- Exercise (tedious) - Verify the bound
> > - Can plot $\psi(\lambda)$ and the bound $2\lambda^2$ (see notes page 6; figure 1.1)
> > - Verify via algebra that $\psi(\lambda) \leq 2\lambda^2$ for all $\lambda \leq \frac{1}{4}$
>
> Assuming $\lambda \leq \frac{1}{4}$, we have
> $$\begin{align}
> \mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t \right] &\leq \exp(-\lambda t + d \log(\mathbb{E}[\exp(\lambda s_{1})])) \\
> &=\exp(-\lambda t + d\psi(\lambda)) \\
> &\leq \exp(-\lambda t + 2d\lambda^2)
> \end{align}$$
> Now, $2d\lambda^2 -\lambda t$ is a convex quadratic function of $\lambda$, and is thus minimized at $\lambda^* = \frac{t}{4d}$. We want to set $\lambda=\lambda^*$, but we must ensure that $\lambda \leq \frac{1}{4}$ to maintain our desired bound. Thus, we have 2 cases:
> - If $t \leq d$, then $\lambda^* \leq \frac{1}{4}$ and we can safely set $\lambda = \lambda^*$.
> $$\mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t \right] \leq \exp(-\lambda t + 2d\lambda^2)=\exp\left( -\frac{t^2}{8d} \right)$$
> - If $t \geq d$, then $\frac{t}{4d} \not\leq \frac{1}{4}$, so we instead set $\lambda=\frac{1}{4}$.
> $$\mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t \right] \leq \exp(-\lambda t + 2d\lambda^2)=\exp\left( -\frac{t}{4}+\frac{d}{8} \right) \leq \exp\left( -\frac{t}{8} \right)$$
> Which is the desired result
> $$\tag*{$\blacksquare$}$$

see [[concentration inequality for magnitude of standard gaussian random vector]]

> [!NOTE]
>
> The important part of the proof is finding some $\psi(\lambda) = {\cal O}(\lambda^2)$ for a bounded $\lambda$. This means that the random variable is *subexponential* (a weaker version of being *subgaussian*).
>
> This result is characteristic of sums of $d$ iid subexponential random variables. Sums of this type have "Gaussian tails" scaled by $\exp\left( -\frac{ct^2}{d} \right)$ up to a cutoff of $t \sim d$, after which they have "exponential tails" scaled by $\exp(-ct)$
> - [Bernstein's inequality](https://en.wikipedia.org/wiki/Bernstein_inequality) is a general tool for expressing this type of behavior (see notes pg. 7 for more + reference)

# Review
#flashcards/math/rmt

The $\text{Law}$ is the {==*distribution*==} of the random vector.
<!--SR:!2026-08-28,221,270-->

What does $\mathbb{S}^{d-1}(r)$ denote?
-?-
The surface of the sphere of radius $r$ in $d$ dimensions.
<!--SR:!2026-04-11,127,250-->

> [!important]
>
> The direction of a {==2||[[gaussian random vector]]==} is {==1||uniformly distributed on $\mathbb{S}^{d-1}$||law==}
<!--SR:!2026-02-19,22,210!2026-05-11,146,250-->

With high probability, the {==1||magnitude==} of a gaussian random vector is {==1||$\sqrt{ d }$==}
<!--SR:!2026-02-17,74,210-->

# TODO

- [x] Add flashcards ⏳ 2025-09-11 ✅ 2025-09-11

```datacorejsx
const { dateTime } = await cJS()

return function View() {
    const file = dc.useCurrentFile();
    return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```