Random Matrix Lecture 01

[[lecture-data]]

:LiArrowBigLeftDash: No Previous Lecture | Random Matrix Lecture 02 :LiArrowBigRightDash:

Quote

Class notes pgs 2-7 (1.1 - 1.3)

Summary

  • Random vector theory
  • Properties of Gaussian random vectors
  • Concentration inequalities

1. Random Vector Theory

1.1 Natural Random Models and Orthogonal Invariance

Interpretations of vectors

  1. List of numbers
  2. Magnitude and direction (with respect to a [[basis]] of the [[vector space]] they belong to)

Corresponding interpretations for random vectors:

  1. Entries (each of the numbers) are as independent as possible

    • $x \sim \mu^{\otimes d}$, ie the entries are iid draws from some distribution $\mu$
  2. Magnitude and direction are as independent as possible

    • Take $\lVert x \rVert$ and $\frac{x}{\lVert x \rVert}$ to be independent
    • The magnitude is then any random non-negative scalar and the direction is
      • $x \sim \mathrm{Uniform}(S^{d-1})$, ie a random vector drawn uniformly from the unit sphere
$$S^{d-1} := \{ y \in \mathbb{R}^{d} : \lVert y \rVert = 1 \}$$

See [[random vector]]
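A minimal numpy sketch contrasting the two models (not from the notes; the dimension, the entry distribution $\mu$, and the magnitude law are arbitrary illustrative choices, and the direction is sampled via the normalized-gaussian construction proved later in 1.2):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Model 1: entries as independent as possible, iid from mu = Uniform([-1, 1])
x_iid = rng.uniform(-1.0, 1.0, size=d)

# Model 2: magnitude and direction as independent as possible
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)   # direction ~ Uniform(S^{d-1})
magnitude = rng.exponential(1.0)         # any non-negative scalar law works here
x_rot = magnitude * direction

print(x_iid)
print(x_rot, np.linalg.norm(direction))  # the direction has unit norm
```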

Orthogonal Invariance

Let $x \in \mathbb{R}^{d}$ be a random vector. We say that $x$ (or its law $\mathrm{Law}(x)$) is orthogonally invariant if for each (deterministic) orthogonal matrix $Q \in O(d)$ we have

$$\mathrm{Law}(Qx) = \mathrm{Law}(x)$$

see also [[invariant]]

Note

The law is the distribution of the random vector.
We sometimes use "model" instead of "law" or "distribution" to remind us that our choice includes assumptions and judgements.

  • We are modelling a situation we might encounter in applications
Proposition

There exists a unique probability measure supported on $S^{d-1}$ that is orthogonally [[invariant]]. We denote this $\mathrm{Uniform}(S^{d-1})$.

see [[orthogonally invariant distribution on the unit sphere]]

Proposition

Suppose that $x$ is orthogonally [[invariant]]. Then

  • $\lVert x \rVert$ is independent of $\frac{x}{\lVert x \rVert}$
  • $\mathrm{Law}\left(\frac{x}{\lVert x \rVert}\right) = \mathrm{Uniform}(S^{d-1})$.
Question

  • What do the entries look like for a vector $x \sim \mathrm{Uniform}(S^{d-1})$?
  • What are the magnitude and direction of an iid [[random vector]]?
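The sketch below (not from the notes; parameters are arbitrary) explores both questions empirically, again using the normalized-gaussian construction of $\mathrm{Uniform}(S^{d-1})$ from 1.2: for large $d$ each entry of a uniform sphere vector behaves roughly like $\mathcal{N}(0, \frac{1}{d})$, while the magnitude of an iid $\mathcal{N}(0,1)$ vector concentrates near $\sqrt{d}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1000, 10_000

# Entries of x ~ Uniform(S^{d-1}): normalize standard gaussian vectors
g = rng.standard_normal((n, d))
sphere = g / np.linalg.norm(g, axis=1, keepdims=True)
print(sphere[:, 0].mean(), d * sphere[:, 0].var())  # ~ 0 and ~ 1: entries look like N(0, 1/d)

# Magnitude of an iid N(0, 1) vector: concentrates near sqrt(d)
norms = np.linalg.norm(g, axis=1)
print(norms.mean(), np.sqrt(d), norms.std())        # mean ~ sqrt(d), small spread
```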

1.2 Gaussian Random Vectors

The most natural of the iid random vectors is the multivariate Gaussian

$$x \sim \mathcal{N}(0,1)^{\otimes d} = \mathcal{N}(0, I_{d})$$
Multivariate Normal

The multivariate normal or Gaussian $\mathcal{N}(\mu, \Sigma)$ for parameters $\mu \in \mathbb{R}^{d}$ and (positive semidefinite) $\Sigma \in \mathbb{R}^{d \times d}$ is the [[probability measure]] with density

$$\frac{1}{\sqrt{\det(2\pi\Sigma)}} \exp\left( -\frac{1}{2}(x-\mu)^{T} \Sigma^{\dagger} (x-\mu) \right)$$

with respect to the [[Lebesgue measure]] on the row space of $\Sigma$, and where

  • $\Sigma^{\dagger}$ is the [[Moore-Penrose inverse]]
  • $\det$ is the product of all non-zero eigenvalues
Note

When $\Sigma$ is invertible,

  • $\det$ is the ordinary [[determinant]]
  • $\Sigma^{\dagger} = \Sigma^{-1}$
  • the [[Lebesgue measure]] is on all of $\mathbb{R}^{d}$
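A small sketch of this density (an illustration under stated assumptions, not from the notes): the hypothetical `gaussian_density` below uses `np.linalg.pinv` for the Moore-Penrose inverse and the product of non-zero eigenvalues for $\det$; the eigenvalue tolerance `1e-12` is an arbitrary choice, and the point `x` is assumed to lie in the support $\mu + \mathrm{rowspace}(\Sigma)$.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Density of N(mu, Sigma) at x for a (possibly singular) PSD Sigma."""
    eigvals = np.linalg.eigvalsh(Sigma)
    nonzero = eigvals[eigvals > 1e-12]       # non-zero eigenvalues of Sigma
    pdet = np.prod(2.0 * np.pi * nonzero)    # "det(2*pi*Sigma)" over non-zero eigenvalues
    Sigma_dagger = np.linalg.pinv(Sigma)     # Moore-Penrose inverse
    diff = x - mu
    return np.exp(-0.5 * diff @ Sigma_dagger @ diff) / np.sqrt(pdet)

# Example: a rank-1 (singular) covariance in R^2; the support is the line y = x
mu = np.zeros(2)
Sigma = np.array([[1.0, 1.0], [1.0, 1.0]])
print(gaussian_density(np.array([0.5, 0.5]), mu, Sigma))
```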
Gaussian Random Vector

If a [[random vector]] $x \sim \mathcal{N}(\mu, \Sigma)$, we call $x$ a Gaussian random vector. If $x \sim \mathcal{N}(0, I_{d})$, then we call $x$ a standard Gaussian random vector.

see [[gaussian random vector]]

Proposition

Let $x \sim \mathcal{N}(\mu, \Sigma)$ be a [[gaussian random vector]]. Then $\mu$ is the mean vector and $\Sigma$ is the covariance matrix of $x$:

$$\mu = \mathbb{E}[x], \qquad \Sigma = \mathbb{E}[(x-\mu)(x-\mu)^{T}] = \mathbb{E}[xx^{T}] - \mu\mu^{T}$$

ie, the law of a gaussian random vector is determined by its mean and covariance (or its linear and quadratic moments)

Theorem

Let $x \sim \mathcal{N}(\mu, \Sigma)$ be a [[gaussian random vector]] in $\mathbb{R}^{d}$, $A \in \mathbb{R}^{d \times d}$, and $b \in \mathbb{R}^{d}$. Then $Ax + b$ is also a gaussian random vector with

$$\mathrm{Law}(Ax + b) = \mathcal{N}(A\mu + b, A\Sigma A^{T})$$

ie, gaussian random vectors are closed under linear transformations.

see [[gaussian random vectors are closed under linear transformations]]
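A quick numerical sanity check of this closure property (a sketch; the particular $A$, $b$, and sample size are arbitrary choices): push standard gaussian samples through the affine map and compare the empirical mean and covariance with $b$ and $AA^{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000

A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, -0.3, 2.0]])
b = np.array([1.0, -2.0, 0.5])

x = rng.standard_normal((n, d))   # rows ~ N(0, I_d)
y = x @ A.T + b                   # rows ~ N(b, A A^T) by the theorem

print(y.mean(axis=0))             # ~ b
print(np.cov(y, rowvar=False))    # ~ A A^T
```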

The two facts above mean that a standard gaussian random vector is orthogonally invariant!

Theorem

Suppose $x \sim \mathcal{N}(0, I_{d})$ is a standard [[gaussian random vector]]. For any $a \in S^{d-1}$, we have

$$\mathrm{Law}(\langle a, x \rangle) = \mathcal{N}(0, 1)$$

In particular, this does not depend on $a$, and $x$ is orthogonally invariant.

Proof

For $Q \in O(d)$, we have (from [[gaussian random vectors are closed under linear transformations]]) that

$$\mathrm{Law}(Qx) = \mathcal{N}(0, Q I_{d} Q^{T}) = \mathcal{N}(0, I_{d}) = \mathrm{Law}(x)$$

Now, consider the special case where $a$ is the first row of $Q$ (we can always extend $a$ to such an orthogonal $Q$ via [[Gram-Schmidt]]). Then $\langle a, x \rangle$ is the first entry of $Qx$, which has the same law as the first entry of $x$, namely $\mathcal{N}(0, 1)$. In particular, the law is independent of $a$, as desired.

see [[standard gaussian random vectors are orthogonally invariant]]
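An empirical check of this theorem (a sketch; the choice of $a$, the dimension, and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 200_000

a = rng.standard_normal(d)
a /= np.linalg.norm(a)            # a fixed unit vector in S^{d-1}

x = rng.standard_normal((n, d))   # rows are standard gaussian vectors
proj = x @ a                      # <a, x> for each sample
print(proj.mean(), proj.var())    # ~ 0 and ~ 1, regardless of the choice of a
```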

Theorem

If $x \sim \mathcal{N}(0, I_{d})$, then $\mathrm{Law}\left(\frac{x}{\lVert x \rVert}\right) = \mathrm{Unif}(S^{d-1})$

Proof

The result follows immediately from [[standard gaussian random vectors are orthogonally invariant]] and the independence proposition for [[orthogonally invariant distribution on the unit sphere]].

see [[normalized standard gaussian random vectors have the orthogonally invariant distribution on the unit sphere]]
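A sketch (not from the notes) that samples $\mathrm{Uniform}(S^{d-1})$ this way and spot-checks orthogonal invariance by rotating the samples with an orthogonal $Q$ obtained from a QR factorization (a numerical stand-in for Gram-Schmidt):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 100_000

g = rng.standard_normal((n, d))
u = g / np.linalg.norm(g, axis=1, keepdims=True)   # rows ~ Uniform(S^{d-1})

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # a random orthogonal matrix
print(np.mean(u[:, 0] ** 2), np.mean((u @ Q.T) ** 2, axis=0))  # all ~ 1/d = 0.25
```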

1.3 Concentration of Gaussian Vector Norms

Important

The direction of a [[gaussian random vector]] is uniformly distributed.

If $x \sim \mathcal{N}(0, I_{d})$ is a standard [[gaussian random vector]], then

$$\mathbb{E}[\lVert x \rVert^{2}] = d \cdot \mathbb{E}[x_{1}^{2}] = d$$

Since $\lVert x \rVert^{2} = \sum_{i=1}^{d} x_{i}^{2}$ is the sum of $d$ iid random variables, by the [[Law of Large Numbers]] and the central limit theorem we expect that, with high probability,

$$\lVert x \rVert^{2} = d + \mathcal{O}(\sqrt{d})$$

By using [[Markov's inequality]] and [[Chebyshev's inequality]], we can get something close to this.

By Markov, we have

$$\mathbb{P}\left[ \lVert x \rVert^{2} \geq d + t \right] \leq \frac{\mathbb{E}[\lVert x \rVert^{2}]}{d + t} = \frac{d}{d+t} = \frac{1}{1 + \frac{t}{d}}$$

which is small only once $t \gg d$, so Markov gives $\lVert x \rVert^{2} = \mathcal{O}(d)$ with high probability.

And by Chebyshev we get

$$\begin{align} \mathbb{P}\left[ \lVert x \rVert^{2} \geq d + t \right] &= \mathbb{P}\left[ \lVert x \rVert^{2} \geq \mathbb{E}[\lVert x \rVert^{2}] + t \right] \\ &\leq \frac{\mathrm{Var}(\lVert x \rVert^{2})}{t^{2}} \\ &= \frac{2d}{t^{2}} \end{align}$$

where $\mathrm{Var}[\lVert x \rVert^{2}] = d \cdot \mathrm{Var}[x_{1}^{2}] = 2d$. This is small once $t \gg \sqrt{d}$, so Chebyshev gives $\lVert x \rVert^{2} = d + \mathcal{O}(\sqrt{d})$ with high probability.

This isn't quite what we want since both results above depend a lot on the value of t.
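A tiny comparison (the values of $d$ and $t$ are arbitrary) showing how strongly both bounds depend on $t$:

```python
d = 50
for t in [10.0, 25.0, 75.0]:
    markov = d / (d + t)            # P[||x||^2 >= d + t] <= d / (d + t)
    chebyshev = 2.0 * d / t ** 2    # P[||x||^2 >= d + t] <= 2d / t^2
    print(f"t={t:5.1f}  Markov bound={markov:.3f}  Chebyshev bound={chebyshev:.3f}")
```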

Theorem

Let $x \sim \mathcal{N}(0, I_{d})$ be a standard [[gaussian random vector]] and $t \geq 0$. Then

$$\mathbb{P}\left[ \left| \lVert x \rVert^{2} - d \right| \geq t \right] = \mathbb{P}\left[ \left| \sum_{i=1}^{d} x_{i}^{2} - d \right| \geq t \right] \leq 2 \begin{cases} \exp\left( -\frac{t^{2}}{8d} \right) & \text{if } t \leq d \\ \exp\left( -\frac{t}{8} \right) & \text{if } t \geq d \end{cases} = 2 \exp\left( -\frac{1}{8} \min\left\{ \frac{t^{2}}{d}, t \right\} \right)$$
Note

This proof uses the Chernoff Method, which is used to prove concentration inequalities such as the Chebyshev, Hoeffding, Bernstein, Azuma-Hoeffding, and McDiarmid inequalities. It is a more general form of a "nonlinear Markov" inequality.

Proof

We deal with only one side of the distribution; the desired inequality is achieved by simply multiplying by 2 to account for the other tail.

Note that $\mathbb{E}[x_{i}^{2}] = 1$, so define $s_{i} = x_{i}^{2} - 1$. Then, for any $\lambda > 0$,

$$\begin{align} \mathbb{P}\left[ \sum_{i=1}^{d} x_{i}^{2} - d \geq t \right] = \mathbb{P}\left[ \sum_{i=1}^{d} s_{i} \geq t \right] &= \mathbb{P}\left[ \exp\left( \lambda \sum_{i=1}^{d} s_{i} \right) \geq \exp(\lambda t) \right] \\ &\overset{(*)}{\leq} \mathbb{E}\left[ \exp\left( \lambda \sum_{i=1}^{d} s_{i} \right) \right] \exp(-\lambda t) \\ &\overset{(**)}{=} \mathbb{E}[\exp(\lambda s_{1})]^{d} \exp(-\lambda t) \\ &= \exp\left( -\lambda t + d \log\left( \mathbb{E}[\exp(\lambda s_{1})] \right) \right) \end{align}$$

where

  • $(*)$ is from [[Markov's Inequality]]
  • $(**)$ is because the $x_{i}$ are independent (and identically distributed)

Now, define

$$\psi(\lambda) := \log\left( \mathbb{E}[\exp(\lambda s_{1})] \right) = \log\left( \mathbb{E}[\exp(\lambda (x_{1}^{2} - 1))] \right)$$

as the cumulant generating function (wikipedia) of $x_{1}^{2} - 1$ (which has expectation 0). Note that $\mathbb{E}[\exp(\lambda x_{1}^{2})]$ is finite if and only if $\lambda < \frac{1}{2}$, and in this case

$$\mathbb{E}[\exp(\lambda x_{1}^{2})] = \frac{1}{\sqrt{1 - 2\lambda}} \implies \psi(\lambda) = -\lambda + \frac{1}{2}\log\left( \frac{1}{1 - 2\lambda} \right) \overset{(\star)}{=} \lambda^{2} + \mathcal{O}(\lambda^{3})$$

where we get $(\star)$ via Taylor expansion. This yields the bound $\psi(\lambda) \leq 2\lambda^{2}$ for $0 \leq \lambda \leq \frac{1}{4}$.
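A quick numerical confirmation of this bound on $[0, \frac{1}{4}]$ (a sketch; the grid resolution is arbitrary):

```python
import numpy as np

lam = np.linspace(0.0, 0.25, 1001)
psi = -lam - 0.5 * np.log1p(-2.0 * lam)   # psi(lambda) = -lambda - (1/2) log(1 - 2 lambda)
print(np.all(psi <= 2.0 * lam ** 2))      # True: psi(lambda) <= 2 lambda^2 on [0, 1/4]
print(psi[-1], 2.0 * lam[-1] ** 2)        # psi(1/4) ~ 0.097 vs 0.125
```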

Assuming $0 < \lambda \leq \frac{1}{4}$, we have

$$\mathbb{P}\left[ \sum_{i=1}^{d} x_{i}^{2} - d \geq t \right] \leq \exp\left( -\lambda t + d \log\left( \mathbb{E}[\exp(\lambda s_{1})] \right) \right) = \exp(-\lambda t + d\psi(\lambda)) \leq \exp(-\lambda t + 2d\lambda^{2})$$

Now, $2d\lambda^{2} - \lambda t$ is a convex quadratic function of $\lambda$, and is thus minimized at $\lambda_{*} = \frac{t}{4d}$. We want to set $\lambda = \lambda_{*}$, but we must ensure that $\lambda \leq \frac{1}{4}$ to maintain our desired bound. Thus, we have 2 cases:

  • If $t \leq d$, then $\lambda_{*} \leq \frac{1}{4}$ and we can safely set $\lambda = \lambda_{*}$.
$$\mathbb{P}\left[ \sum_{i=1}^{d} x_{i}^{2} - d \geq t \right] \leq \exp(-\lambda_{*} t + 2d\lambda_{*}^{2}) = \exp\left( -\frac{t^{2}}{4d} + \frac{t^{2}}{8d} \right) = \exp\left( -\frac{t^{2}}{8d} \right)$$
  • If $t \geq d$, then $\frac{t}{4d} \geq \frac{1}{4}$, so we instead set $\lambda = \frac{1}{4}$.
$$\mathbb{P}\left[ \sum_{i=1}^{d} x_{i}^{2} - d \geq t \right] \leq \exp(-\lambda t + 2d\lambda^{2}) = \exp\left( -\frac{t}{4} + \frac{d}{8} \right) \leq \exp\left( -\frac{t}{8} \right)$$

This is the desired result.

see [[concentration inequality for magnitude of standard gaussian random vector]]
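A Monte Carlo sanity check of the theorem (a sketch; $d$, the sample size, and the values of $t$ are arbitrary choices): the empirical two-sided tail should sit below $2\exp\left( -\frac{1}{8}\min\{\frac{t^{2}}{d}, t\} \right)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 200_000

sq_norms = np.sum(rng.standard_normal((n, d)) ** 2, axis=1)   # ||x||^2 for n samples
for t in [10.0, 25.0, 75.0]:
    empirical = np.mean(np.abs(sq_norms - d) >= t)
    bound = 2.0 * np.exp(-min(t * t / d, t) / 8.0)
    print(f"t={t:5.1f}  empirical={empirical:.5f}  bound={bound:.5f}")
```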

Note

The important part of the proof is finding some $\psi(\lambda) = \mathcal{O}(\lambda^{2})$ for bounded $\lambda$. This means that the random variable is subexponential (a weaker version of being subgaussian).

This result is characteristic of sums of $d$ iid subexponential random variables. Sums of this type have "Gaussian tails" scaling as $\exp\left( -\frac{ct^{2}}{d} \right)$ up to a cutoff of $t \approx d$, after which they have "exponential tails" scaling as $\exp(-ct)$.

  • Bernstein's inequality is a general tool for expressing this type of behavior (see notes pg. 7 for more + reference)

Review

#flashcards/math/rmt

The Law is the {distribution} of the random vector.

What does $S^{d-1}(r)$ denote?
-?-
The surface of the sphere of radius r in d dimensions.

Important

The {1||direction||thing} of a {2||[[gaussian random vector]]} is {1||uniformly distributed on $S^{d-1}$||law of thing}

With high probability, the {1||magnitude} of a gaussian random vector is {1||$\sqrt{d}$}

TODO

```datacorejsx
const { dateTime } = await cJS()

return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)}     ֍     Last Modified {dateTime.getLastMod(file)}</p>
}
```