Random Matrix Lecture 01

[[lecture-data]]

:LiArrowBigLeftDash: = choice(this["previous_lecture"], this["previous_lecture"], "No Previous Lecture") | = choice(this["next_lecture"], this["next_lecture"], "No Next Lecture") :LiArrowBigRightDash:

 

> [!quote]
>
> Class notes pgs 2-7 (1.1 - 1.3)

> [!summary]
>
> - Random vector theory
> - Properties of Gaussian random vectors
> - Concentration inequalities

# 1. Random Vector Theory

## 1.1 Natural Random Models and Orthogonal Invariance

Interpretations of vectors

  1. List of numbers
  2. Magnitude and direction (with respect to a basis of the vector space they belong to)

Corresponding interpretations for random vectors:

1. Entries (each of the numbers) are as independent as possible
    - ie entries are iid from some fixed distribution on $\mathbb{R}$
2. Magnitude and direction are as independent as possible
    - Take $\lVert x \rVert$ and $\frac{x}{\lVert x \rVert}$ to be independent
    - The magnitude $\lVert x \rVert$ is then any random non-negative scalar and the direction is $u \sim \text{Unif}(\mathbb{S}^{d-1})$
        - ie a random vector drawn uniformly from the unit sphere (a numerical sketch of both models follows below). See [[random vector]]
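A minimal numerical sketch (my own illustration, not from the class notes) contrasting the two models; the dimension, seed, and choice of magnitude distribution are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Model 1: entries as independent as possible -- iid draws from some fixed
# distribution (a standard normal here, purely for illustration).
x_iid = rng.standard_normal(d)

# Model 2: magnitude and direction as independent as possible.
# Direction: uniform on the unit sphere (normalize a Gaussian vector;
# Section 1.2 explains why this gives Unif(S^{d-1})).
g = rng.standard_normal(d)
direction = g / np.linalg.norm(g)
# Magnitude: any non-negative random scalar, drawn independently.
magnitude = np.sqrt(rng.chisquare(df=d))   # arbitrary illustrative choice

x_polar = magnitude * direction
print(x_iid)
print(x_polar)
```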

> [!definition] Orthogonal Invariance
>
> Let $x \in \mathbb{R}^d$ be a random vector. We say that $x$ (or its law $\mathcal{L}(x)$) is orthogonally invariant if for each (deterministic) orthogonal matrix $Q \in \mathbb{R}^{d \times d}$ we have
> $$Qx \overset{d}{=} x$$

see also [[invariant]]
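As a concrete instance of the definition (my own illustration, not from the notes): in $d = 2$, every orthogonal matrix is a rotation or a reflection, so orthogonal invariance asks that for every angle $\theta$,
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} x \overset{d}{=} x,$$
ie rotating (or reflecting) the random vector does not change its law.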

> [!NOTE]
>
> The law is the distribution of the random vector. We sometimes use “model” instead of “law” or “distribution” to remind us that our choice includes assumptions and judgements.
>
> - We are modelling a situation we might encounter in applications

> [!proposition]
>
> There exists a unique probability measure supported on $\mathbb{S}^{d-1}$ that is orthogonally invariant. We denote this $\text{Unif}(\mathbb{S}^{d-1})$.

see [[orthogonally invariant distribution on the unit sphere]]

> [!proposition]
>
> Suppose that $x$ is orthogonally invariant. Then
>
> - $\lVert x \rVert$ is independent of $\frac{x}{\lVert x \rVert}$
> - $\frac{x}{\lVert x \rVert} \sim \text{Unif}(\mathbb{S}^{d-1})$.

- [p] Entry-wise interpretation of random vectors is described in these distribution definitions
- [p] magnitude/direction interpretation is also described
- [c] It is hard to swap between the two interpretations

> [!question]
>
> - What do the entries look like for a vector $u \sim \text{Unif}(\mathbb{S}^{d-1})$?
> - What are the magnitude and direction of an iid random vector $x$? (A quick empirical look follows after this callout.)
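Not the answer developed in the notes, just a quick empirical look at both questions (dimension and sample size chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 10_000

# Entries of u ~ Unif(S^{d-1}): normalize standard Gaussian vectors.
g = rng.standard_normal((n, d))
u = g / np.linalg.norm(g, axis=1, keepdims=True)
# Each entry has mean ~ 0 and variance ~ 1/d, so sqrt(d) * u_1 looks
# roughly standard normal when d is large.
print("std of sqrt(d) * u_1:", np.std(np.sqrt(d) * u[:, 0]))

# Magnitude of an iid standard normal vector: concentrates near sqrt(d).
x = rng.standard_normal((n, d))
norms = np.linalg.norm(x, axis=1)
print("mean of ||x||:", norms.mean(), " (sqrt(d) =", np.sqrt(d), ")")
print("std  of ||x||:", norms.std())
```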

## 1.2 Gaussian Random Vectors

The most natural of the iid random vectors is the multivariate Gaussian

> [!definition] Multivariate Normal
>
> The multivariate normal or Gaussian ${\cal N}(\mu, \Sigma)$ for parameters $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ (positive semidefinite) is the probability measure with density
> $$\rho(x) \propto \exp\left( -\frac{1}{2}(x-\mu)^\top \Sigma^{+}(x-\mu) \right)$$
> with respect to the Lebesgue measure on the row space of $\Sigma$ (shifted by $\mu$), and where $\Sigma^{+}$ denotes the pseudoinverse of $\Sigma$.

> [!NOTE]
>
> When $\Sigma$ is invertible, $\Sigma^{+} = \Sigma^{-1}$ and the density is with respect to the Lebesgue measure on all of $\mathbb{R}^d$.
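For reference (a standard fact, not copied from the notes): when $\Sigma$ is invertible the density can be written explicitly as
$$\rho(x) = \frac{1}{\sqrt{(2\pi)^d \det \Sigma}}\exp\left( -\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu) \right),$$
so for the standard case ${\cal N}(0, I_{d})$ it reduces to $(2\pi)^{-d/2}\exp\left( -\frac{\lVert x \rVert^2}{2} \right)$.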

> [!definition] Gaussian Random Vector
>
> If a random vector $x \sim {\cal N}(\mu, \Sigma)$, we call $x$ a Gaussian random vector. If $x \sim {\cal N}(0, I_{d})$, then we call $x$ a standard Gaussian random vector.

see [[gaussian random vector]]

> [!proposition]
>
> Let $x \sim {\cal N}(\mu, \Sigma)$ be a gaussian random vector. Then $\mu$ is the mean vector and $\Sigma$ is the covariance matrix of $x$, and
> $$\mathbb{E}[x] = \mu, \qquad \mathbb{E}[(x-\mu)(x-\mu)^\top] = \Sigma$$

ie, the law of a gaussian random vector is determined by its mean and covariance (or its linear and quadratic moments)

> [!theorem]
>
> Let $x \sim {\cal N}(\mu, \Sigma)$ be a gaussian random vector, $A \in \mathbb{R}^{k \times d}$, and $b \in \mathbb{R}^k$. Then $Ax + b$ is also a gaussian random vector with
> $$Ax + b \sim {\cal N}(A\mu + b,\, A\Sigma A^\top)$$
> ie, gaussian random vectors are closed under linear transformations.

see [[gaussian random vectors are closed under linear transformations]]
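One way to see where the parameters of $Ax+b$ come from (a standard computation, filled in here for completeness): by linearity of expectation,
$$\mathbb{E}[Ax+b] = A\mathbb{E}[x] + b = A\mu + b,$$
and
$$\operatorname{Cov}(Ax+b) = \mathbb{E}\left[ A(x-\mu)(x-\mu)^\top A^\top \right] = A\Sigma A^\top.$$
The content of the theorem is that $Ax+b$ is again Gaussian; the parameters are then forced by the previous proposition.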

The two facts above mean that a standard gaussian random vector is both an iid random vector and an orthogonally invariant random vector!

> [!theorem]
>
> Suppose $x \sim {\cal N}(0, I_{d})$ is a standard gaussian random vector. For any orthogonal $Q \in \mathbb{R}^{d \times d}$, we have
> $$Qx \sim {\cal N}(0, I_{d})$$
> In particular, this law does not depend on $Q$, and $x$ is orthogonally invariant.

> [!proof]
>
> For orthogonal $Q$, we have (from [[gaussian random vectors are closed under linear transformations]]) that
> $$Qx \sim {\cal N}(Q0,\, QI_{d}Q^\top) = {\cal N}(0, I_{d})$$
> Now, consider a special case where a unit vector $v \in \mathbb{S}^{d-1}$ is the first row of $Q$ (and we can find such an orthogonal $Q$ via Gram-Schmidt). Then we see that the law of $\langle v, x \rangle = (Qx)_{1} \sim {\cal N}(0,1)$ is independent of $v$, as desired.

see [[standard gaussian random vectors are orthogonally invariant]]
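A quick simulation as a sanity check (my own, not part of the notes; the dimension and the way $Q$ is generated are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 100_000

# Some fixed orthogonal matrix Q (here from the QR factorization of a
# Gaussian matrix; any orthogonal Q would do).
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

x = rng.standard_normal((n, d))   # rows are standard Gaussian vectors
qx = x @ Q.T                      # each row becomes Q @ x_i

# Both samples should have mean ~ 0 and covariance ~ I_d.
print(np.round(x.mean(axis=0), 3), np.round(qx.mean(axis=0), 3))
print(np.round(np.cov(x, rowvar=False), 2))
print(np.round(np.cov(qx, rowvar=False), 2))
```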

> [!theorem]
>
> If $x \sim {\cal N}(0, I_{d})$, then
> $$\frac{x}{\lVert x \rVert} \sim \text{Unif}(\mathbb{S}^{d-1})$$

> [!proof]
>
> The result follows immediately from [[standard gaussian random vectors are orthogonally invariant]] and the independence proposition for the [[orthogonally invariant distribution on the unit sphere]].

see [[normalized standard gaussian random vectors have the orthogonally invariant distribution on the unit sphere]]
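The practical takeaway, as a small sketch (the helper name `sample_unit_sphere` is my own, not from the notes):

```python
import numpy as np

def sample_unit_sphere(n, d, rng):
    """Draw n points from Unif(S^{d-1}) by normalizing standard Gaussians."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

rng = np.random.default_rng(3)
u = sample_unit_sphere(5, 3, rng)
print(np.linalg.norm(u, axis=1))   # all ones: the points lie on S^2
```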

## 1.3 Concentration of Gaussian Vector Norms

> [!important]
>
> The direction of a gaussian random vector is uniformly distributed.

- [?] So we’ve addressed the direction. What about the magnitudes?

If $x \sim {\cal N}(0, I_{d})$ is a standard gaussian random vector, then
$$\lVert x \rVert^2 = \sum_{i=1}^d x_{i}^2$$

Since $\lVert x \rVert^2$ is the sum of $d$ iid random variables, then by the Law of Large Numbers and the central limit theorem, we expect that $\lVert x \rVert^2 = d + {\cal O}(\sqrt{ d })$ with high probability.
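A small simulation of this heuristic (dimensions and sample size are arbitrary): the relative fluctuation of $\lVert x \rVert^2$ around $d$ shrinks like $1/\sqrt{ d }$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

for d in (10, 100, 1000):
    x = rng.standard_normal((n, d))
    sq = (x ** 2).sum(axis=1)          # ||x||^2 for each sample
    rel = np.mean(np.abs(sq - d)) / d  # average relative deviation from d
    print(f"d={d:>5}  mean | ||x||^2 - d | / d = {rel:.4f}")
```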

By using Markov’s inequality and Chebyshev’s inequality, we can get something close to this.

By Markov, we have

And by Chebyshev we get

$$\begin{align}
\mathbb{P}[\lvert \lvert x \rvert \rvert ^2 \geq d + t] &= \mathbb{P}[\lvert \lvert x \rvert \rvert ^2 \geq \mathbb{E}[\,\lvert \lvert x \rvert \rvert ^2] + t] \\
& \leq \frac{\text{Var}(\lvert \lvert x \rvert \rvert ^2)}{t^2} \\
&= \frac{2d}{t^2}
\end{align}$$
Where $\text{Var}[\lvert \lvert x \rvert \rvert^2] = d \text{Var}[x_{1}^2]=2d$. So this gives $\left\lvert \, \lvert \lvert x \rvert \rvert^2 - d \, \right\rvert = {\cal O}(\sqrt{ d })$ with high probability (taking $t$ on the order of $\sqrt{ d }$).

This isn't *quite* what we want since both results above depend a lot on the value of $t$.

> [!theorem]
>
> Let $x \sim {\cal N}(0, I_{d})$ be a standard [[gaussian random vector]] and $t \geq 0$. Then
> $$\begin{align}
> \mathbb{P}[\left\lvert \,\lvert \lvert x \rvert \rvert^2 -d \, \right\rvert \geq t] &= \mathbb{P}\left[ \left\lvert \sum_{i=1}^d x_{i}^2 - d \right\rvert \geq t \right] \\
> &\leq 2 \begin{cases}
> \exp\left( -\frac{t^2}{8d} \right)&\text{if } t \leq d \\
> \exp\left( -\frac{t}{8} \right) & \text{if }t \geq d
> \end{cases} \\
> &= 2\exp\left( -\frac{1}{8} \min \left\{ \frac{t^2}{d}, t \right\}\right)
> \end{align}$$

> [!NOTE]
>
> This proof uses the *Chernoff Method*, which is used to prove concentration inequalities like the [[Chebyshev's inequality|Chebyshev]], [Hoeffding](https://en.wikipedia.org/wiki/Hoeffding%27s_inequality), [Bernstein](https://en.wikipedia.org/wiki/Bernstein_inequalities_(probability_theory)), [Azuma-Hoeffding](https://en.wikipedia.org/wiki/Azuma%27s_inequality), and [McDiarmid](https://en.wikipedia.org/wiki/McDiarmid%27s_inequality) inequalities. It is a more general form of a "nonlinear [[Markov's inequality|Markov]]" inequality.

> [!proof]
>
> We deal with only one side of the distribution; the desired inequality is achieved by simply multiplying by 2 to account for the other tail.
>
> Note that $\mathbb{E}[x_{i}^2] = 1$, so define $s_{i} = x_{i}^2-1$. Then for any $\lambda > 0$,
> $$\begin{align}
> \mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t\right] &= \mathbb{P}\left[ \sum_{i=1}^d s_{i} \geq t \right] \\
> &= \mathbb{P}\left[ \exp\left( \lambda \sum_{i=1}^d s_{i} \right) \geq \exp(\lambda t) \right] \\
> (*)& \leq \frac{\mathbb{E}\left[ \exp\left( \lambda \sum s_{i} \right) \right]}{\exp(\lambda t)} \\
> (**)& = \frac{\mathbb{E}[\exp(\lambda s_{1})]^d}{\exp(\lambda t)} \\
> &= \exp(-\lambda t + d \log(\mathbb{E}[\exp(\lambda s_{1})]))
> \end{align}$$
> Where
> - $(*)$ is from [[Markov's Inequality]]
> - $(* *)$ is because the $x_{i}$ are independent
>
> Now, define
> $$\psi(\lambda) := \log(\mathbb{E}[\exp(\lambda s_{1})]) = \log(\mathbb{E}[\exp(\lambda(x_{1}^2 - 1))])$$
> as the *cumulant generating function* ([wikipedia](https://en.wikipedia.org/wiki/Cumulant)) of $x_{1}^2-1$ (which has expectation 0). Note that $\mathbb{E}[\exp(\lambda x_{1}^2)]$ is finite if and only if $\lambda < \frac{1}{2}$, and in this case
> $$\begin{align}
> \mathbb{E}[\exp(\lambda x_{1}^2)] &= \frac{1}{\sqrt{ 1-2\lambda }} \\
> \implies \psi(\lambda) &= -\lambda + \frac{1}{2}\log\left( \frac{1}{1-2\lambda} \right) \\
> ({\dagger})&= \lambda^2+{\cal O}(\lambda^3)
> \end{align}$$
> Where we get $({\dagger})$ via Taylor expansion for small $\lambda$.
>
> This yields the bound $\psi(\lambda) \leq 2\lambda^2$ for $\lambda \leq \frac{1}{4}$.
>
> >[!exercise]- Exercise (tedious) - Verify the bound
> >- Can plot $\psi(\lambda)$ and the bound $2\lambda^2$ (see notes page 6; figure 1.1)
> >- Verify via algebra that $\psi(\lambda) \leq 2\lambda^2$ for all $\lambda \leq \frac{1}{4}$
>
> Assuming $\lambda \leq \frac{1}{4}$, we have
> $$\begin{align}
> \mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t \right] &\leq \exp(-\lambda t + d \log(\mathbb{E}[\exp(\lambda s_{1})])) \\
> &=\exp(-\lambda t + d\psi(\lambda)) \\
> &\leq \exp(-\lambda t + 2d\lambda^2)
> \end{align}$$
> Now, $2d\lambda^2 -\lambda t$ is a convex quadratic function of $\lambda$, and is thus minimized at $\lambda^* = \frac{t}{4d}$. We want to set $\lambda=\lambda^*$, but we must ensure that $\lambda \leq \frac{1}{4}$ to maintain our desired bound. Thus, we have 2 cases:
> - If $t \leq d$, then $\lambda^* \leq \frac{1}{4}$ and we can safely set $\lambda = \lambda^*$:
> $$\mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t \right] \leq \exp(-\lambda t + 2d\lambda^2)=\exp\left( -\frac{t^2}{4d}+\frac{t^2}{8d} \right)=\exp\left( -\frac{t^2}{8d} \right)$$
> - If $t \geq d$, then $\frac{t}{4d} \not\leq \frac{1}{4}$, so we instead set $\lambda=\frac{1}{4}$:
> $$\mathbb{P}\left[ \sum_{i=1}^d x_{i}^2 - d \geq t \right] \leq \exp(-\lambda t + 2d\lambda^2)=\exp\left( -\frac{t}{4}+\frac{d}{8} \right) \leq \exp\left( -\frac{t}{8} \right)$$
>
> Which is the desired result.
> $$\tag*{$\blacksquare$}$$

see [[concentration inequality for magnitude of standard gaussian random vector]]

> [!NOTE]
>
> The important part of the proof is finding some $\psi(\lambda) = {\cal O}(\lambda^2)$ for a bounded $\lambda$. This means that the random variable is *subexponential* (a weaker condition than being *subgaussian*).
>
> This result is characteristic of sums of $d$ iid subexponential random variables. Sums of this type have "Gaussian tails" scaled by $\exp\left( -\frac{ct^2}{d} \right)$ up to a cutoff of $t \sim d$, after which they have "exponential tails" scaled by $\exp(-ct)$.
> - [Bernstein's inequality](https://en.wikipedia.org/wiki/Bernstein_inequality) is a general tool for expressing this type of behavior (see notes pg. 7 for more + reference)

# Review #flashcards/math/rmt

The $\text{Law}$ is the {==*distribution*==} of the random vector. <!--SR:!2026-08-28,221,270-->

What does $\mathbb{S}^{d-1}(r)$ denote?
-?-
The surface of the sphere of radius $r$ in $d$ dimensions.
<!--SR:!2026-04-11,127,250-->

> [!important]
>
> The direction of a {==2||[[gaussian random vector]]==} is {==1||uniformly distributed on $\mathbb{S}^{d-1}$||law==}
<!--SR:!2026-02-19,22,210!2026-05-11,146,250-->

With high probability, the {==1||magnitude==} of a gaussian random vector is {==1||$\sqrt{ d }$==} <!--SR:!2026-02-17,74,210-->

# TODO

- [x] Add flashcards ⏳ 2025-09-11 ✅ 2025-09-11

```datacorejsx
const { dateTime } = await cJS()

return function View() {
    const file = dc.useCurrentFile();
    return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```