Random Matrix Lecture 01

[[lecture-data]]

:LiArrowBigLeftDash: No Previous Lecture | Random Matrix Lecture 02 :LiArrowBigRightDash:

Quote

Class notes pgs 2-7 (1.1 - 1.3)

Summary

  • Random vector theory
  • Properties of Gaussian random vectors
  • Concentration inequalities

1. Random Vector Theory

1.1 Natural Random Models and Orthogonal Invariance

Interpretations of vectors

  1. List of numbers
  2. Magnitude and direction (with respect to a basis of the vector space they belong to)

Corresponding interpretations for random vectors (a sampling sketch follows this list):

  1. Entries (each of the numbers) are as independent as possible

    • $x \sim \mu^{\otimes d}$, i.e. the entries are iid draws from $\mu$
  2. Magnitude and direction are as independent as possible

    • Take $\lVert x \rVert$ and $\frac{x}{\lVert x \rVert}$ to be independent
    • The magnitude is then any random non-negative scalar and the direction is
      • $\frac{x}{\lVert x \rVert} \sim \mathrm{Uniform}(S^{d-1})$, i.e. a random vector drawn uniformly from the unit sphere
$$S^{d-1} := \{y \in \mathbb{R}^d : \lVert y \rVert = 1\}$$
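
As a quick illustration, here is a minimal numpy sketch (not from the class notes) contrasting the two models; the specific laws chosen for $\mu$ and for the magnitude are arbitrary assumptions made purely for demonstration, and the sphere-sampling trick is justified in section 1.2.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Model 1: iid entries. Taking mu = Uniform(0, 1) is an arbitrary illustrative choice.
x_iid = rng.uniform(0.0, 1.0, size=d)

# Model 2: independent magnitude and direction. Magnitude ~ Exponential(1) is again an
# arbitrary non-negative choice; the direction is uniform on S^{d-1}, sampled by
# normalizing a standard Gaussian vector (see section 1.2).
magnitude = rng.exponential(1.0)
g = rng.standard_normal(d)
x_inv = magnitude * g / np.linalg.norm(g)

print("iid entries model:        ", np.round(x_iid, 3))
print("magnitude/direction model:", np.round(x_inv, 3), " ||x|| =", round(float(magnitude), 3))
```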

See random vector

Orthogonal Invariance

Let $x \in \mathbb{R}^d$ be a random vector. We say that $x$ (or its law $\mathrm{Law}(x)$) is orthogonally invariant if for each (deterministic) orthogonal matrix $Q \in O(d)$ we have

$$\mathrm{Law}(Qx) = \mathrm{Law}(x)$$

see also invariant
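
A rough Monte Carlo check of the definition (a sketch under the assumption, proved in section 1.2, that the standard Gaussian law is orthogonally invariant): compare the empirical moments of $x$ and $Qx$ for a fixed orthogonal $Q$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 200_000

# A fixed (deterministic) orthogonal matrix Q, taken from the QR factorization of a random matrix
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Samples of x ~ N(0, I_d), whose law turns out to be orthogonally invariant
X = rng.standard_normal((n, d))
QX = X @ Q.T

# If Law(Qx) = Law(x), empirical means and covariances should agree up to Monte Carlo error
print(np.round(X.mean(axis=0), 3), np.round(QX.mean(axis=0), 3))
print(np.round(np.cov(X, rowvar=False), 2))
print(np.round(np.cov(QX, rowvar=False), 2))
```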

Note

The law is the distribution of the random vector.
We sometimes use "model" instead of "law" or "distribution" to remind us that our choice includes assumptions and judgements.

  • We are modelling a situation we might encounter in applications
Proposition

There exists a unique probability measure supported on $S^{d-1}$ that is orthogonally invariant. We denote this $\mathrm{Uniform}(S^{d-1})$

see orthogonally invariant distribution on the unit sphere

Proposition

Suppose that $x$ is orthogonally invariant. Then

  • $\lVert x \rVert$ is independent of $\frac{x}{\lVert x \rVert}$
  • $\mathrm{Law}\left(\frac{x}{\lVert x \rVert}\right) = \mathrm{Uniform}(S^{d-1})$. (A quick empirical sketch follows this list.)
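
A quick empirical sketch of this proposition (using a standard Gaussian vector as the orthogonally invariant $x$, which section 1.2 justifies; conditioning on the median magnitude is just one crude test of independence):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 200_000

X = rng.standard_normal((n, d))
r = np.linalg.norm(X, axis=1)      # magnitudes ||x||
U = X / r[:, None]                 # directions x / ||x|| on S^{d-1}

# Under independence, direction statistics should not change when we condition on the magnitude
small = r < np.median(r)
print(np.round(U[small].mean(axis=0), 3))            # conditional mean of the direction, ~ 0
print(np.round(U[~small].mean(axis=0), 3))           # same for the larger magnitudes, ~ 0
print(np.round(np.corrcoef(r, U[:, 0])[0, 1], 3))    # correlation of ||x|| with one direction entry, ~ 0
```
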
Question

  • What do the entries look like for a vector $x \sim \mathrm{Uniform}(S^{d-1})$?
  • What are the magnitude and direction of an iid random vector?

1.2 Gaussian Random Vectors

The most natural of the iid random vectors is the multivariate Gaussian

$$x \sim \mathcal{N}(0,1)^{\otimes d} = \mathcal{N}(0, I_d)$$
Multivariate Normal

The multivariate normal or Gaussian $\mathcal{N}(\mu, \Sigma)$ for parameters $\mu \in \mathbb{R}^d$ and (positive semidefinite) $\Sigma \in \mathbb{R}^{d \times d}$ is the probability measure with density

$$\frac{1}{\sqrt{\det(2\pi\Sigma)}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{+} (x-\mu)\right)$$

with respect to the Lebesgue measure on the row space of $\Sigma$, where $\Sigma^{+}$ denotes the pseudoinverse of $\Sigma$ and $\det$ is taken over the row space.

Note

When $\Sigma$ is invertible, the row space is all of $\mathbb{R}^d$, $\Sigma^{+} = \Sigma^{-1}$, and this is the usual Gaussian density with respect to the Lebesgue measure on $\mathbb{R}^d$.
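
A minimal sketch (my own illustration, not from the notes) that evaluates this density with numpy in the invertible case and checks that, for a diagonal $\Sigma$, it factors into a product of univariate normal densities; the particular $\mu$, $\Sigma$, and evaluation point are made-up examples.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, assuming Sigma is invertible."""
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff
    return np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(2 * np.pi * Sigma))

mu = np.array([1.0, -2.0])
Sigma = np.diag([4.0, 0.25])
x = np.array([0.5, -1.0])

# For diagonal Sigma the joint density should equal the product of the univariate densities
univariate = lambda t, m, v: np.exp(-0.5 * (t - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
print(gaussian_density(x, mu, Sigma))
print(univariate(x[0], mu[0], 4.0) * univariate(x[1], mu[1], 0.25))
```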

Gaussian Random Vector

If a random vector $x \sim \mathcal{N}(\mu, \Sigma)$, we call $x$ a Gaussian random vector. If $x \sim \mathcal{N}(0, I_d)$, then we call $x$ a standard Gaussian random vector.

see gaussian random vector

Proposition

Let $x \sim \mathcal{N}(\mu, \Sigma)$ be a gaussian random vector. Then $\mu$ is the mean vector and $\Sigma$ is the covariance matrix of $x$, and

$$\mu = \mathbb{E}[x], \qquad \Sigma = \mathbb{E}[(x-\mu)(x-\mu)^T] = \mathbb{E}[xx^T] - \mu\mu^T$$

i.e., the law of a gaussian random vector is determined by its mean and covariance (or its linear and quadratic moments)
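
A hedged numerical check of this proposition (the $\mu$ and $\Sigma$ below are arbitrary example parameters; sampling uses numpy's built-in multivariate normal generator):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

X = rng.multivariate_normal(mu, Sigma, size=500_000)

print(np.round(X.mean(axis=0), 2))                       # ~ mu
print(np.round(np.cov(X, rowvar=False), 2))              # ~ Sigma
print(np.round(X.T @ X / len(X) - np.outer(mu, mu), 2))  # E[x x^T] - mu mu^T ~ Sigma
```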

Theorem

Let $x \sim \mathcal{N}(\mu, \Sigma)$ be a gaussian random vector in $\mathbb{R}^d$, $A \in \mathbb{R}^{d \times d}$, and $b \in \mathbb{R}^d$. Then $Ax + b$ is also a gaussian random vector with

$$\mathrm{Law}(Ax + b) = \mathcal{N}(A\mu + b, A\Sigma A^T)$$

i.e., gaussian random vectors are closed under linear transformations.

see gaussian random vectors are closed under linear transformations
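
A small Monte Carlo sketch of the closure theorem (the matrices $A$, $b$, $\mu$, $\Sigma$ are arbitrary illustrative choices): the empirical mean and covariance of $Ax + b$ should match $A\mu + b$ and $A\Sigma A^T$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.5],
                  [0.0, 0.5, 1.0]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 1.0],
              [3.0, 0.0, 1.0]])
b = np.array([1.0, 0.0, -2.0])

X = rng.multivariate_normal(mu, Sigma, size=n)
Y = X @ A.T + b   # each row is A x + b for one sample x

print(np.round(Y.mean(axis=0), 2), np.round(A @ mu + b, 2))   # empirical vs predicted mean
print(np.round(np.cov(Y, rowvar=False), 2))                   # empirical covariance
print(np.round(A @ Sigma @ A.T, 2))                           # predicted covariance
```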

The two facts above mean that a standard gaussian random vector is both an iid random vector and an orthogonally invariant random vector!

Theorem

Suppose $x \sim \mathcal{N}(0, I_d)$ is a standard gaussian random vector. For any $a \in S^{d-1}$, we have

$$\mathrm{Law}(\langle a, x \rangle) = \mathcal{N}(0, 1)$$

In particular, this does not depend on $a$, and $x$ is orthogonally invariant

Proof

For $Q \in O(d)$, we have (from gaussian random vectors are closed under linear transformations) that

$$\mathrm{Law}(Qx) = \mathcal{N}(0, Q I_d Q^T) = \mathcal{N}(0, I_d) = \mathrm{Law}(x)$$

so $x$ is orthogonally invariant. Now, given $a \in S^{d-1}$, choose $Q$ to be an orthogonal matrix whose first row is $a$ (extend $a$ to an orthonormal basis via Gram-Schmidt). Then $\langle a, x \rangle$ is the first entry of $Qx$, which has the same law as the first entry of $x$, namely $\mathcal{N}(0, 1)$. In particular, the law does not depend on $a$, as desired.

see standard gaussian random vectors are orthogonally invariant
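
A quick sanity check of this theorem (a sketch with arbitrary unit vectors $a$): the projection $\langle a, x \rangle$ should have mean $\approx 0$ and variance $\approx 1$ no matter which $a \in S^{d-1}$ we choose.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 6, 500_000
X = rng.standard_normal((n, d))

a1 = np.zeros(d); a1[0] = 1.0                           # a coordinate direction
a2 = rng.standard_normal(d); a2 /= np.linalg.norm(a2)   # an arbitrary unit vector
for a in (a1, a2):
    proj = X @ a
    print(np.round(proj.mean(), 3), np.round(proj.var(), 3))   # ~ 0 and ~ 1 for both choices of a
```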

Theorem

If $x \sim \mathcal{N}(0, I_d)$, then $\mathrm{Law}\left(\frac{x}{\lVert x \rVert}\right) = \mathrm{Unif}(S^{d-1})$

Proof

The result follows immediately from standard gaussian random vectors are orthogonally invariant and the independence proposition for orthogonally invariant distribution on the unit sphere.

see normalized standard gaussian random vectors have the orthogonally invariant distribution on the unit sphere
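
A small illustrative check in $d = 2$ (my own sketch): if $\frac{x}{\lVert x \rVert}$ is uniform on $S^1$, its angle should be uniform on $[-\pi, \pi)$, so equally sized angular bins should receive roughly equal fractions of the samples.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 600_000

X = rng.standard_normal((n, 2))
U = X / np.linalg.norm(X, axis=1)[:, None]   # normalized Gaussian samples on S^1
angles = np.arctan2(U[:, 1], U[:, 0])

counts, _ = np.histogram(angles, bins=8, range=(-np.pi, np.pi))
print(np.round(counts / n, 4))   # each of the 8 bins should hold roughly 1/8 of the mass
```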

1.3 Concentration of Gaussian Vector Norms

Important

The direction of a gaussian random vector is uniformly distributed.

If $x \sim \mathcal{N}(0, I_d)$ is a standard gaussian random vector, then

$$\mathbb{E}[\lVert x \rVert^2] = d \cdot \mathbb{E}[x_1^2] = d$$

Since $\lVert x \rVert^2 = \sum_{i=1}^d x_i^2$ is the sum of $d$ iid random variables, by the Law of Large Numbers and the central limit theorem we expect that, with high probability,

$$\lVert x \rVert^2 = d + \mathcal{O}(\sqrt{d})$$
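
A brief numerical illustration of this heuristic (a sketch; the dimensions and sample size are arbitrary): the sample mean of $\lVert x \rVert^2$ sits near $d$ and its standard deviation grows like $\sqrt{2d}$.

```python
import numpy as np

rng = np.random.default_rng(7)
for d in (10, 100, 1000):
    X = rng.standard_normal((20_000, d))
    sq_norms = (X ** 2).sum(axis=1)
    # mean ~ d, standard deviation ~ sqrt(2 d): fluctuations are O(sqrt(d)) around d
    print(d, np.round(sq_norms.mean(), 1), np.round(sq_norms.std(), 1), np.round(np.sqrt(2 * d), 1))
```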

By using Markov's inequality and Chebyshev's inequality, we can get something close to this.

By Markov, we have

$$\mathbb{P}[\lVert x \rVert^2 \geq d + t] \leq \frac{\mathbb{E}[\lVert x \rVert^2]}{d + t} = \frac{d}{d+t} = \frac{1}{1 + \frac{t}{d}}$$

which is small only once $t \gg d$, so Markov gives $\lVert x \rVert^2 = \mathcal{O}(d)$ with high probability.

And by Chebyshev we get

$$\begin{align} \mathbb{P}[\lVert x \rVert^2 \geq d + t] &= \mathbb{P}\big[\lVert x \rVert^2 \geq \mathbb{E}[\lVert x \rVert^2] + t\big] \\ &\leq \frac{\mathrm{Var}(\lVert x \rVert^2)}{t^2} \\ &= \frac{2d}{t^2} \end{align}$$

where $\mathrm{Var}[\lVert x \rVert^2] = d \, \mathrm{Var}[x_1^2] = 2d$. This bound is small once $t \gg \sqrt{d}$, so Chebyshev gives $\lVert x \rVert^2 = d + \mathcal{O}(\sqrt{d})$ with high probability.
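
For concreteness (a sketch with an arbitrary choice of $d$), here is how the two bounds behave as $t$ grows: the Markov bound only becomes small once $t$ is comparable to $d$, while the Chebyshev bound is already small for $t \gg \sqrt{d}$.

```python
import numpy as np

d = 1000
for t in (50, 200, 1000, 5000):
    markov = d / (d + t)                      # P[||x||^2 >= d + t] <= d / (d + t)
    chebyshev = min(2 * d / t ** 2, 1.0)      # P[||x||^2 >= d + t] <= 2d / t^2 (capped at 1)
    print(t, np.round(markov, 4), np.round(chebyshev, 4))
```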

This isn't quite what we want, since both bounds above decay only polynomially in $t$ rather than exponentially.

Theorem

Let $x \sim \mathcal{N}(0, I_d)$ be a standard gaussian random vector and $t \geq 0$. Then

$$\mathbb{P}\left[\left|\lVert x \rVert^2 - d\right| \geq t\right] = \mathbb{P}\left[\left|\sum_{i=1}^d x_i^2 - d\right| \geq t\right] \leq 2\begin{cases} \exp\left(-\frac{t^2}{8d}\right) & \text{if } t \leq d \\ \exp\left(-\frac{t}{8}\right) & \text{if } t \geq d \end{cases} = 2\exp\left(-\frac{1}{8}\min\left\{\frac{t^2}{d},\, t\right\}\right)$$
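
A Monte Carlo sketch of this bound (the choices of $d$, $t$, and sample size are arbitrary; it uses the fact that $\lVert x \rVert^2$ is chi-squared with $d$ degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(8)
d, n = 50, 400_000
sq_norms = rng.chisquare(d, size=n)   # samples of ||x||^2 for x ~ N(0, I_d)

for t in (10, 25, 50, 100):
    empirical = np.mean(np.abs(sq_norms - d) >= t)
    bound = 2 * np.exp(-min(t ** 2 / d, t) / 8)
    print(t, np.round(empirical, 4), np.round(bound, 6))   # the bound should dominate the empirical tail
```
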
Note

This proof uses the Chernoff Method, which is used to prove concentration inequalities like the Chebyshev, Hoeffding, Bernstein, Azuma-Hoeffding, and McDiarmid inequalities. It is a more general form of a "nonlinear Markov" inequality.

Proof

We deal with only one side of the distribution; the desired inequality is achieved by simply multiplying by 2 to account for the other tail.

Note that $\mathbb{E}[x_i^2] = 1$, so define $s_i = x_i^2 - 1$. Then, for any $\lambda > 0$,

$$\begin{align} \mathbb{P}\left[\sum_{i=1}^d x_i^2 - d \geq t\right] &= \mathbb{P}\left[\sum_{i=1}^d s_i \geq t\right] \\ &= \mathbb{P}\left[\exp\left(\lambda \sum_{i=1}^d s_i\right) \geq \exp(\lambda t)\right] \\ &\overset{(*)}{\leq} \frac{\mathbb{E}\left[\exp\left(\lambda \sum_{i=1}^d s_i\right)\right]}{\exp(\lambda t)} \\ &\overset{(**)}{=} \frac{\mathbb{E}[\exp(\lambda s_1)]^d}{\exp(\lambda t)} \\ &= \exp\left(-\lambda t + d \log\left(\mathbb{E}[\exp(\lambda s_1)]\right)\right) \end{align}$$

where $(*)$ is Markov's inequality applied to the non-negative random variable $\exp\left(\lambda \sum_i s_i\right)$, and $(**)$ uses that the $s_i$ are iid.

Now, define

$$\psi(\lambda) := \log\left(\mathbb{E}[\exp(\lambda s_1)]\right) = \log\left(\mathbb{E}[\exp(\lambda (x_1^2 - 1))]\right)$$

as the cumulant generating function (wikipedia) of $x_1^2 - 1$ (which has expectation 0). Note that $\mathbb{E}[\exp(\lambda x_1^2)]$ is finite if and only if $\lambda < \frac{1}{2}$, and in this case

$$\mathbb{E}[\exp(\lambda x_1^2)] = \frac{1}{\sqrt{1 - 2\lambda}} \implies \psi(\lambda) = -\lambda + \frac{1}{2}\log\left(\frac{1}{1 - 2\lambda}\right) = \lambda^2 + \mathcal{O}(\lambda^3)$$

where the last equality is a Taylor expansion around $\lambda = 0$. This yields the bound $\psi(\lambda) \leq 2\lambda^2$ for $\lambda \leq \frac{1}{4}$.

Assuming $\lambda \leq \frac{1}{4}$, we have

$$\mathbb{P}\left[\sum_{i=1}^d x_i^2 - d \geq t\right] \leq \exp\left(-\lambda t + d \log\left(\mathbb{E}[\exp(\lambda s_1)]\right)\right) = \exp(-\lambda t + d\psi(\lambda)) \leq \exp(-\lambda t + 2d\lambda^2)$$

Now, $2d\lambda^2 - \lambda t$ is a convex quadratic function of $\lambda$, and is thus minimized at $\lambda^* = \frac{t}{4d}$. We want to set $\lambda = \lambda^*$, but we must ensure that $\lambda \leq \frac{1}{4}$ to maintain our desired bound. Thus, we have 2 cases:

  • If $t \leq d$, then $\lambda^* \leq \frac{1}{4}$ and we can safely set $\lambda = \lambda^*$:
$$\mathbb{P}\left[\sum_{i=1}^d x_i^2 - d \geq t\right] \leq \exp(-\lambda^* t + 2d(\lambda^*)^2) = \exp\left(-\frac{t^2}{8d}\right)$$
  • If $t \geq d$, then $\frac{t}{4d} \geq \frac{1}{4}$, so we instead set $\lambda = \frac{1}{4}$:
$$\mathbb{P}\left[\sum_{i=1}^d x_i^2 - d \geq t\right] \leq \exp(-\lambda t + 2d\lambda^2) = \exp\left(-\frac{t}{4} + \frac{d}{8}\right) \leq \exp\left(-\frac{t}{8}\right)$$

Which is the desired result

see concentration inequality for magnitude of standard gaussian random vector
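
The two analytic ingredients of the proof, the moment generating function identity $\mathbb{E}[\exp(\lambda x_1^2)] = \frac{1}{\sqrt{1 - 2\lambda}}$ and the bound $\psi(\lambda) \leq 2\lambda^2$ for $\lambda \leq \frac{1}{4}$, can be checked numerically with a short sketch (sample size and $\lambda$ values are arbitrary; the empirical MGF gets noisy as $\lambda$ approaches $\frac{1}{2}$):

```python
import numpy as np

rng = np.random.default_rng(9)
x2 = rng.standard_normal(2_000_000) ** 2   # samples of x_1^2 for x_1 ~ N(0, 1)

for lam in (0.05, 0.1, 0.2):
    mgf_empirical = np.mean(np.exp(lam * x2))
    mgf_formula = 1.0 / np.sqrt(1.0 - 2.0 * lam)
    psi = -lam + 0.5 * np.log(1.0 / (1.0 - 2.0 * lam))   # cumulant generating function of x_1^2 - 1
    print(lam, np.round(mgf_empirical, 4), np.round(mgf_formula, 4),
          np.round(psi, 4), "<=", np.round(2 * lam ** 2, 4))
```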

Note

The important part of the proof is finding some $\psi(\lambda) = \mathcal{O}(\lambda^2)$ for bounded $\lambda$. This means that the random variable is subexponential (a weaker condition than being subgaussian).

This result is characteristic of sums of $d$ iid subexponential random variables. Sums of this type have "Gaussian tails" scaling as $\exp\left(-\frac{ct^2}{d}\right)$ up to a cutoff of $t \approx d$, after which they have "exponential tails" scaling as $\exp(-ct)$

  • Bernstein's inequality is a general tool for expressing this type of behavior (see notes pg. 7 for more + reference)

Review

#flashcards/math/rmt

The Law is the {distribution} of the random vector.

What does $S^{d-1}(r)$ denote?
-?-
The surface of the sphere of radius $r$ in $d$ dimensions.

Important

The {asa||direction||thing} of a {sas||gaussian random vector} is {asa||uniformly distributed on $S^{d-1}$||characteristic of thing}

With high probability, the {1||magnitude} of a gaussian random vector is {1||approximately $\sqrt{d}$}

TODO

Created 2025-09-04 ֍ Last Modified 2025-09-11