Random Matrix Lecture 01

[[lecture-data]]

:LiArrowBigLeftDash: No Previous Lecture | Random Matrix Lecture 02 :LiArrowBigRightDash:

Quote

Class notes pgs 2-7 (1.1 - 1.3)

Summary

  • Random vector theory
  • Properties of Gaussian random vectors
  • Concentration inequalities

1. Random Vector Theory

1.1 Natural Random Models and Orthogonal Invariance

Interpretations of vectors

  1. List of numbers
  2. Magnitude and direction (with respect to a [[basis]] of the [[vector space]] they belong to)

Corresponding interpretations for random vectors:

  1. Entries (each of the numbers) are as independent as possible

    • $x \sim \mu^{\otimes d}$, ie the entries are iid draws from some distribution $\mu$
  2. Magnitude and direction are as independent as possible

    • Take $\lVert x \rVert$ and $\frac{x}{\lVert x \rVert}$ to be independent
    • The magnitude is then any random non-negative scalar and the direction is
      • $x \sim \mathrm{Uniform}(S^{d-1})$, ie a random vector drawn uniformly from the unit sphere
$$S^{d-1} := \{ y \in \mathbb{R}^{d} : \lVert y \rVert = 1 \}$$

See [[random vector]]
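A minimal numpy sketch contrasting the two models (not from the notes; the dimension, the entry distribution $\mu$, and the magnitude law are arbitrary illustrative choices, and the direction is sampled via the normalized-gaussian construction proved later in 1.2):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Model 1: entries as independent as possible, iid from mu = Uniform([-1, 1])
x_iid = rng.uniform(-1.0, 1.0, size=d)

# Model 2: magnitude and direction as independent as possible
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)   # direction ~ Uniform(S^{d-1})
magnitude = rng.exponential(1.0)         # any non-negative scalar law works here
x_rot = magnitude * direction

print(x_iid)
print(x_rot, np.linalg.norm(direction))  # the direction has unit norm
```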

Orthogonal Invariance

Let $x \in \mathbb{R}^{d}$ be a random vector. We say that $x$ (or its law $\mathrm{Law}(x)$) is orthogonally invariant if for each (deterministic) orthogonal matrix $Q \in O(d)$ we have

$$\mathrm{Law}(Qx) = \mathrm{Law}(x)$$

see also [[invariant]]

Note

The law is the distribution of the random vector.
We sometimes use "model" instead of "law" or "distribution" to remind us that our choice includes assumptions and judgements.

  • We are modelling a situation we might encounter in applications
Proposition

There exists a unique probability measure supported on $S^{d-1}$ that is orthogonally [[invariant]]. We denote this $\mathrm{Uniform}(S^{d-1})$.

see [[orthogonally invariant distribution on the unit sphere]]

Proposition

Suppose that $x$ is orthogonally [[invariant]]. Then

  • $\lVert x \rVert$ is independent of $\frac{x}{\lVert x \rVert}$
  • $\mathrm{Law}\left(\frac{x}{\lVert x \rVert}\right) = \mathrm{Uniform}(S^{d-1})$.
Question

  • What do the entries look like for a vector $x \sim \mathrm{Uniform}(S^{d-1})$?
  • What are the magnitude and direction of an iid [[random vector]]?
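The sketch below (not from the notes; parameters are arbitrary) explores both questions empirically, again using the normalized-gaussian construction of $\mathrm{Uniform}(S^{d-1})$ from 1.2: for large $d$ each entry of a uniform sphere vector behaves roughly like $\mathcal{N}(0, \frac{1}{d})$, while the magnitude of an iid $\mathcal{N}(0,1)$ vector concentrates near $\sqrt{d}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1000, 10_000

# Entries of x ~ Uniform(S^{d-1}): normalize standard gaussian vectors
g = rng.standard_normal((n, d))
sphere = g / np.linalg.norm(g, axis=1, keepdims=True)
print(sphere[:, 0].mean(), d * sphere[:, 0].var())  # ~ 0 and ~ 1: entries look like N(0, 1/d)

# Magnitude of an iid N(0, 1) vector: concentrates near sqrt(d)
norms = np.linalg.norm(g, axis=1)
print(norms.mean(), np.sqrt(d), norms.std())        # mean ~ sqrt(d), small spread
```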

1.2 Gaussian Random Vectors

The most natural of the iid random vectors is the multivariate Gaussian

$$x \sim \mathcal{N}(0,1)^{\otimes d} = \mathcal{N}(0, I_{d})$$
Multivariate Normal

The multivariate normal or Gaussian $\mathcal{N}(\mu, \Sigma)$ for parameters $\mu \in \mathbb{R}^{d}$ and (positive semidefinite) $\Sigma \in \mathbb{R}^{d \times d}$ is the [[probability measure]] with density

$$\frac{1}{\sqrt{\det(2\pi\Sigma)}} \exp\left( -\frac{1}{2}(x-\mu)^{T} \Sigma^{\dagger} (x-\mu) \right)$$

with respect to the [[Lebesgue measure]] on the row space of $\Sigma$, and where

  • $\Sigma^{\dagger}$ is the [[Moore-Penrose inverse]]
  • $\det$ is the product of all non-zero eigenvalues
Note

When $\Sigma$ is invertible,

  • $\det$ is the ordinary [[determinant]]
  • $\Sigma^{\dagger} = \Sigma^{-1}$
  • the [[Lebesgue measure]] is on all of $\mathbb{R}^{d}$
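A small sketch of this density (an illustration under stated assumptions, not from the notes): the hypothetical `gaussian_density` below uses `np.linalg.pinv` for the Moore-Penrose inverse and the product of non-zero eigenvalues for $\det$; the eigenvalue tolerance `1e-12` is an arbitrary choice, and the point `x` is assumed to lie in the support $\mu + \mathrm{rowspace}(\Sigma)$.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Density of N(mu, Sigma) at x for a (possibly singular) PSD Sigma."""
    eigvals = np.linalg.eigvalsh(Sigma)
    nonzero = eigvals[eigvals > 1e-12]       # non-zero eigenvalues of Sigma
    pdet = np.prod(2.0 * np.pi * nonzero)    # "det(2*pi*Sigma)" over non-zero eigenvalues
    Sigma_dagger = np.linalg.pinv(Sigma)     # Moore-Penrose inverse
    diff = x - mu
    return np.exp(-0.5 * diff @ Sigma_dagger @ diff) / np.sqrt(pdet)

# Example: a rank-1 (singular) covariance in R^2; the support is the line y = x
mu = np.zeros(2)
Sigma = np.array([[1.0, 1.0], [1.0, 1.0]])
print(gaussian_density(np.array([0.5, 0.5]), mu, Sigma))
```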
Gaussian Random Vector

If a [[random vector]] $x \sim \mathcal{N}(\mu, \Sigma)$, we call $x$ a Gaussian random vector. If $x \sim \mathcal{N}(0, I_{d})$, then we call $x$ a standard Gaussian random vector.

see [[gaussian random vector]]

Proposition

Let $x \sim \mathcal{N}(\mu, \Sigma)$ be a [[gaussian random vector]]. Then $\mu$ is the mean vector and $\Sigma$ is the covariance matrix of $x$:

$$\mu = \mathbb{E}[x], \qquad \Sigma = \mathbb{E}[(x-\mu)(x-\mu)^{T}] = \mathbb{E}[xx^{T}] - \mu\mu^{T}$$

ie, the law of a gaussian random vector is determined by its mean and covariance (or its linear and quadratic moments)

Theorem

Let $x \sim \mathcal{N}(\mu, \Sigma)$ be a [[gaussian random vector]] in $\mathbb{R}^{d}$, $A \in \mathbb{R}^{d \times d}$, and $b \in \mathbb{R}^{d}$. Then $Ax + b$ is also a gaussian random vector with

$$\mathrm{Law}(Ax + b) = \mathcal{N}(A\mu + b, A\Sigma A^{T})$$

ie, gaussian random vectors are closed under linear transformations.

see [[gaussian random vectors are closed under linear transformations]]
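A quick numerical sanity check of this closure property (a sketch; the particular $A$, $b$, and sample size are arbitrary choices): push standard gaussian samples through the affine map and compare the empirical mean and covariance with $b$ and $AA^{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000

A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, -0.3, 2.0]])
b = np.array([1.0, -2.0, 0.5])

x = rng.standard_normal((n, d))   # rows ~ N(0, I_d)
y = x @ A.T + b                   # rows ~ N(b, A A^T) by the theorem

print(y.mean(axis=0))             # ~ b
print(np.cov(y, rowvar=False))    # ~ A A^T
```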

The two facts above mean that a standard gaussian random vector is orthogonally invariant!

Theorem

Suppose $x \sim \mathcal{N}(0, I_{d})$ is a standard [[gaussian random vector]]. For any $a \in S^{d-1}$, we have

$$\mathrm{Law}(\langle a, x \rangle) = \mathcal{N}(0, 1)$$

In particular, this does not depend on $a$, and $x$ is orthogonally invariant.

Proof

For $Q \in O(d)$, we have (from [[gaussian random vectors are closed under linear transformations]]) that

$$\mathrm{Law}(Qx) = \mathcal{N}(0, Q I_{d} Q^{T}) = \mathcal{N}(0, I_{d}) = \mathrm{Law}(x)$$

Now, consider the special case where $a$ is the first row of $Q$ (we can always extend $a$ to such an orthogonal $Q$ via [[Gram-Schmidt]]). Then $\langle a, x \rangle$ is the first entry of $Qx$, which has the same law as the first entry of $x$, namely $\mathcal{N}(0, 1)$. In particular, the law is independent of $a$, as desired.

see [[standard gaussian random vectors are orthogonally invariant]]
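An empirical check of this theorem (a sketch; the choice of $a$, the dimension, and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 200_000

a = rng.standard_normal(d)
a /= np.linalg.norm(a)            # a fixed unit vector in S^{d-1}

x = rng.standard_normal((n, d))   # rows are standard gaussian vectors
proj = x @ a                      # <a, x> for each sample
print(proj.mean(), proj.var())    # ~ 0 and ~ 1, regardless of the choice of a
```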

Theorem

If $x \sim \mathcal{N}(0, I_{d})$, then $\mathrm{Law}\left(\frac{x}{\lVert x \rVert}\right) = \mathrm{Unif}(S^{d-1})$

Proof

The result follows immediately from [[standard gaussian random vectors are orthogonally invariant]] and the independence proposition for [[orthogonally invariant distribution on the unit sphere]].

see [[normalized standard gaussian random vectors have the orthogonally invariant distribution on the unit sphere]]
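A sketch (not from the notes) that samples $\mathrm{Uniform}(S^{d-1})$ this way and spot-checks orthogonal invariance by rotating the samples with an orthogonal $Q$ obtained from a QR factorization (a numerical stand-in for Gram-Schmidt):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 100_000

g = rng.standard_normal((n, d))
u = g / np.linalg.norm(g, axis=1, keepdims=True)   # rows ~ Uniform(S^{d-1})

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # a random orthogonal matrix
print(np.mean(u[:, 0] ** 2), np.mean((u @ Q.T) ** 2, axis=0))  # all ~ 1/d = 0.25
```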

1.3 Concentration of Gaussian Vector Norms

Important

The direction of a [[gaussian random vector]] is uniformly distributed.

If $x \sim \mathcal{N}(0, I_{d})$ is a standard [[gaussian random vector]], then

$$\mathbb{E}[\lVert x \rVert^{2}] = d \cdot \mathbb{E}[x_{1}^{2}] = d$$

Since $\lVert x \rVert^{2} = \sum_{i=1}^{d} x_{i}^{2}$ is the sum of $d$ iid random variables, by the [[Law of Large Numbers]] and the central limit theorem we expect that, with high probability,

$$\lVert x \rVert^{2} = d + \mathcal{O}(\sqrt{d})$$

By using [[Markov's inequality]] and [[Chebyshev's inequality]], we can get something close to this.

By Markov, we have

$$\mathbb{P}\left[ \lVert x \rVert^{2} \geq d + t \right] \leq \frac{\mathbb{E}[\lVert x \rVert^{2}]}{d + t} = \frac{d}{d+t} = \frac{1}{1 + \frac{t}{d}}$$

which is small only once $t \gg d$, so Markov gives $\lVert x \rVert^{2} = \mathcal{O}(d)$ with high probability.

And by Chebyshev we get

$$\begin{align} \mathbb{P}\left[ \lVert x \rVert^{2} \geq d + t \right] &= \mathbb{P}\left[ \lVert x \rVert^{2} \geq \mathbb{E}[\lVert x \rVert^{2}] + t \right] \\ &\leq \frac{\mathrm{Var}(\lVert x \rVert^{2})}{t^{2}} \\ &= \frac{2d}{t^{2}} \end{align}$$

where $\mathrm{Var}[\lVert x \rVert^{2}] = d \cdot \mathrm{Var}[x_{1}^{2}] = 2d$. This is small once $t \gg \sqrt{d}$, so Chebyshev gives $\lVert x \rVert^{2} = d + \mathcal{O}(\sqrt{d})$ with high probability.

This isn't quite what we want since both results above depend a lot on the value of t.
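A tiny comparison (the values of $d$ and $t$ are arbitrary) showing how strongly both bounds depend on $t$:

```python
d = 50
for t in [10.0, 25.0, 75.0]:
    markov = d / (d + t)            # P[||x||^2 >= d + t] <= d / (d + t)
    chebyshev = 2.0 * d / t ** 2    # P[||x||^2 >= d + t] <= 2d / t^2
    print(f"t={t:5.1f}  Markov bound={markov:.3f}  Chebyshev bound={chebyshev:.3f}")
```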

Theorem

Let $x \sim \mathcal{N}(0, I_{d})$ be a standard [[gaussian random vector]] and $t \geq 0$. Then

$$\mathbb{P}\left[ \left| \lVert x \rVert^{2} - d \right| \geq t \right] = \mathbb{P}\left[ \left| \sum_{i=1}^{d} x_{i}^{2} - d \right| \geq t \right] \leq 2 \begin{cases} \exp\left( -\frac{t^{2}}{8d} \right) & \text{if } t \leq d \\ \exp\left( -\frac{t}{8} \right) & \text{if } t \geq d \end{cases} = 2 \exp\left( -\frac{1}{8} \min\left\{ \frac{t^{2}}{d}, t \right\} \right)$$
Note

This proof uses the Chernoff Method, which is used to prove concentration inequalities such as the Chebyshev, Hoeffding, Bernstein, Azuma-Hoeffding, and McDiarmid inequalities. It is a more general form of a "nonlinear Markov" inequality.

Proof

We deal with only one side of the distribution; the desired inequality is achieved by simply multiplying by 2 to account for the other tail.

Note that $\mathbb{E}[x_{i}^{2}] = 1$, so define $s_{i} = x_{i}^{2} - 1$. Then, for any $\lambda > 0$,

$$\begin{align} \mathbb{P}\left[ \sum_{i=1}^{d} x_{i}^{2} - d \geq t \right] = \mathbb{P}\left[ \sum_{i=1}^{d} s_{i} \geq t \right] &= \mathbb{P}\left[ \exp\left( \lambda \sum_{i=1}^{d} s_{i} \right) \geq \exp(\lambda t) \right] \\ &\overset{(*)}{\leq} \mathbb{E}\left[ \exp\left( \lambda \sum_{i=1}^{d} s_{i} \right) \right] \exp(-\lambda t) \\ &\overset{(**)}{=} \mathbb{E}[\exp(\lambda s_{1})]^{d} \exp(-\lambda t) \\ &= \exp\left( -\lambda t + d \log\left( \mathbb{E}[\exp(\lambda s_{1})] \right) \right) \end{align}$$

where

  • $(*)$ is from [[Markov's Inequality]]
  • $(**)$ is because the $x_{i}$ are independent (and identically distributed)

Now, define

$$\psi(\lambda) := \log\left( \mathbb{E}[\exp(\lambda s_{1})] \right) = \log\left( \mathbb{E}[\exp(\lambda (x_{1}^{2} - 1))] \right)$$

as the cumulant generating function (wikipedia) of $x_{1}^{2} - 1$ (which has expectation 0). Note that $\mathbb{E}[\exp(\lambda x_{1}^{2})]$ is finite if and only if $\lambda < \frac{1}{2}$, and in this case

$$\mathbb{E}[\exp(\lambda x_{1}^{2})] = \frac{1}{\sqrt{1 - 2\lambda}} \implies \psi(\lambda) = -\lambda + \frac{1}{2}\log\left( \frac{1}{1 - 2\lambda} \right) \overset{(\star)}{=} \lambda^{2} + \mathcal{O}(\lambda^{3})$$

where we get $(\star)$ via Taylor expansion. This yields the bound $\psi(\lambda) \leq 2\lambda^{2}$ for $0 \leq \lambda \leq \frac{1}{4}$.
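A quick numerical confirmation of this bound on $[0, \frac{1}{4}]$ (a sketch; the grid resolution is arbitrary):

```python
import numpy as np

lam = np.linspace(0.0, 0.25, 1001)
psi = -lam - 0.5 * np.log1p(-2.0 * lam)   # psi(lambda) = -lambda - (1/2) log(1 - 2 lambda)
print(np.all(psi <= 2.0 * lam ** 2))      # True: psi(lambda) <= 2 lambda^2 on [0, 1/4]
print(psi[-1], 2.0 * lam[-1] ** 2)        # psi(1/4) ~ 0.097 vs 0.125
```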

Assuming $0 < \lambda \leq \frac{1}{4}$, we have

$$\mathbb{P}\left[ \sum_{i=1}^{d} x_{i}^{2} - d \geq t \right] \leq \exp\left( -\lambda t + d \log\left( \mathbb{E}[\exp(\lambda s_{1})] \right) \right) = \exp(-\lambda t + d\psi(\lambda)) \leq \exp(-\lambda t + 2d\lambda^{2})$$

Now, $2d\lambda^{2} - \lambda t$ is a convex quadratic function of $\lambda$, and is thus minimized at $\lambda_{*} = \frac{t}{4d}$. We want to set $\lambda = \lambda_{*}$, but we must ensure that $\lambda \leq \frac{1}{4}$ to maintain our desired bound. Thus, we have 2 cases:

  • If $t \leq d$, then $\lambda_{*} \leq \frac{1}{4}$ and we can safely set $\lambda = \lambda_{*}$.
$$\mathbb{P}\left[ \sum_{i=1}^{d} x_{i}^{2} - d \geq t \right] \leq \exp(-\lambda_{*} t + 2d\lambda_{*}^{2}) = \exp\left( -\frac{t^{2}}{4d} + \frac{t^{2}}{8d} \right) = \exp\left( -\frac{t^{2}}{8d} \right)$$
  • If $t \geq d$, then $\frac{t}{4d} \geq \frac{1}{4}$, so we instead set $\lambda = \frac{1}{4}$.
$$\mathbb{P}\left[ \sum_{i=1}^{d} x_{i}^{2} - d \geq t \right] \leq \exp(-\lambda t + 2d\lambda^{2}) = \exp\left( -\frac{t}{4} + \frac{d}{8} \right) \leq \exp\left( -\frac{t}{8} \right)$$

This is the desired result.

see [[concentration inequality for magnitude of standard gaussian random vector]]
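A Monte Carlo sanity check of the theorem (a sketch; $d$, the sample size, and the values of $t$ are arbitrary choices): the empirical two-sided tail should sit below $2\exp\left( -\frac{1}{8}\min\{\frac{t^{2}}{d}, t\} \right)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 200_000

sq_norms = np.sum(rng.standard_normal((n, d)) ** 2, axis=1)   # ||x||^2 for n samples
for t in [10.0, 25.0, 75.0]:
    empirical = np.mean(np.abs(sq_norms - d) >= t)
    bound = 2.0 * np.exp(-min(t * t / d, t) / 8.0)
    print(f"t={t:5.1f}  empirical={empirical:.5f}  bound={bound:.5f}")
```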

Note

The important part of the proof is finding some $\psi(\lambda) = \mathcal{O}(\lambda^{2})$ for bounded $\lambda$. This means that the random variable is subexponential (a weaker version of being subgaussian).

This result is characteristic of sums of $d$ iid subexponential random variables. Sums of this type have "Gaussian tails" scaling as $\exp\left( -\frac{ct^{2}}{d} \right)$ up to a cutoff of $t \approx d$, after which they have "exponential tails" scaling as $\exp(-ct)$.

  • Bernstein's inequality is a general tool for expressing this type of behavior (see notes pg. 7 for more + reference)

Review

#flashcards/math/rmt

The Law is the {distribution} of the random vector.

What does $S^{d-1}(r)$ denote?
-?-
The surface of the sphere of radius r in d dimensions.

Important

The {1||direction||thing} of a {2||[[gaussian random vector]]} is {1||uniformly distributed on $S^{d-1}$||law of thing}

With high probability, the {1||magnitude} of a gaussian random vector is {1||$\sqrt{d}$}

TODO

```datacorejsx
const { dateTime } = await cJS()

return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)}     ֍     Last Modified {dateTime.getLastMod(file)}</p>
}
```