1.1 Natural Random Models and Orthogonal Invariance
Interpretations of vectors
List of numbers
Magnitude and direction (with respect to a [[basis]] of the [[vector space]] they belong to)
Corresponding interpretations for a random vector $x \in \mathbb{R}^n$:
Entries (each of the numbers) are as independent as possible
ie entries are iid from $\mathcal{N}(0, 1)$
Magnitude and direction are as independent as possible
Take the magnitude $\|x\|$ and the direction $x / \|x\|$ to be independent
The magnitude is then any random non-negative scalar and the direction is $x / \|x\| \sim \mathrm{Unif}(S^{n-1})$
ie a random vector drawn uniformly from the unit sphere $S^{n-1} = \{u \in \mathbb{R}^n : \|u\| = 1\}$ (see the sampling sketch below)
See [[random vector]]
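A minimal sketch contrasting the two models. The Exponential(1) magnitude is an arbitrary illustrative choice, and sampling the direction by normalizing a standard Gaussian vector relies on the theorem proved later in this section:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Model 1: entries as independent as possible -- iid N(0, 1) entries.
x_iid = rng.standard_normal(n)

# Model 2: magnitude and direction as independent as possible.
# Magnitude: any random non-negative scalar (Exponential(1) is an arbitrary choice).
r = rng.exponential(1.0)
# Direction: uniform on the unit sphere, sampled here by normalizing a standard
# Gaussian vector (justified by the theorem later in this section).
g = rng.standard_normal(n)
theta = g / np.linalg.norm(g)

x_polar = r * theta
print(x_iid.round(3), x_polar.round(3), np.linalg.norm(theta))  # theta has unit norm
```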
Orthogonal Invariance
Let $x \in \mathbb{R}^n$ be a random vector. We say that $x$ (or its law) is orthogonally invariant if for each (deterministic) orthogonal matrix $Q \in \mathbb{R}^{n \times n}$ we have $Qx \overset{d}{=} x$ (equality in distribution)
see also [[invariant]]
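A quick numerical illustration of the definition (a sketch; the 45° rotation used as the deterministic $Q$ and the anisotropic Gaussian used as the non-invariant example are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 2, 200_000

# A fixed (deterministic) orthogonal matrix: rotation by 45 degrees.
c = np.cos(np.pi / 4)
Q = np.array([[c, -c],
              [c,  c]])

# Isotropic Gaussian: its law should be unchanged by Q.
x_iso = rng.standard_normal((trials, n))
# Anisotropic Gaussian (independent entries with different variances): NOT invariant.
x_aniso = rng.standard_normal((trials, n)) * np.array([1.0, 3.0])

for name, x in [("isotropic", x_iso), ("anisotropic", x_aniso)]:
    var_before = x.var(axis=0)          # per-coordinate variances of x
    var_after = (x @ Q.T).var(axis=0)   # per-coordinate variances of Qx
    print(name, var_before.round(2), var_after.round(2))
# Isotropic: variances match before and after rotation.
# Anisotropic: rotating mixes the variances, so the law of Qx differs from that of x.
```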
Note
The law is the distribution of the random vector.
We sometimes use "model" instead of "law" or "distribution" to remind us that our choice includes assumptions and judgements.
We are modelling a situation we might encounter in applications
Proposition
There exists a unique probability measure supported on the unit sphere $S^{n-1}$ that is orthogonally [[invariant]]. We denote this $\mathrm{Unif}(S^{n-1})$
see [[orthogonally invariant distribution on the unit sphere]]
For $g \sim \mathcal{N}(0, I_n)$ and any (deterministic) orthogonal $Q$, we have (from [[gaussian random vectors are closed under linear transformations]]) that $Qg \sim \mathcal{N}(0, Q I_n Q^\top) = \mathcal{N}(0, I_n)$, ie $Qg \overset{d}{=} g$
Now, consider a special case where a given unit vector $u$ is the first row of $Q$ (and we can find such an orthogonal $Q$ via [[Gram-Schmidt]], by extending $u$ to an orthonormal basis). Then $\langle u, g \rangle = (Qg)_1 \overset{d}{=} g_1 \sim \mathcal{N}(0, 1)$, so we see that the law of $\langle u, g \rangle$ is independent of $u$, as desired.
see [[standard gaussian random vectors are orthogonally invariant]]
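Another way to see the invariance, spelled out here for convenience: the standard Gaussian density depends on $x$ only through $\|x\|$, which orthogonal maps preserve. By the change-of-variables formula, for any orthogonal $Q$,
$$p_{Qg}(x) = p_g(Q^\top x)\, \big|\det Q^\top\big| = (2\pi)^{-n/2} e^{-\|Q^\top x\|^2 / 2} = (2\pi)^{-n/2} e^{-\|x\|^2 / 2} = p_g(x),$$
since $\|Q^\top x\| = \|x\|$ and $|\det Q| = 1$.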
Theorem
If $g \sim \mathcal{N}(0, I_n)$, then $g / \|g\| \sim \mathrm{Unif}(S^{n-1})$
Proof
The result follows immediately from [[standard gaussian random vectors are orthogonally invariant]] and the uniqueness proposition for the [[orthogonally invariant distribution on the unit sphere]]: since $\|Qg\| = \|g\|$, we have $Q(g/\|g\|) = (Qg)/\|Qg\| \overset{d}{=} g/\|g\|$, so $g/\|g\|$ is an orthogonally invariant random vector supported on $S^{n-1}$.
see [[normalized standard gaussian random vectors have the orthogonally invariant distribution on the unit sphere]]
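A minimal sketch of the sampling recipe this theorem gives, with a quick check of orthogonal invariance via one-dimensional projections (the dimension, seed, and test directions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 3, 200_000

# Sample from Unif(S^{n-1}) by normalizing standard Gaussian vectors.
g = rng.standard_normal((trials, n))
theta = g / np.linalg.norm(g, axis=1, keepdims=True)

# Check of orthogonal invariance: the law of <u, theta> should not depend on
# the unit vector u. Compare quantiles for two arbitrary test directions.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
print(np.quantile(theta @ u1, [0.1, 0.5, 0.9]).round(3))
print(np.quantile(theta @ u2, [0.1, 0.5, 0.9]).round(3))  # should closely match
```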
1.3 Concentration of Gaussian Vector Norms
Important
The direction of a standard [[gaussian random vector]] is uniformly distributed on the unit sphere.
[?] So we've addressed the direction. What about the magnitude?
If $g \in \mathbb{R}^n$ is a standard [[gaussian random vector]], then $\|g\|^2 = \sum_{i=1}^n g_i^2$ and $\mathbb{E}\|g\|^2 = n$
Since $\|g\|^2$ is the sum of $n$ iid random variables, by the [[Law of Large Numbers]] and the central limit theorem we expect that $\|g\|^2 = n + O(\sqrt{n})$, and hence $\|g\| \approx \sqrt{n}$, with high probability.
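A quick simulation of this heuristic (a minimal sketch; the values of $n$, the number of trials, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
trials = 10_000

# ||g||^2 is a sum of n iid g_i^2 with mean 1 and variance 2, so it should
# concentrate around n with fluctuations of order sqrt(n).
for n in [10, 100, 1000]:
    sq_norms = (rng.standard_normal((trials, n)) ** 2).sum(axis=1)
    rescaled = (sq_norms - n) / np.sqrt(n)
    # mean(||g||^2) ~ n, and the rescaled deviations have std ~ sqrt(2) ~ 1.41
    print(n, round(sq_norms.mean(), 1), round(rescaled.std(), 2))
```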
By using [[Markov's inequality]] and [[Chebyshev's inequality]], we can get something close to this; below we go further by applying [[Markov's inequality]] to an exponential moment (a Chernoff-type bound).
We deal with only one side of the distribution (the upper tail); the desired two-sided inequality is achieved by simply multiplying by 2 to account for the other tail.
Note that $\|g\|^2 - n = \sum_{i=1}^n (g_i^2 - 1)$, so define $X_i := g_i^2 - 1$. Then, for any $\lambda > 0$,
$$\mathbb{P}\big(\|g\|^2 - n \ge t\big) = \mathbb{P}\Big(\sum_{i=1}^n X_i \ge t\Big) = \mathbb{P}\big(e^{\lambda \sum_i X_i} \ge e^{\lambda t}\big) \overset{(a)}{\le} e^{-\lambda t}\, \mathbb{E}\, e^{\lambda \sum_i X_i} \overset{(b)}{=} e^{-\lambda t} \prod_{i=1}^n \mathbb{E}\, e^{\lambda X_i} = e^{-\lambda t} \big(\mathbb{E}\, e^{\lambda X_1}\big)^n$$
Where
(a) is from [[Markov's Inequality]]
(b) is because the $X_i$ are independent
Now, define
$$\psi(\lambda) := \log \mathbb{E}\, e^{\lambda X_1}$$
as the cumulant generating function (wikipedia) of $X_1 = g_1^2 - 1$ (which has expectation 0). Note that $\psi(\lambda)$ is finite if and only if $\lambda < 1/2$, and in this case
$$\psi(\lambda) = -\lambda - \tfrac{1}{2}\log(1 - 2\lambda) = \lambda^2 + O(\lambda^3)$$
Where we get $\lambda^2 + O(\lambda^3)$ via Taylor expansion. This yields the bound $\psi(\lambda) \le 2\lambda^2$ for $\lambda \in [0, 1/4]$
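For completeness, a short derivation of the formula for $\psi$ and its expansion (a standard Gaussian integral computation, included here for reference). For $\lambda < 1/2$,
$$\mathbb{E}\, e^{\lambda g_1^2} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\lambda x^2 - x^2/2}\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(1 - 2\lambda)x^2/2}\, dx = (1 - 2\lambda)^{-1/2},$$
so
$$\psi(\lambda) = \log\big(e^{-\lambda}\, \mathbb{E}\, e^{\lambda g_1^2}\big) = -\lambda - \tfrac{1}{2}\log(1 - 2\lambda) = -\lambda + \tfrac{1}{2}\Big(2\lambda + \tfrac{(2\lambda)^2}{2} + \tfrac{(2\lambda)^3}{3} + \cdots\Big) = \lambda^2 + \tfrac{4}{3}\lambda^3 + \cdots$$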
Exercise (tedious) - Verify the bound $\psi(\lambda) \le 2\lambda^2$ on $[0, 1/4]$
Can plot $\psi(\lambda)$ and the bound $2\lambda^2$ (see notes page 6; figure 1.1); a numerical check is sketched below
Verify via algebra that $\psi(\lambda) \le 2\lambda^2$ for all $\lambda \in [0, 1/4]$
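A quick numerical check of this exercise (a minimal sketch; the grid size is an arbitrary choice):

```python
import numpy as np

# Numerical check of the claimed bound psi(lambda) <= 2*lambda^2 on (0, 1/4],
# where psi(lambda) = -lambda - 0.5*log(1 - 2*lambda).
lam = np.linspace(0.0, 0.25, 1001)[1:]   # skip lambda = 0, where both sides vanish
psi = -lam - 0.5 * np.log1p(-2.0 * lam)  # log1p(-2*lam) = log(1 - 2*lam)
bound = 2.0 * lam ** 2
print("max of psi - bound:", float((psi - bound).max()))  # should be <= 0
# A plot of psi and the bound (cf. figure 1.1 in the notes) can be made with
# matplotlib, e.g. plt.plot(lam, psi) and plt.plot(lam, bound).
```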
Assuming $\lambda \in [0, 1/4]$, we have
$$\mathbb{P}\big(\|g\|^2 - n \ge t\big) \le e^{-\lambda t + n \psi(\lambda)} \le \exp\big(2n\lambda^2 - \lambda t\big)$$
Now, $2n\lambda^2 - \lambda t$ is a convex quadratic function of $\lambda$, and is thus minimized at $\lambda^* = \frac{t}{4n}$. We want to set $\lambda = \lambda^*$, but we must ensure that $\lambda \le 1/4$ to maintain our desired bound. Thus, we have 2 cases:
If $t \le n$, then $\lambda^* = \frac{t}{4n} \le \frac{1}{4}$ and we can safely set $\lambda = \frac{t}{4n}$, which gives the bound $\exp\big(-\frac{t^2}{8n}\big)$.
If $t > n$, then $\lambda^* > \frac{1}{4}$, so we instead set $\lambda = \frac{1}{4}$, which gives the bound $\exp\big(\frac{n}{8} - \frac{t}{4}\big) \le \exp\big(-\frac{t}{8}\big)$.
Combining the two cases and multiplying by 2 for the other tail,
$$\mathbb{P}\big(\big| \|g\|^2 - n \big| \ge t\big) \le 2\exp\Big(-\min\Big(\frac{t^2}{8n},\, \frac{t}{8}\Big)\Big)$$
Which is the desired result
see [[concentration inequality for magnitude of standard gaussian random vector]]
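As a sanity check, a small Monte Carlo sketch comparing empirical tail probabilities with the bound just derived (the choices $n = 50$, the grid of $t$ values, the number of trials, and the seed are arbitrary; the constants in the bound are not tight, so expect it to be loose but valid):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 50, 200_000

# Compare empirical two-sided tail probabilities of ||g||^2 - n with the
# Bernstein-type bound 2 * exp(-min(t^2 / (8n), t / 8)) derived above.
sq_norms = (rng.standard_normal((trials, n)) ** 2).sum(axis=1)
for t in [20, 30, 40, 50]:
    empirical = float(np.mean(np.abs(sq_norms - n) >= t))
    bound = 2.0 * np.exp(-min(t ** 2 / (8 * n), t / 8))
    # The constants in the bound are not tight, so it is loose but should be valid.
    print(f"t={t:3d}  empirical={empirical:.2e}  bound={bound:.2e}")
```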
Note
The important part of the proof is finding some $\lambda > 0$ for a bounded $\psi(\lambda)$ (ie a finite exponential moment $\mathbb{E}\, e^{\lambda X_1}$). This means that the random variable $X_1 = g_1^2 - 1$ is subexponential (a weaker version of being subgaussian).
This result is characteristic of sums of $n$ iid subexponential random variables. Sums of this type have "Gaussian tails" $e^{-t^2/(8n)}$, scaled by $\sqrt{n}$, up to a cutoff of $t \asymp n$, after which they have "exponential tails" $e^{-t/8}$, scaled by a constant
Bernstein's inequality is a general tool for expressing this type of behavior (see notes pg. 7 for more + reference)
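For reference, one standard formulation of Bernstein's inequality for sums of subexponential random variables (as stated in, e.g., Vershynin's High-Dimensional Probability; here $c > 0$ is an absolute constant and $\|\cdot\|_{\psi_1}$ denotes the subexponential norm): if $X_1, \dots, X_n$ are independent, mean-zero, subexponential random variables, then for every $t \ge 0$,
$$\mathbb{P}\Big(\Big|\sum_{i=1}^n X_i\Big| \ge t\Big) \le 2 \exp\Big(-c\, \min\Big(\frac{t^2}{\sum_{i=1}^n \|X_i\|_{\psi_1}^2},\; \frac{t}{\max_i \|X_i\|_{\psi_1}}\Big)\Big)$$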