Random Matrix Lecture 03

[[lecture-data]]

:LiArrowBigLeftDash: `= choice(this["previous_lecture"], this["previous_lecture"], "No Previous Lecture")` | `= choice(this["next_lecture"], this["next_lecture"], "No Next Lecture")` :LiArrowBigRightDash:

 

# Class Notes

pg 12 - 15 (2.2 Johnson-Lindenstrauss Lemma)

## Summary

- Random matrices for dimensionality reduction
    - Johnson-Lindenstrauss transform
- First moment method
- Some applications

## 2. Rectangular Matrices

### 2.2 Dimensionality Reduction and Johnson-Lindenstrauss

From last time, we saw that for a point cloud $y_{1}, \dots, y_{n} \in \mathbb{R}^{m}$, as long as

1. $d$ is not too small with respect to $\log n$ and $\frac{1}{\varepsilon^{2}}$, and
2. the $y_{i}$ are fixed before drawing $\hat{G}$,

then $\hat{G}$ approximately preserves the geometry of the $y_{i}$.

$\varepsilon$-faithful

Suppose $\varepsilon \in (0,1)$. A function $f: \mathbb{R}^{m} \to \mathbb{R}^{d}$ is called $\varepsilon$-faithful on the $y_{1}, \dots, y_{n}$ if for all $i \neq j$ we have $$(1-\varepsilon)\lvert \lvert y_{i}-y_{j} \rvert \rvert^{2} \leq \lvert \lvert f(y_{i})-f(y_{j}) \rvert \rvert^{2} \leq (1+\varepsilon)\lvert \lvert y_{i}-y_{j} \rvert \rvert^{2}.$$

see [[epsilon faithful function]]
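
To make the definition concrete, here is a minimal sketch of a checker for pairwise $\varepsilon$-faithfulness (using the squared-norm form that the proof below works with; the function name `is_pairwise_faithful` and the NumPy setup are my own, not from the notes):

```python
import numpy as np

def is_pairwise_faithful(f, ys, eps):
    """Return True if the map f is pairwise eps-faithful on the rows of ys."""
    n = ys.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            orig = np.sum((ys[i] - ys[j]) ** 2)        # ||y_i - y_j||^2
            proj = np.sum((f(ys[i]) - f(ys[j])) ** 2)  # ||f(y_i) - f(y_j)||^2
            if not ((1 - eps) * orig <= proj <= (1 + eps) * orig):
                return False
    return True
```

A call would look like `is_pairwise_faithful(lambda y: G_hat @ y, ys, 0.1)` for a projection matrix `G_hat`.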

Theorem (Johnson-Lindenstrauss)

Let $y_{1}, \dots, y_{n} \in \mathbb{R}^{m}$ and fix $\varepsilon \in (0,1)$. Let $\hat{G} \sim {\cal N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}$. Suppose that $d \geq 24 \frac{\log n}{\varepsilon^{2}}$, and denote by $S$ the event that multiplication by $\hat{G}$ is pairwise $\varepsilon$-faithful for the $y_{i}$. Then we have $$\mathbb{P}[S] \geq 1 - \frac{1}{n}.$$

Proof

Note that being $\varepsilon$-faithful pairwise to the $y_{i}$ requires the preservation of the squared norms $\lvert \lvert y_{i}-y_{j} \rvert \rvert^{2}$ within a factor of $1 \pm \varepsilon$.

Note that we can do this for any fixed vector, and in particular for the differences $y_{i}-y_{j}$ of the $y_{i}$ specified in the statement!

Let $x \in \mathbb{R}^{m}$ be fixed. Recall that the distribution of $\hat{G}x$ is ${\cal N}\left( 0, \frac{1}{d}\lvert \lvert x \rvert \rvert^{2} I_{d} \right)$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim {\cal N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\,\left\lvert \,\lvert \lvert \hat{G}x \rvert \rvert^2 - \lvert \lvert x \rvert \rvert ^2 \, \right\rvert >\varepsilon \lvert \lvert x \rvert \rvert ^2\,\right] &= \mathbb{P}_{g \sim {\cal N}\left( 0, \frac{1}{d} \lvert \lvert x \rvert \rvert ^2 I_{d} \right)} \left[ \,\left\lvert \,\lvert \lvert g \rvert \rvert ^2 - \lvert \lvert x \rvert \rvert ^2\, \right\rvert > \varepsilon \lvert \lvert x \rvert \rvert ^2 \, \right] \\ &= \mathbb{P}_{g \sim {\cal N}(0, I_{d})} \left[\, \left\lvert \frac{1}{d}\lvert \lvert x \rvert \rvert ^2 \cdot \lvert \lvert g \rvert \rvert ^2 - \lvert \lvert x \rvert \rvert ^2 \, \right\rvert >\varepsilon \lvert \lvert x \rvert \rvert ^2\, \right] \\ &= \mathbb{P}_{g \sim {\cal N}(0, I_d)} \left[ \,\left\lvert \,\lvert \lvert g \rvert \rvert ^2 - d\, \right\rvert > \varepsilon d\, \right] \end{align}$$

So we can apply our [[concentration inequality for magnitude of standard gaussian random vector]]! Since $\varepsilon<1$, we have $\varepsilon d<d$ and thus $\min \left\{ \frac{(\varepsilon d)^2}{d}, \varepsilon d \right\}=\varepsilon^2d$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim {\cal N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\,\left\lvert \,\lvert \lvert \hat{G}x \rvert \rvert^2 - \lvert \lvert x \rvert \rvert ^2 \, \right\rvert >\varepsilon \lvert \lvert x \rvert \rvert ^2\,\right] &\leq 2 \exp\left( -\frac{1}{8} \varepsilon^2d \right) \\ \implies \mathbb{P}_{\hat{G} \sim {\cal N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\,(1-\varepsilon) \lvert \lvert x \rvert \rvert ^2 \leq\,\lvert \lvert \hat{G}x \rvert \rvert^2 \leq(1+\varepsilon)\lvert \lvert x \rvert \rvert ^2\,\right] &\geq 1 - 2\exp\left( -\frac{1}{8}\varepsilon^2d \right) \end{align}$$

Now, let $S_{ij}$ be the event that $(1-\varepsilon) \lvert \lvert x \rvert \rvert ^2 \leq\,\lvert \lvert \hat{G}x \rvert \rvert^2 \leq(1+\varepsilon)\lvert \lvert x \rvert \rvert ^2$ for $x=y_{i}-y_{j}$. Then by the [[outer measure has countable subadditivity|union bound]], we have

$$\begin{align} \mathbb{P}[S_{ij}^C \text{ for some }i,j] & \leq {n \choose 2} \cdot 2 \exp\left( -\frac{1}{8} \varepsilon^2 d \right) \\ &\leq n^2 \exp\left( -\frac{1}{8} \varepsilon^2d \right) \\ &=\frac{1}{n} \exp\left( 3\log n-\frac{1}{8}\varepsilon^2\underbrace{ d }_{ d \geq 24 \frac{\log n}{\varepsilon^2}} \right) \\ &\leq \frac{1}{n} \exp\left( 3 \log n - 3\log n \right) \\ &=\frac{1}{n} \end{align}$$

$$\tag*{$\blacksquare$}$$
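
As an illustrative numerical check of the theorem (my own sketch, not part of the class notes; the choices of $n$, $m$, and $\varepsilon$ are arbitrary), we can draw $\hat{G}$ with i.i.d. ${\cal N}\left( 0, \frac{1}{d} \right)$ entries at the threshold $d = \left\lceil 24 \frac{\log n}{\varepsilon^{2}} \right\rceil$ and verify pairwise faithfulness empirically:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, eps = 50, 1000, 0.5
d = int(np.ceil(24 * np.log(n) / eps**2))  # d >= 24 log(n) / eps^2, as in the theorem

ys = rng.normal(size=(n, m))                              # an arbitrary point cloud in R^m
G_hat = rng.normal(scale=np.sqrt(1.0 / d), size=(d, m))   # entries i.i.d. N(0, 1/d)
zs = ys @ G_hat.T                                         # projected points G_hat y_i in R^d

# Check that all pairwise squared distances are preserved within a factor of 1 +/- eps.
ok = True
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((ys[i] - ys[j]) ** 2)
        proj = np.sum((zs[i] - zs[j]) ** 2)
        ok = ok and (1 - eps) * orig <= proj <= (1 + eps) * orig
print(d, ok)  # the theorem says ok is True with probability at least 1 - 1/n
```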

> [!NOTE]
> In fact, we can write an expression for the bound on $d$ in terms of a target failure probability $\delta$: if $d$ exceeds it, then we can get a success probability of $1-\delta$.
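
For instance, redoing the union-bound computation from the proof with a target failure probability $\delta$ (my own back-of-the-envelope version of this remark, not a constant quoted in class) gives $$n^{2}\exp\left( -\frac{1}{8}\varepsilon^{2}d \right) \leq \delta \quad\iff\quad d \geq \frac{8\left( 2\log n + \log \frac{1}{\delta} \right)}{\varepsilon^{2}},$$ so taking $d \geq \frac{8\left( 2\log n + \log \frac{1}{\delta} \right)}{\varepsilon^{2}}$ yields a success probability of at least $1-\delta$.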

> [!NOTE]
> Johnson-Lindenstrauss only deals with the pairwise distances between the $y_{i}$; there are other aspects of the geometry that it does not address.
>
> Despite this, it is an intuitive result and is still relevant to numerous applications.

see [[Johnson-Lindenstrauss lemma]]

#### 2.2.1 Simple Extensions

It is easy to see that the norms $\lvert \lvert y_{i} \rvert \rvert$ are also preserved with the same probability bound by adding $0$ to the collection of $y_{i}$ and increasing $n$ to $n+1$.
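
Concretely (my reading of this extension, spelled out): apply the theorem to the augmented collection $y_{1}, \dots, y_{n}, y_{n+1} = 0$. Since $\lvert \lvert y_{i} - y_{n+1} \rvert \rvert = \lvert \lvert y_{i} \rvert \rvert$, pairwise faithfulness on the augmented collection gives, for every $i$, $$(1-\varepsilon)\lvert \lvert y_{i} \rvert \rvert^{2} \leq \lvert \lvert \hat{G}y_{i} \rvert \rvert^{2} \leq (1+\varepsilon)\lvert \lvert y_{i} \rvert \rvert^{2},$$ at the cost of replacing $n$ by $n+1$ in the requirement on $d$ and in the success probability.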

#### 2.2.2 Lower Bounds

Is the Johnson-Lindenstrauss result optimal? Can we reduce the output dimension further?

- The original paper showed that dimension $d = \Omega(\log n)$ is required, independent of $\varepsilon$.

The following two papers show that this result is tight up to constants:

- Kasper Green Larsen and Jelani Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. arXiv preprint arXiv:1411.2404, 2014.
- Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 633-638. IEEE, 2017.

#### 2.2.3 Sparse and Fast Johnson-Lindenstrauss Transforms

How can we speed up the multiplications $\hat{G}y_{i}$?

The main approach to this is to find special matrices that allow for faster matrix-vector multiplication.

- The simplest results deal with sparse matrices (see the sketch after this list)
- There are also fast transforms based on the fast Fourier transform, via multiplication with the discrete Fourier transform matrix
    - multiply, then subsample entries
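
As a rough illustration of the sparse idea, here is a sketch of a classical Achlioptas-style sparse projection (this particular construction is my own choice of example and is not necessarily the one discussed in class; the function name `sparse_jl_matrix` is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_jl_matrix(d, m):
    """Achlioptas-style sparse projection matrix: each entry is
    +sqrt(3/d) with prob 1/6, 0 with prob 2/3, -sqrt(3/d) with prob 1/6.
    About two thirds of the entries are zero, so applying it is cheaper."""
    vals = rng.choice([np.sqrt(3.0), 0.0, -np.sqrt(3.0)],
                      size=(d, m), p=[1 / 6, 2 / 3, 1 / 6])
    return vals / np.sqrt(d)

# Usage: project a single vector and compare squared norms.
m, d = 10_000, 400
x = rng.normal(size=m)
G = sparse_jl_matrix(d, m)
print(np.sum((G @ x) ** 2) / np.sum(x ** 2))  # typically close to 1
```

The Fourier-based constructions mentioned above avoid storing any matrix at all: roughly speaking, one applies random sign flips, an FFT, and a subsampling of the output entries, so each matrix-vector product costs about $O(m \log m)$ instead of $O(dm)$.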

#### 2.2.4 Proof Technique: First Moment Method

The proof of the Johnson-Lindenstrauss lemma applies the union bound. This turns out to be a very powerful tool if we can carefully choose our events.

If we want to bound the probability of a “bad” event, we can decompose it into events $E_{1}, \dots, E_{N}$ and bound it as $\mathbb{P}[\text{some } E_{i} \text{ occurs}] \leq \sum_{i=1}^{N} \mathbb{P}[E_{i}]$. Often, for a well-chosen set of $E_{i}$, the $\mathbb{P}[E_{i}]$ are about the same. If $N$ is large and $\mathbb{P}[E_{i}]$ is very small, we can often see a bound with an exponential scale (like we did in the proof).
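
As a toy numerical illustration of this decomposition (my own sketch; the events here are artificial Gaussian tail events chosen only to show the union bound at work, not the Johnson-Lindenstrauss events):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# N rare "bad" events: E_i = {the i-th standard Gaussian coordinate exceeds a threshold t}.
N, t, trials = 1000, 3.5, 10_000
samples = rng.normal(size=(trials, N))

some_bad = np.any(samples > t, axis=1)     # indicator of "some E_i occurs", per trial
p_union_mc = some_bad.mean()               # Monte Carlo estimate of P[some E_i occurs]

p_single = 1.0 - 0.5 * (1.0 + erf(t / sqrt(2.0)))  # P[E_i], a single Gaussian tail
p_union_bound = N * p_single                        # the union bound: sum over i of P[E_i]

print(p_union_mc, p_union_bound)  # the bound overshoots, but it captures the right scale
```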

This approach is sometimes called the first moment method because we may write

$$\begin{align} \mathbb{P}[\text{ some }E_{i}\text{ occurs }] &= \mathbb{P}\left[ \sum_{i=1}^N \mathbb{1}_{\{ E_{i} \} }\geq 1\right] \\ &\leq \mathbb{E}\left[ \sum_{i=1}^N \mathbb{1}_{\{ E_{i} \}}\right] \\ &= \sum_{i=1}^N \mathbb{P}[E_{i}] \end{align}$$

which we get from [[Markov's Inequality]]. Here, we compute the expectation or *first moment* of the random variable $\#E = \lvert \{ i: E_{i} \text{ occurs} \} \rvert$.

> [!NOTE]
> There is also the "second moment method" which involves computing the second moment of $\#E$. This method is usually used to show that some $E_{i}$ *does* occur with high probability.
>
> > [!example]
> > If $d$ is too small, then $\hat{G}$ is *not* pairwise $\varepsilon$-[[epsilon faithful function|faithful]].

see [[first moment method]]

# Review

## TODO

- [-] Add flashcards to random matrix lecture 03 ⏳ 2025-09-25 ❌ 2025-11-06

```datacorejsx
const { dateTime } = await cJS()

return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```