Random Matrix Lecture 03

[[lecture-data]]

:LiArrowBigLeftDash: `= choice(this["previous_lecture"], this["previous_lecture"], "No Previous Lecture")` | `= choice(this["next_lecture"], this["next_lecture"], "No Next Lecture")` :LiArrowBigRightDash:

 

# Class Notes

pg 12 - 15 (2.2 Johnson-Lindenstrauss Lemma)

## Summary

- Random matrices for dimensionality reduction
    - Johnson-Lindenstrauss transform
- First moment method
- Some applications

## 2. Rectangular Matrices

### 2.2 Dimensionality Reduction and Johnson-Lindenstrauss

From last time, we saw that for a point cloud $y_{1}, \dots, y_{n} \in \mathbb{R}^{m}$, as long as

1. $d$ is not too small with respect to $\log n$ and $\frac{1}{\varepsilon^{2}}$, and
2. the $y_{i}$ are fixed before drawing $\hat{G}$,

then $\hat{G}$ approximately preserves the geometry of the $y_{i}$.

$\varepsilon$-faithful

Suppose $\varepsilon \in (0,1)$. A function $f: \mathbb{R}^{m} \to \mathbb{R}^{d}$ is called $\varepsilon$-faithful on the $y_{1}, \dots, y_{n}$ if for all $i \neq j$ we have $$(1-\varepsilon)\lvert \lvert y_{i}-y_{j} \rvert \rvert^{2} \leq \lvert \lvert f(y_{i})-f(y_{j}) \rvert \rvert^{2} \leq (1+\varepsilon)\lvert \lvert y_{i}-y_{j} \rvert \rvert^{2}.$$

see [[epsilon faithful function]]
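
To make the definition concrete, here is a minimal sketch of a checker for pairwise $\varepsilon$-faithfulness (using the squared-norm form that the proof below works with; the function name `is_pairwise_faithful` and the NumPy setup are my own, not from the notes):

```python
import numpy as np

def is_pairwise_faithful(f, ys, eps):
    """Return True if the map f is pairwise eps-faithful on the rows of ys."""
    n = ys.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            orig = np.sum((ys[i] - ys[j]) ** 2)        # ||y_i - y_j||^2
            proj = np.sum((f(ys[i]) - f(ys[j])) ** 2)  # ||f(y_i) - f(y_j)||^2
            if not ((1 - eps) * orig <= proj <= (1 + eps) * orig):
                return False
    return True
```

A call would look like `is_pairwise_faithful(lambda y: G_hat @ y, ys, 0.1)` for a projection matrix `G_hat`.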

Theorem (Johnson-Lindenstrauss)

Let $y_{1}, \dots, y_{n} \in \mathbb{R}^{m}$ and fix $\varepsilon \in (0,1)$. Let $\hat{G} \sim {\cal N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}$. Suppose that $d \geq 24 \frac{\log n}{\varepsilon^{2}}$, and denote by $S$ the event that multiplication by $\hat{G}$ is pairwise $\varepsilon$-faithful for the $y_{i}$. Then we have $$\mathbb{P}[S] \geq 1 - \frac{1}{n}.$$

Proof

Note that being $\varepsilon$-faithful pairwise to the $y_{i}$ requires the preservation of the squared norms $\lvert \lvert y_{i}-y_{j} \rvert \rvert^{2}$ within a factor of $1 \pm \varepsilon$.

Note that we can do this for any fixed vector, and in particular for the differences $y_{i}-y_{j}$ of the $y_{i}$ specified in the statement!

Let $x \in \mathbb{R}^{m}$ be fixed. Recall that the distribution of $\hat{G}x$ is ${\cal N}\left( 0, \frac{1}{d}\lvert \lvert x \rvert \rvert^{2} I_{d} \right)$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim {\cal N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\,\left\lvert \,\lvert \lvert \hat{G}x \rvert \rvert^2 - \lvert \lvert x \rvert \rvert ^2 \, \right\rvert >\varepsilon \lvert \lvert x \rvert \rvert ^2\,\right] &= \mathbb{P}_{g \sim {\cal N}\left( 0, \frac{1}{d} \lvert \lvert x \rvert \rvert ^2 I_{d} \right)} \left[ \,\left\lvert \,\lvert \lvert g \rvert \rvert ^2 - \lvert \lvert x \rvert \rvert ^2\, \right\rvert > \varepsilon \lvert \lvert x \rvert \rvert ^2 \, \right] \\ &= \mathbb{P}_{g \sim {\cal N}(0, I_{d})} \left[\, \left\lvert \frac{1}{d}\lvert \lvert x \rvert \rvert ^2 \cdot \lvert \lvert g \rvert \rvert ^2 - \lvert \lvert x \rvert \rvert ^2 \, \right\rvert >\varepsilon \lvert \lvert x \rvert \rvert ^2\, \right] \\ &= \mathbb{P}_{g \sim {\cal N}(0, I_d)} \left[ \,\left\lvert \,\lvert \lvert g \rvert \rvert ^2 - d\, \right\rvert > \varepsilon d\, \right] \end{align}$$

So we can apply our [[concentration inequality for magnitude of standard gaussian random vector]]! Since $\varepsilon<1$, we have $\varepsilon d<d$ and thus $\min \left\{ \frac{(\varepsilon d)^2}{d}, \varepsilon d \right\}=\varepsilon^2d$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim {\cal N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\,\left\lvert \,\lvert \lvert \hat{G}x \rvert \rvert^2 - \lvert \lvert x \rvert \rvert ^2 \, \right\rvert >\varepsilon \lvert \lvert x \rvert \rvert ^2\,\right] &\leq 2 \exp\left( -\frac{1}{8} \varepsilon^2d \right) \\ \implies \mathbb{P}_{\hat{G} \sim {\cal N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\,(1-\varepsilon) \lvert \lvert x \rvert \rvert ^2 \leq\,\lvert \lvert \hat{G}x \rvert \rvert^2 \leq(1+\varepsilon)\lvert \lvert x \rvert \rvert ^2\,\right] &\geq 1 - 2\exp\left( -\frac{1}{8}\varepsilon^2d \right) \end{align}$$

Now, let $S_{ij}$ be the event that $(1-\varepsilon) \lvert \lvert x \rvert \rvert ^2 \leq\,\lvert \lvert \hat{G}x \rvert \rvert^2 \leq(1+\varepsilon)\lvert \lvert x \rvert \rvert ^2$ for $x=y_{i}-y_{j}$. Then by the [[outer measure has countable subadditivity|union bound]], we have

$$\begin{align} \mathbb{P}[S_{ij}^C \text{ for some }i,j] & \leq {n \choose 2} \cdot 2 \exp\left( -\frac{1}{8} \varepsilon^2 d \right) \\ &\leq n^2 \exp\left( -\frac{1}{8} \varepsilon^2d \right) \\ &=\frac{1}{n} \exp\left( 3\log n-\frac{1}{8}\varepsilon^2\underbrace{ d }_{ d \geq 24 \frac{\log n}{\varepsilon^2}} \right) \\ &\leq \frac{1}{n} \exp\left( 3 \log n - 3\log n \right) \\ &=\frac{1}{n} \end{align}$$

$$\tag*{$\blacksquare$}$$
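
As an illustrative numerical check of the theorem (my own sketch, not part of the class notes; the choices of $n$, $m$, and $\varepsilon$ are arbitrary), we can draw $\hat{G}$ with i.i.d. ${\cal N}\left( 0, \frac{1}{d} \right)$ entries at the threshold $d = \left\lceil 24 \frac{\log n}{\varepsilon^{2}} \right\rceil$ and verify pairwise faithfulness empirically:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, eps = 50, 1000, 0.5
d = int(np.ceil(24 * np.log(n) / eps**2))  # d >= 24 log(n) / eps^2, as in the theorem

ys = rng.normal(size=(n, m))                              # an arbitrary point cloud in R^m
G_hat = rng.normal(scale=np.sqrt(1.0 / d), size=(d, m))   # entries i.i.d. N(0, 1/d)
zs = ys @ G_hat.T                                         # projected points G_hat y_i in R^d

# Check that all pairwise squared distances are preserved within a factor of 1 +/- eps.
ok = True
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((ys[i] - ys[j]) ** 2)
        proj = np.sum((zs[i] - zs[j]) ** 2)
        ok = ok and (1 - eps) * orig <= proj <= (1 + eps) * orig
print(d, ok)  # the theorem says ok is True with probability at least 1 - 1/n
```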

> [!NOTE]
> In fact, we can write an expression for the bound on $d$ in terms of a target failure probability $\delta$: if $d$ exceeds it, then we can get a success probability of $1-\delta$.
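
For instance, redoing the union-bound computation from the proof with a target failure probability $\delta$ (my own back-of-the-envelope version of this remark, not a constant quoted in class) gives $$n^{2}\exp\left( -\frac{1}{8}\varepsilon^{2}d \right) \leq \delta \quad\iff\quad d \geq \frac{8\left( 2\log n + \log \frac{1}{\delta} \right)}{\varepsilon^{2}},$$ so taking $d \geq \frac{8\left( 2\log n + \log \frac{1}{\delta} \right)}{\varepsilon^{2}}$ yields a success probability of at least $1-\delta$.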

> [!NOTE]
> Johnson-Lindenstrauss only deals with the pairwise distances between the $y_{i}$; there are other aspects of the geometry that it does not address.
>
> Despite this, it is an intuitive result and is still relevant to numerous applications.

see [[Johnson-Lindenstrauss lemma]]

#### 2.2.1 Simple Extensions

It is easy to see that the norms $\lvert \lvert y_{i} \rvert \rvert$ are also preserved with the same probability bound by adding $0$ to the collection of $y_{i}$ and increasing $n$ to $n+1$.
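
Concretely (my reading of this extension, spelled out): apply the theorem to the augmented collection $y_{1}, \dots, y_{n}, y_{n+1} = 0$. Since $\lvert \lvert y_{i} - y_{n+1} \rvert \rvert = \lvert \lvert y_{i} \rvert \rvert$, pairwise faithfulness on the augmented collection gives, for every $i$, $$(1-\varepsilon)\lvert \lvert y_{i} \rvert \rvert^{2} \leq \lvert \lvert \hat{G}y_{i} \rvert \rvert^{2} \leq (1+\varepsilon)\lvert \lvert y_{i} \rvert \rvert^{2},$$ at the cost of replacing $n$ by $n+1$ in the requirement on $d$ and in the success probability.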

#### 2.2.2 Lower Bounds

Is the Johnson-Lindenstrauss result optimal? Can we reduce the output dimension further?

- The original paper showed that dimension $d = \Omega(\log n)$ is required, independent of $\varepsilon$.

The following two papers show that this result is tight up to constants:

- Kasper Green Larsen and Jelani Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. arXiv preprint arXiv:1411.2404, 2014.
- Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 633-638. IEEE, 2017.

#### 2.2.3 Sparse and Fast Johnson-Lindenstrauss Transforms

How can we speed up the multiplications $\hat{G}y_{i}$?

The main approach to this is to find special matrices that allow for faster matrix-vector multiplication.

- The simplest results deal with sparse matrices (see the sketch after this list)
- There are also fast transforms based on the fast Fourier transform, via multiplication with the discrete Fourier transform matrix
    - multiply, then subsample entries
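
As a rough illustration of the sparse idea, here is a sketch of a classical Achlioptas-style sparse projection (this particular construction is my own choice of example and is not necessarily the one discussed in class; the function name `sparse_jl_matrix` is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_jl_matrix(d, m):
    """Achlioptas-style sparse projection matrix: each entry is
    +sqrt(3/d) with prob 1/6, 0 with prob 2/3, -sqrt(3/d) with prob 1/6.
    About two thirds of the entries are zero, so applying it is cheaper."""
    vals = rng.choice([np.sqrt(3.0), 0.0, -np.sqrt(3.0)],
                      size=(d, m), p=[1 / 6, 2 / 3, 1 / 6])
    return vals / np.sqrt(d)

# Usage: project a single vector and compare squared norms.
m, d = 10_000, 400
x = rng.normal(size=m)
G = sparse_jl_matrix(d, m)
print(np.sum((G @ x) ** 2) / np.sum(x ** 2))  # typically close to 1
```

The Fourier-based constructions mentioned above avoid storing any matrix at all: roughly speaking, one applies random sign flips, an FFT, and a subsampling of the output entries, so each matrix-vector product costs about $O(m \log m)$ instead of $O(dm)$.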

#### 2.2.4 Proof Technique: First Moment Method

The proof of the Johnson-Lindenstrauss lemma applies the union bound. This turns out to be a very powerful tool if we can carefully choose our events.

If we want to bound the probability of a “bad” event, we can decompose it into events $E_{1}, \dots, E_{N}$ and bound it as $\mathbb{P}[\text{some } E_{i} \text{ occurs}] \leq \sum_{i=1}^{N} \mathbb{P}[E_{i}]$. Often, for a well-chosen set of $E_{i}$, the $\mathbb{P}[E_{i}]$ are about the same. If $N$ is large and $\mathbb{P}[E_{i}]$ is very small, we can often see a bound with an exponential scale (like we did in the proof).
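
As a toy numerical illustration of this decomposition (my own sketch; the events here are artificial Gaussian tail events chosen only to show the union bound at work, not the Johnson-Lindenstrauss events):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# N rare "bad" events: E_i = {the i-th standard Gaussian coordinate exceeds a threshold t}.
N, t, trials = 1000, 3.5, 10_000
samples = rng.normal(size=(trials, N))

some_bad = np.any(samples > t, axis=1)     # indicator of "some E_i occurs", per trial
p_union_mc = some_bad.mean()               # Monte Carlo estimate of P[some E_i occurs]

p_single = 1.0 - 0.5 * (1.0 + erf(t / sqrt(2.0)))  # P[E_i], a single Gaussian tail
p_union_bound = N * p_single                        # the union bound: sum over i of P[E_i]

print(p_union_mc, p_union_bound)  # the bound overshoots, but it captures the right scale
```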

This approach is sometimes called the first moment method because we may write

$$\begin{align} \mathbb{P}[\text{ some }E_{i}\text{ occurs }] &= \mathbb{P}\left[ \sum_{i=1}^N \mathbb{1}_{\{ E_{i} \} }\geq 1\right] \\ &\leq \mathbb{E}\left[ \sum_{i=1}^N \mathbb{1}_{\{ E_{i} \}}\right] \\ &= \sum_{i=1}^N \mathbb{P}[E_{i}] \end{align}$$

which we get from [[Markov's Inequality]]. Here, we compute the expectation or *first moment* of the random variable $\#E = \lvert \{ i: E_{i} \text{ occurs} \} \rvert$.

> [!NOTE]
> There is also the "second moment method" which involves computing the second moment of $\#E$. This method is usually used to show that some $E_{i}$ *does* occur with high probability.
>
> > [!example]
> > If $d$ is too small, then $\hat{G}$ is *not* pairwise $\varepsilon$-[[epsilon faithful function|faithful]].

see [[first moment method]]

# Review

## TODO

- [-] Add flashcards to random matrix lecture 03 ⏳ 2025-09-25 ❌ 2025-11-06

```datacorejsx
const { dateTime } = await cJS()

return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)} ֍ Last Modified {dateTime.getLastMod(file)}</p>
}
```