Johnson-Lindenstrauss lemma

[[concept]]


Theorem

Theorem (Johnson-Lindenstrauss)

Let $y_1, \dots, y_n \in \mathbb{R}^m$ and fix $\varepsilon \in (0,1)$. Let $\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}$. Suppose that

$$d \geq \frac{24 \log n}{\varepsilon^2}$$

and denote by $(\star)$ the event that multiplication by $\hat{G}$ is pairwise $\varepsilon$-faithful for the $y_i$. Then we have

$$\mathbb{P}[(\star)] \geq 1 - \frac{1}{n}$$
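As a quick numerical sanity check of the statement (an illustrative sketch, not part of the theorem; all parameter values and variable names here are my own), one can draw such a $\hat{G}$ and test pairwise $\varepsilon$-faithfulness directly:

```python
import numpy as np

# Illustrative check: embed n points from R^m into R^d with
# d = ceil(24 log n / eps^2) and test that every pairwise squared
# distance is preserved within a factor of 1 +/- eps.
rng = np.random.default_rng(0)
n, m, eps = 50, 2000, 0.5
d = int(np.ceil(24 * np.log(n) / eps**2))

Y = rng.normal(size=(n, m))                        # data points y_1, ..., y_n
G = rng.normal(scale=np.sqrt(1 / d), size=(d, m))  # entries i.i.d. N(0, 1/d)
Z = Y @ G.T                                        # embedded points G y_i

violations = 0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((Y[i] - Y[j]) ** 2)
        emb = np.sum((Z[i] - Z[j]) ** 2)
        if not (1 - eps) * orig <= emb <= (1 + eps) * orig:
            violations += 1
print(d, violations)   # the event (star) holds iff violations == 0
```

With these parameters the theorem guarantees success with probability at least $1 - 1/50 = 0.98$, so a typical run should report no violations.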
Proof

Note that being pairwise $\varepsilon$-faithful to the $y_i$ requires the preservation of the $\binom{n}{2}$ pairwise difference norms $\lvert \lvert y_i - y_j \rvert \rvert$ within a factor of $1 \pm \varepsilon$.

Note that we can do this for any $\binom{n}{2}$ vectors, and in particular for the $y_i$ specified in the statement!

Let $x \in \mathbb{R}^m$. Recall that the distribution of $\hat{G}x$ is $\mathcal{N}\left( 0, \frac{1}{d} \lvert \lvert x \rvert \rvert^2 I_d \right)$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\, \left\lvert\, \lvert \lvert \hat{G}x \rvert \rvert^2 - \lvert \lvert x \rvert \rvert^2 \,\right\rvert > \varepsilon \lvert \lvert x \rvert \rvert^2 \,\right] &= \mathbb{P}_{g \sim \mathcal{N}\left( 0, \frac{1}{d} \lvert \lvert x \rvert \rvert^2 I_d \right)} \left[\, \left\lvert\, \lvert \lvert g \rvert \rvert^2 - \lvert \lvert x \rvert \rvert^2 \,\right\rvert > \varepsilon \lvert \lvert x \rvert \rvert^2 \,\right] \\ &= \mathbb{P}_{g \sim \mathcal{N}(0, I_d)} \left[\, \left\lvert\, \frac{1}{d} \lvert \lvert x \rvert \rvert^2 \cdot \lvert \lvert g \rvert \rvert^2 - \lvert \lvert x \rvert \rvert^2 \,\right\rvert > \varepsilon \lvert \lvert x \rvert \rvert^2 \,\right] \\ &= \mathbb{P}_{g \sim \mathcal{N}(0, I_d)} \left[\, \left\lvert\, \lvert \lvert g \rvert \rvert^2 - d \,\right\rvert > \varepsilon d \,\right] \end{align}$$
So we can apply our concentration inequality for the magnitude of a standard Gaussian random vector! Since $\varepsilon < 1$, we have $\varepsilon d < d$ and thus $\min\left\{ \frac{(\varepsilon d)^2}{d}, \varepsilon d \right\} = \varepsilon^2 d$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\, \left\lvert\, \lvert \lvert \hat{G}x \rvert \rvert^2 - \lvert \lvert x \rvert \rvert^2 \,\right\rvert > \varepsilon \lvert \lvert x \rvert \rvert^2 \,\right] &\leq 2 \exp\left( -\frac{1}{8} \varepsilon^2 d \right) \\ \implies \mathbb{P}_{\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\, (1-\varepsilon) \lvert \lvert x \rvert \rvert^2 \leq \lvert \lvert \hat{G}x \rvert \rvert^2 \leq (1+\varepsilon) \lvert \lvert x \rvert \rvert^2 \,\right] &\geq 1 - 2\exp\left( -\frac{1}{8}\varepsilon^2 d \right) \end{align}$$

Now, let $S_{ij}$ be the event that $(1-\varepsilon) \lvert \lvert x \rvert \rvert^2 \leq \lvert \lvert \hat{G}x \rvert \rvert^2 \leq (1+\varepsilon) \lvert \lvert x \rvert \rvert^2$ for $x = y_i - y_j$. Then by the union bound, we have

$$\begin{align} \mathbb{P}\left[ S_{ij}^C \text{ for some } i,j \right] &\leq \binom{n}{2} \cdot 2\exp\left( -\frac{1}{8}\varepsilon^2 d \right) \\ &\leq n^2 \exp\left( -\frac{1}{8}\varepsilon^2 d \right) \\ &= \frac{1}{n}\exp\left( 3\log n - \frac{1}{8}\varepsilon^2 d \right) \\ &\leq \frac{1}{n}\exp\left( 3\log n - 3\log n \right) \qquad \left( \text{since } d \geq \frac{24 \log n}{\varepsilon^2} \right) \\ &= \frac{1}{n} \end{align}$$
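The concentration step used above can be checked by simulation. The following is an illustrative sketch with arbitrary parameters of my own choosing, not part of the proof:

```python
import numpy as np

# Monte Carlo check of the concentration inequality: for g ~ N(0, I_d),
# the empirical frequency of | ||g||^2 - d | > eps*d should sit below
# the bound 2 exp(-eps^2 d / 8) used in the proof.
rng = np.random.default_rng(1)
d, eps, trials = 200, 0.4, 100_000

g = rng.normal(size=(trials, d))
sq_norms = np.sum(g ** 2, axis=1)   # chi-squared with d degrees of freedom
empirical = np.mean(np.abs(sq_norms - d) > eps * d)
bound = 2 * np.exp(-eps ** 2 * d / 8)
print(empirical, bound)             # empirical rate should be below the bound
```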
Note

Johnson-Lindenstrauss only deals with the pairwise distances between the $y_i$; there are other aspects of the geometry that it does not address.

Despite this, it is an intuitive result and is still relevant to numerous applications.

Note

The proof of this lemma uses the first moment method.

Extensions

It is easy to see that

$$(1-\varepsilon) \lvert \lvert y_i \rvert \rvert \leq \lvert \lvert f(y_i) \rvert \rvert \leq (1+\varepsilon) \lvert \lvert y_i \rvert \rvert$$

where $f(y) = \hat{G}y$, with the same probability bound, by adding $0$ to the collection of $y_i$ and increasing $n$ to $n+1$.

We can also write an expression for the bound on $d$: if $d \geq C(k) \frac{\log n}{\varepsilon^2}$, then we can get a success probability of $1 - \frac{1}{n^k}$.
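For instance, repeating the union-bound computation from the proof with target failure probability $\frac{1}{n^k}$ gives one admissible constant (a sketch; sharper constants are possible):

$$n^2 \exp\left( -\frac{1}{8}\varepsilon^2 d \right) \leq \frac{1}{n^k} \iff \frac{1}{8}\varepsilon^2 d \geq (k+2)\log n \iff d \geq \frac{8(k+2)\log n}{\varepsilon^2}$$

so $C(k) = 8(k+2)$ works; taking $k = 1$ recovers the constant $24$ from the theorem.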

Johnson-Lindenstrauss Transform


Multiplying data by a random matrix that satisfies the criteria for the Johnson-Lindenstrauss lemma is sometimes called the Johnson-Lindenstrauss transform (JLT).

This is usually done for the purpose of dimensionality reduction.

There is also work being done to see how to speed up the multiplications G^y. The main approach to this is to find special matrices G^ that allow for faster matrix-vector multiplication.
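As one illustration of this idea (a sketch in the spirit of Achlioptas's sparse random projections, not the specific constructions alluded to above), a matrix with entries $\sqrt{3/d} \cdot \{+1, 0, -1\}$ drawn with probabilities $\{1/6, 2/3, 1/6\}$ has entry variance $1/d$, matching the Gaussian $\hat{G}$, while its many zeros let most of each matrix-vector product be skipped:

```python
import numpy as np

# Sparse random projection sketch (Achlioptas-style; parameters arbitrary):
# entries are sqrt(3/d) * {+1, 0, -1} with probabilities {1/6, 2/3, 1/6},
# so roughly two thirds of the entries are zero while E[S_ij^2] = 1/d,
# which preserves squared norms in expectation just like the Gaussian G.
rng = np.random.default_rng(2)
d, m = 300, 5000

S = np.sqrt(3 / d) * rng.choice([1.0, 0.0, -1.0],
                                p=[1 / 6, 2 / 3, 1 / 6], size=(d, m))
x = rng.normal(size=m)
ratio = np.sum((S @ x) ** 2) / np.sum(x ** 2)
print(ratio)   # concentrates near 1
```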

Optimality

The result is not optimal. The original paper showed that dimension $\Omega(\log n)$ is required, independent of $\varepsilon$.

The following two papers show that the general result is tight up to constants:

References


See Also

Mentions


| File | Last Modified |
| --- | --- |
| first moment method | 2025-09-15 |
| Manifold Learning Lecture 05 | 2025-09-26 |
| Random Matrix Lecture 03 | 2025-09-15 |
| Random Matrix Lecture 07 | 2025-09-23 |
