Johnson-Lindenstrauss lemma

[[concept]]


Theorem

Theorem (Johnson-Lindenstrauss)

Let $y_1, \dots, y_n \in \mathbb{R}^m$ and fix $\varepsilon \in (0,1)$. Let $\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}$. Suppose that

$$d \geq \frac{24 \log n}{\varepsilon^2}$$

and denote by $(\star)$ the event that multiplication by $\hat{G}$ is pairwise $\varepsilon$-faithful for the $y_i$. Then we have

$$\mathbb{P}[(\star)] \geq 1 - \frac{1}{n}$$
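As a quick numerical sanity check of the statement (an illustrative sketch, not part of the theorem; all parameter values and variable names here are my own), one can draw such a $\hat{G}$ and test pairwise $\varepsilon$-faithfulness directly:

```python
import numpy as np

# Illustrative check: embed n points from R^m into R^d with
# d = ceil(24 log n / eps^2) and test that every pairwise squared
# distance is preserved within a factor of 1 +/- eps.
rng = np.random.default_rng(0)
n, m, eps = 50, 2000, 0.5
d = int(np.ceil(24 * np.log(n) / eps**2))

Y = rng.normal(size=(n, m))                        # data points y_1, ..., y_n
G = rng.normal(scale=np.sqrt(1 / d), size=(d, m))  # entries i.i.d. N(0, 1/d)
Z = Y @ G.T                                        # embedded points G y_i

violations = 0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((Y[i] - Y[j]) ** 2)
        emb = np.sum((Z[i] - Z[j]) ** 2)
        if not (1 - eps) * orig <= emb <= (1 + eps) * orig:
            violations += 1
print(d, violations)   # the event (star) holds iff violations == 0
```

With these parameters the theorem guarantees success with probability at least $1 - 1/50 = 0.98$, so a typical run should report no violations.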
Proof

Note that being pairwise $\varepsilon$-faithful to the $y_i$ requires the preservation of the $\binom{n}{2}$ pairwise difference norms $\lvert \lvert y_i - y_j \rvert \rvert$ within a factor of $1 \pm \varepsilon$.

Note that we can do this for any $\binom{n}{2}$ vectors, and in particular for the $y_i$ specified in the statement!

Let $x \in \mathbb{R}^m$. Recall that the distribution of $\hat{G}x$ is $\mathcal{N}\left( 0, \frac{1}{d} \lvert \lvert x \rvert \rvert^2 I_d \right)$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\, \left\lvert\, \lvert \lvert \hat{G}x \rvert \rvert^2 - \lvert \lvert x \rvert \rvert^2 \,\right\rvert > \varepsilon \lvert \lvert x \rvert \rvert^2 \,\right] &= \mathbb{P}_{g \sim \mathcal{N}\left( 0, \frac{1}{d} \lvert \lvert x \rvert \rvert^2 I_d \right)} \left[\, \left\lvert\, \lvert \lvert g \rvert \rvert^2 - \lvert \lvert x \rvert \rvert^2 \,\right\rvert > \varepsilon \lvert \lvert x \rvert \rvert^2 \,\right] \\ &= \mathbb{P}_{g \sim \mathcal{N}(0, I_d)} \left[\, \left\lvert\, \frac{1}{d} \lvert \lvert x \rvert \rvert^2 \cdot \lvert \lvert g \rvert \rvert^2 - \lvert \lvert x \rvert \rvert^2 \,\right\rvert > \varepsilon \lvert \lvert x \rvert \rvert^2 \,\right] \\ &= \mathbb{P}_{g \sim \mathcal{N}(0, I_d)} \left[\, \left\lvert\, \lvert \lvert g \rvert \rvert^2 - d \,\right\rvert > \varepsilon d \,\right] \end{align}$$
So we can apply our concentration inequality for the magnitude of a standard Gaussian random vector! Since $\varepsilon < 1$, we have $\varepsilon d < d$ and thus $\min\left\{ \frac{(\varepsilon d)^2}{d}, \varepsilon d \right\} = \varepsilon^2 d$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\, \left\lvert\, \lvert \lvert \hat{G}x \rvert \rvert^2 - \lvert \lvert x \rvert \rvert^2 \,\right\rvert > \varepsilon \lvert \lvert x \rvert \rvert^2 \,\right] &\leq 2 \exp\left( -\frac{1}{8} \varepsilon^2 d \right) \\ \implies \mathbb{P}_{\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\, (1-\varepsilon) \lvert \lvert x \rvert \rvert^2 \leq \lvert \lvert \hat{G}x \rvert \rvert^2 \leq (1+\varepsilon) \lvert \lvert x \rvert \rvert^2 \,\right] &\geq 1 - 2\exp\left( -\frac{1}{8}\varepsilon^2 d \right) \end{align}$$

Now, let $S_{ij}$ be the event that $(1-\varepsilon) \lvert \lvert x \rvert \rvert^2 \leq \lvert \lvert \hat{G}x \rvert \rvert^2 \leq (1+\varepsilon) \lvert \lvert x \rvert \rvert^2$ for $x = y_i - y_j$. Then by the union bound, we have

$$\begin{align} \mathbb{P}\left[ S_{ij}^C \text{ for some } i,j \right] &\leq \binom{n}{2} \cdot 2\exp\left( -\frac{1}{8}\varepsilon^2 d \right) \\ &\leq n^2 \exp\left( -\frac{1}{8}\varepsilon^2 d \right) \\ &= \frac{1}{n}\exp\left( 3\log n - \frac{1}{8}\varepsilon^2 d \right) \\ &\leq \frac{1}{n}\exp\left( 3\log n - 3\log n \right) \qquad \left( \text{since } d \geq \frac{24 \log n}{\varepsilon^2} \right) \\ &= \frac{1}{n} \end{align}$$
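The concentration step used above can be checked by simulation. The following is an illustrative sketch with arbitrary parameters of my own choosing, not part of the proof:

```python
import numpy as np

# Monte Carlo check of the concentration inequality: for g ~ N(0, I_d),
# the empirical frequency of | ||g||^2 - d | > eps*d should sit below
# the bound 2 exp(-eps^2 d / 8) used in the proof.
rng = np.random.default_rng(1)
d, eps, trials = 200, 0.4, 100_000

g = rng.normal(size=(trials, d))
sq_norms = np.sum(g ** 2, axis=1)   # chi-squared with d degrees of freedom
empirical = np.mean(np.abs(sq_norms - d) > eps * d)
bound = 2 * np.exp(-eps ** 2 * d / 8)
print(empirical, bound)             # empirical rate should be below the bound
```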
Note

Johnson-Lindenstrauss only deals with the pairwise distances between the $y_i$; there are other aspects of the geometry that it does not address.

Despite this, it is an intuitive result and is still relevant to numerous applications.

Note

The proof of this lemma uses the first moment method.

Extensions

It is easy to see that

$$(1-\varepsilon) \lvert \lvert y_i \rvert \rvert \leq \lvert \lvert f(y_i) \rvert \rvert \leq (1+\varepsilon) \lvert \lvert y_i \rvert \rvert$$

where $f(y) = \hat{G}y$, with the same probability bound, by adding $0$ to the collection of $y_i$ and increasing $n$ to $n+1$.

We can also write an expression for the bound on $d$: if $d \geq C(k) \frac{\log n}{\varepsilon^2}$, then we can get a success probability of $1 - \frac{1}{n^k}$.
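For instance, repeating the union-bound computation from the proof with target failure probability $\frac{1}{n^k}$ gives one admissible constant (a sketch; sharper constants are possible):

$$n^2 \exp\left( -\frac{1}{8}\varepsilon^2 d \right) \leq \frac{1}{n^k} \iff \frac{1}{8}\varepsilon^2 d \geq (k+2)\log n \iff d \geq \frac{8(k+2)\log n}{\varepsilon^2}$$

so $C(k) = 8(k+2)$ works; taking $k = 1$ recovers the constant $24$ from the theorem.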

Johnson-Lindenstrauss Transform


Multiplying data by a random matrix that satisfies the criteria for the Johnson-Lindenstrauss lemma is sometimes called the Johnson-Lindenstrauss transform (JLT).

This is usually done for the purpose of dimensionality reduction.

There is also work being done to see how to speed up the multiplications G^y. The main approach to this is to find special matrices G^ that allow for faster matrix-vector multiplication.
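As one illustration of this idea (a sketch in the spirit of Achlioptas's sparse random projections, not the specific constructions alluded to above), a matrix with entries $\sqrt{3/d} \cdot \{+1, 0, -1\}$ drawn with probabilities $\{1/6, 2/3, 1/6\}$ has entry variance $1/d$, matching the Gaussian $\hat{G}$, while its many zeros let most of each matrix-vector product be skipped:

```python
import numpy as np

# Sparse random projection sketch (Achlioptas-style; parameters arbitrary):
# entries are sqrt(3/d) * {+1, 0, -1} with probabilities {1/6, 2/3, 1/6},
# so roughly two thirds of the entries are zero while E[S_ij^2] = 1/d,
# which preserves squared norms in expectation just like the Gaussian G.
rng = np.random.default_rng(2)
d, m = 300, 5000

S = np.sqrt(3 / d) * rng.choice([1.0, 0.0, -1.0],
                                p=[1 / 6, 2 / 3, 1 / 6], size=(d, m))
x = rng.normal(size=m)
ratio = np.sum((S @ x) ** 2) / np.sum(x ** 2)
print(ratio)   # concentrates near 1
```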

Optimality

The result is not optimal. The original paper showed that dimension $\Omega(\log n)$ is required, independent of $\varepsilon$.

The following two papers show that the general result is tight up to constants:

References


See Also

Mentions


| File | Last Modified |
| --- | --- |
| first moment method | 2025-09-15 |
| Manifold Learning Lecture 05 | 2025-09-26 |
| Random Matrix Lecture 03 | 2025-09-15 |
| Random Matrix Lecture 07 | 2025-09-23 |
