Say each of the diagonal elements of $D$ is $\sigma_i^2 \geq 0$ (we can do this because each eigenvalue of $AA^*$ is real and nonnegative, $AA^*$ being positive semidefinite).
Say $A$ has rank $k$. Then $\sigma_k > 0$ and $\sigma_{k+1} = \cdots = \sigma_m = 0$; define $\Sigma$ to be the $m \times m$ diagonal matrix containing the ordered $\sigma_i$'s.
For $i = 1, \dots, k$, define the $i$th row of $W$ to be $\frac{1}{\sigma_i} u_i^* A$. These rows are orthonormal, since for all $i, j$ we have
$$\left(\frac{1}{\sigma_i} u_i^* A\right)\left(\frac{1}{\sigma_j} u_j^* A\right)^* = \frac{1}{\sigma_i \sigma_j}\, u_i^* A A^* u_j = \frac{1}{\sigma_i \sigma_j} D_{ij},$$
and this is $1$ if $i = j$ and $0$ otherwise.
For $i = k+1, \dots, m$, define the rows of $W$ any way you like, so long as they are orthonormal and orthogonal to the first $k$ rows of $W$. And thus we are done! We have
$$U^* A = \Sigma W \implies A = U \Sigma W$$
For rows $1, \dots, k$, equality holds by construction. For rows $k+1, \dots, m$, we have $AA^* u_i = 0$, since each such $u_i$ is an eigenvector of $AA^*$ with eigenvalue $0$. Thus $u_i^* A A^* u_i = 0 = \|u_i^* A\|_2^2 \implies u_i^* A = 0$, and the $i$th row of $\Sigma W$ is also $0$ since $\sigma_i = 0$. So LHS $=$ RHS $= 0$.
Note that if $A$ is real, we can take $U, \Sigma, W$ real.
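To make the construction concrete, here is a minimal numpy sketch, assuming the setup above ($AA^* = UDU^*$, with $A$ an arbitrary rank-$k$ example). We only build the first $k$ rows of $W$, since $\Sigma$ annihilates the remaining rows anyway:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 3, 5, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank k, m <= n

# Eigendecomposition AA* = U D U*; eigh returns eigenvalues in ascending order.
D, U = np.linalg.eigh(A @ A.conj().T)
order = np.argsort(D)[::-1]                # reorder so sigma_1 >= sigma_2 >= ...
D, U = D[order], U[:, order]
sigma = np.sqrt(np.clip(D, 0.0, None))     # sigma_i^2 are the eigenvalues of AA*

# Rows 1..k of W are (1/sigma_i) u_i^* A. Rows k+1..m would be any orthonormal
# completion, but Sigma zeros them out, so the truncated product already works.
W = (U[:, :k].conj().T @ A) / sigma[:k, None]
print(np.allclose(W @ W.conj().T, np.eye(k)))             # rows are orthonormal
print(np.allclose(A, U[:, :k] @ np.diag(sigma[:k]) @ W))  # A = U Sigma W
```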
Theorem (Matrix Analysis 2)
Theorem (Singular Value Decomposition)
For any $A \in M_{m,n}$, there exist $U \in M_m$ and $V \in M_n$, both unitary, and $\Sigma \in M_{m,n}$ "diagonal" with $\min\{m, n\}$ real, nonnegative, nonincreasing entries, such that
$$A = U \Sigma V^*$$
Further, if $A$ is real-valued, we can take each of $U, \Sigma, V$ to be real.
(we take "diagonal" to mean that the only entries that may be nonzero are those whose row and column indices coincide)
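As a quick numerical sanity check of the statement (a sketch; `np.linalg.svd` computes exactly this factorization, and the matrix below is arbitrary):

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)       # an arbitrary real 2x3 matrix
U, s, Vh = np.linalg.svd(A)            # U is 2x2, Vh = V* is 3x3, both unitary
Sigma = np.zeros((2, 3))
np.fill_diagonal(Sigma, s)             # "diagonal" m x n with min(m, n) entries
print(np.allclose(A, U @ Sigma @ Vh))  # True: A = U Sigma V*
print(s)                               # real, nonnegative, nonincreasing
```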
Note
We can think of this as a generalization of diagonalization/spectral decomposition, where the price is that $U$ and $V$ are in general distinct.
If $A$ is positive semidefinite, then its spectral decomposition is a singular value decomposition, since $A = U \Sigma U^*$ for some unitary $U$ and diagonal $\Sigma$ with real, nonnegative entries.
If $m \leq n$, then by the previous construction we have $A = U \Sigma W$, where $U \in M_m$ is unitary, $\Sigma \in M_m$ is diagonal, and $W \in M_{m,n}$ has orthonormal rows. Since $m \leq n$, we append $n - m$ columns of zeros to $\Sigma$ to make it $m \times n$. Then we can append $n - m$ orthonormal rows to $W$ via Gram-Schmidt to get a unitary $V^*$.
If $m > n$, then $A^* \in M_{n,m}$ has an SVD $A^* = U \Sigma V^*$ by the above case. But then $A = V \Sigma^T U^*$ is an SVD of $A$!
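A quick check of this transpose trick (a sketch; numpy's `svd` stands in for the $m \leq n$ construction above, and the tall matrix is an arbitrary example):

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((5, 3))  # m > n
U, s, Vh = np.linalg.svd(A.conj().T)                  # SVD of A*, the wide case
Sigma = np.zeros((3, 5))
np.fill_diagonal(Sigma, s)
V = Vh.conj().T
print(np.allclose(A, V @ Sigma.T @ U.conj().T))       # A = V Sigma^T U*
```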
Another Definition (Data Science)
Singular Value Decomposition
The singular value decomposition of a matrix $X \in \mathbb{C}^{n \times m}$ is a factorization (unique up to the choice of singular vectors) such that
$$X = U \Sigma V^*$$
where $U \in \mathbb{C}^{n \times n}$ and $V \in \mathbb{C}^{m \times m}$ are unitary. A unitary matrix $U$ is one such that $U^* U = U U^* = I$; equivalently, its columns are orthonormal. The columns of $U$ are called the left singular vectors of $X$, and the columns of $V$ are called the right singular vectors. The diagonal entries of $\Sigma$ are called the singular values.
The columns of $U$ are the same "shape" as the columns of $X$: they give a basis in which we can represent the columns of the data matrix $X$.
This means that the columns of $U$ can be reshaped into "eigen"-representations of instances of the data.
Each column of $V^*$, scaled by the singular values in $\Sigma$, gives the "mixture" of the columns of $U$ that we need in order to reproduce the corresponding instance of the dataset.
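A small numerical illustration of this reading (a sketch; the data matrix `X` here is random and hypothetical, with one instance per column):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 4))            # 4 instances, each an 8-dim vector
U, s, Vh = np.linalg.svd(X, full_matrices=False)

# Instance j is a mixture of the columns of U; the weights are column j of
# Sigma V*, i.e. the j-th column of V* scaled by the singular values.
j = 0
weights = s * Vh[:, j]                     # column j of Sigma V*
print(np.allclose(X[:, j], U @ weights))   # True
```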
Notes on SVD
Say $A \in M_{m,n}$ and let $A = U \Sigma V^*$ be its SVD. Suppose there are $k$ nonzero singular values, $\sigma_1, \dots, \sigma_k$. Denote by $u_1, \dots, u_m$ the columns of $U$ and by $v_1^*, \dots, v_n^*$ the rows of $V^*$. Then
1. $A = U \Sigma V^* = \sum_{i=1}^{k} \sigma_i u_i v_i^*$
2. $\operatorname{rank}(A) = \operatorname{rank}(\Sigma) = k$
3. $\operatorname{range}(A) = \operatorname{span}\{u_1, \dots, u_k\}$ - easy to see from (1)
4. $\operatorname{null}(A) = \operatorname{span}\{v_{k+1}, \dots, v_n\}$ - easy to see from (1)
5. $\operatorname{range}(A^*) = \operatorname{span}\{v_1, \dots, v_k\}$
6. $\operatorname{null}(A^*) = \operatorname{span}\{u_{k+1}, \dots, u_m\}$
7. For any $A \in M_{m,n}$, we have $\|A\|_{2,2} := \max_{x \in \mathbb{C}^n,\, x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_1$
8. $\|A\|_F^2 = \|U \Sigma V^*\|_F^2 = \|\Sigma\|_F^2 = \sum_{i=1}^{k} \sigma_i^2$
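Before justifying these, here is a quick numerical check of (1), (2), (7), and (8) (a sketch; the rank-2 test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))   # rank 2
U, s, Vh = np.linalg.svd(A)
k = int(np.sum(s > 1e-10))                 # number of nonzero singular values

print(k == np.linalg.matrix_rank(A))       # (2): rank(A) = k
print(np.allclose(A, sum(s[i] * np.outer(U[:, i], Vh[i]) for i in range(k))))  # (1)
print(np.isclose(np.linalg.norm(A, 2), s[0]))                        # (7): ||A||_2,2 = sigma_1
print(np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s[:k] ** 2))) # (8)
```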
To see (5) and (6), recall that $A^* = V \Sigma^T U^*$ and simply apply (3) and (4) to $A^*$.
To see (7), note that
$$\|A\|_{2,2}^2 = \max_{x \neq 0} \frac{\|Ax\|_2^2}{\|x\|_2^2} = \max_{x \neq 0} \frac{x^* A^* A x}{x^* x} \overset{*,\,**}{=} \sigma_1^2$$
Here $*$ is by Rayleigh-Ritz: this maximum equals the largest eigenvalue of $A^* A$.
Then $**$ is by the fact that $A^* A = (U \Sigma V^*)^* (U \Sigma V^*) = V \Sigma^* \Sigma V^*$ - i.e., the eigenvalues of $A^* A$ are the squared singular values of $A$, the largest of which is $\sigma_1^2$.
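Both steps check out numerically (a sketch, with an arbitrary test matrix):

```python
import numpy as np

A = np.random.default_rng(4).standard_normal((3, 4))
s = np.linalg.svd(A, compute_uv=False)           # singular values of A
eigs = np.linalg.eigvalsh(A.conj().T @ A)        # eigenvalues of A*A, ascending
print(np.allclose(np.sort(s ** 2), eigs[-len(s):]))  # **: nonzero eigs are sigma_i^2
print(np.isclose(eigs[-1], s[0] ** 2))               # *: largest eig = sigma_1^2
```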