PCA And SVD
### PCA
#### What can PCA do?
- Compression: PCA can compress a photo substantially without losing too much detail.
- Dimensionality reduction: widely used in machine learning to reduce noise and drop unnecessary features, giving faster and often more accurate results.
- Data visualization: it can take 4 or more variables and project them down to a 2-D PCA plot.
- Choosing main variables: it can tell us which variables matter most for clustering the data (see the sketch below).
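Below is a minimal sketch of the visualization and main-variables use cases with scikit-learn; the iris dataset, the standardization step, and the plotting calls are my illustrative choices, not from the original post:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)          # 4 features per sample
X_std = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)            # 4-D -> 2-D

# How much variance each principal component keeps
print(pca.explained_variance_ratio_)
# Loadings: which original variables weigh most on each component
print(pca.components_)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```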
#### How to use it?
Code:
```python
import numpy as np
from sklearn.decomposition import PCA

# PCA: keep the first 310 components (X is assumed to be the data matrix)
pca = PCA(n_components=310)
X_pca = pca.fit_transform(X)

# SVD route: U holds the eigenvectors of the covariance matrix
U, S, Vt = np.linalg.svd(X.T @ X)   # X should be zero-centered
Xrot = X @ U[:, :100]               # project onto the top 100 directions to decorrelate the data
```
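As a sanity check on the decorrelation claim (my addition, with randomly generated data standing in for `X`), the covariance matrix of `Xrot` should come out approximately diagonal:

```python
# Sketch: verify decorrelation on random data (X must be zero-centered).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X = X - X.mean(axis=0)                  # zero-center

U, S, Vt = np.linalg.svd(X.T @ X)
Xrot = X @ U                            # rotate into the PCA basis

cov = Xrot.T @ Xrot / (len(X) - 1)
# Off-diagonal entries should be ~0: the new features are uncorrelated
print(np.round(cov, 6))
```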
### SVD
#### How to use SVD to reconstruct a matrix?
A rectangular matrix A can be factored as A = U * Sigma * V^T.
```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
U, s, Vt = np.linalg.svd(A)        # note: numpy returns V^T, not V

# Build the m x n Sigma matrix with the singular values on its diagonal
Sigma = np.zeros(A.shape)
k = min(A.shape)
Sigma[:k, :k] = np.diag(s)

B = U @ Sigma @ Vt                 # reconstruct A
print(B)
```
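As a side note (my addition), numpy also offers a reduced SVD via `full_matrices=False`, which avoids building the padded Sigma:

```python
# Sketch: the "economy" SVD skips the zero padding entirely.
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # U: 3x2, s: 2, Vt: 2x2
B = U @ np.diag(s) @ Vt
print(B)                                          # reconstructs A
```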
#### Pseudoinverse (generalized inverse)
A^+ denotes the pseudoinverse of A:
A^+ = V * D^+ * U^T
where A^+ is the pseudoinverse of A, D^+ is the pseudoinverse of Sigma, and U^T is the transpose of U.
So we can compute the SVD A = U * Sigma * V^T to get U, Sigma and V, and assemble the pseudoinverse from them.
```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
U, s, Vt = np.linalg.svd(A)        # note: numpy returns V^T, not V

# Build D^+ (the pseudoinverse of Sigma): reciprocal singular values
# on the diagonal of an m x n matrix, then transpose to get n x m
D = np.zeros(A.shape)
k = min(A.shape)
D[:k, :k] = np.diag(1 / s)
Dpse = D.T

Apse = Vt.T @ Dpse @ U.T           # A^+ = V * D^+ * U^T
print(Apse)

# or use the built-in helper
Apse = np.linalg.pinv(A)
print(Apse)
```
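One standard application of the pseudoinverse, added here as an illustration, is solving least-squares problems:

```python
# Sketch: x = A^+ b minimizes ||Ax - b|| for an overdetermined system.
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([1, 2, 3])

x = np.linalg.pinv(A) @ b
print(x)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # same answer
```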
#### What can SVD do?
SVD can also be used for dimensionality reduction,
e.g. when the number of features (columns) is larger than the number of samples (rows).
We keep the k largest singular values in Sigma, keep the corresponding first k rows of V^T, and reconstruct the matrix: B = U * SigmaNew * VNew.
This also gives a new, lower-dimensional dataset (the projection of A): T = U * SigmaNew, or equivalently T = A * VNew^T.
```python
import numpy as np

A = np.array([list(range(1, 11)), list(range(11, 21)), list(range(21, 31))])
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
k = min(A.shape)
Sigma[:k, :k] = np.diag(s)

# select the two largest singular values / directions
n_elements = 2
SigmaNew = Sigma[:, :n_elements]
VNew = Vt[:n_elements, :]

# reconstruct a rank-2 approximation of A
B = U @ SigmaNew @ VNew

# project A into the 2-D space (both forms are equivalent)
T = A @ VNew.T
T = U @ SigmaNew
print(T)
```
```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

A = np.array([list(range(1, 11)), list(range(11, 21)), list(range(21, 31))])
svd = TruncatedSVD(n_components=2)
svd.fit(A)
result = svd.transform(A)
print(result)
```
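As a cross-check (my addition, not in the original post), the TruncatedSVD projection should match the manual T = U * SigmaNew up to the sign of each column:

```python
# Sketch: TruncatedSVD agrees with the manual projection up to column signs.
import numpy as np
from sklearn.decomposition import TruncatedSVD

A = np.array([list(range(1, 11)), list(range(11, 21)), list(range(21, 31))])

U, s, Vt = np.linalg.svd(A)
T = U[:, :2] @ np.diag(s[:2])               # manual projection

result = TruncatedSVD(n_components=2).fit_transform(A)

# columns can differ by a sign flip, so compare absolute values
print(np.allclose(np.abs(T), np.abs(result)))  # True up to numerical error
```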