PCA And SVD
### PCA
#### What can PCA do?
- Compression: PCA can compress a photo substantially without losing too much detail.
- Dimensionality reduction: widely used in machine learning to reduce noise and drop unnecessary features, giving faster and often more accurate results.
- Data visualization: it can take 4 or more variables and project them down to a 2-D PCA plot.
- Choosing main variables: it can tell us which variables matter most for clustering the data (see the sketch below).
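Below is a minimal sketch of the visualization and main-variables use cases with scikit-learn; the iris dataset, the standardization step, and the plotting calls are my illustrative choices, not from the original post:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)          # 4 features per sample
X_std = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)            # 4-D -> 2-D

# How much variance each principal component keeps
print(pca.explained_variance_ratio_)
# Loadings: which original variables weigh most on each component
print(pca.components_)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```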
#### How to use it?
Code:
```python
import numpy as np
from sklearn.decomposition import PCA

# PCA: keep the first 310 components (X is assumed to be the data matrix)
pca = PCA(n_components=310)
X_pca = pca.fit_transform(X)

# SVD route: U holds the eigenvectors of the covariance matrix
U, S, Vt = np.linalg.svd(X.T @ X)   # X should be zero-centered
Xrot = X @ U[:, :100]               # project onto the top 100 directions to decorrelate the data
```
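As a sanity check on the decorrelation claim (my addition, with randomly generated data standing in for `X`), the covariance matrix of `Xrot` should come out approximately diagonal:

```python
# Sketch: verify decorrelation on random data (X must be zero-centered).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X = X - X.mean(axis=0)                  # zero-center

U, S, Vt = np.linalg.svd(X.T @ X)
Xrot = X @ U                            # rotate into the PCA basis

cov = Xrot.T @ Xrot / (len(X) - 1)
# Off-diagonal entries should be ~0: the new features are uncorrelated
print(np.round(cov, 6))
```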
### SVD
#### How to use SVD to reconstruct a matrix?
A rectangular matrix A can be factored as A = U * Sigma * V^T.
```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
U, s, Vt = np.linalg.svd(A)        # note: numpy returns V^T, not V

# Build the m x n Sigma matrix with the singular values on its diagonal
Sigma = np.zeros(A.shape)
k = min(A.shape)
Sigma[:k, :k] = np.diag(s)

B = U @ Sigma @ Vt                 # reconstruct A
print(B)
```
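As a side note (my addition), numpy also offers a reduced SVD via `full_matrices=False`, which avoids building the padded Sigma:

```python
# Sketch: the "economy" SVD skips the zero padding entirely.
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # U: 3x2, s: 2, Vt: 2x2
B = U @ np.diag(s) @ Vt
print(B)                                          # reconstructs A
```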
#### Pseudoinverse (generalized inverse)
A^+ denotes the pseudoinverse of A:
A^+ = V * D^+ * U^T
where A^+ is the pseudoinverse of A, D^+ is the pseudoinverse of Sigma, and U^T is the transpose of U.
So we can compute the SVD A = U * Sigma * V^T to get U, Sigma and V, and assemble the pseudoinverse from them.
```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
U, s, Vt = np.linalg.svd(A)        # note: numpy returns V^T, not V

# Build D^+ (the pseudoinverse of Sigma): reciprocal singular values
# on the diagonal of an m x n matrix, then transpose to get n x m
D = np.zeros(A.shape)
k = min(A.shape)
D[:k, :k] = np.diag(1 / s)
Dpse = D.T

Apse = Vt.T @ Dpse @ U.T           # A^+ = V * D^+ * U^T
print(Apse)

# or use the built-in helper
Apse = np.linalg.pinv(A)
print(Apse)
```
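One standard application of the pseudoinverse, added here as an illustration, is solving least-squares problems:

```python
# Sketch: x = A^+ b minimizes ||Ax - b|| for an overdetermined system.
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([1, 2, 3])

x = np.linalg.pinv(A) @ b
print(x)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # same answer
```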
#### What can SVD do?
SVD can also be used for dimensionality reduction,
e.g. when the number of features (columns) is larger than the number of samples (rows).
We keep the k largest singular values in Sigma, keep the corresponding first k rows of V^T, and reconstruct the matrix: B = U * SigmaNew * VNew.
This also gives a new, lower-dimensional dataset (the projection of A): T = U * SigmaNew, or equivalently T = A * VNew^T.
```python
import numpy as np

A = np.array([list(range(1, 11)), list(range(11, 21)), list(range(21, 31))])
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
k = min(A.shape)
Sigma[:k, :k] = np.diag(s)

# select the two largest singular values / directions
n_elements = 2
SigmaNew = Sigma[:, :n_elements]
VNew = Vt[:n_elements, :]

# reconstruct a rank-2 approximation of A
B = U @ SigmaNew @ VNew

# project A into the 2-D space (both forms are equivalent)
T = A @ VNew.T
T = U @ SigmaNew
print(T)
```
```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

A = np.array([list(range(1, 11)), list(range(11, 21)), list(range(21, 31))])
svd = TruncatedSVD(n_components=2)
svd.fit(A)
result = svd.transform(A)
print(result)
```
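As a cross-check (my addition, not in the original post), the TruncatedSVD projection should match the manual T = U * SigmaNew up to the sign of each column:

```python
# Sketch: TruncatedSVD agrees with the manual projection up to column signs.
import numpy as np
from sklearn.decomposition import TruncatedSVD

A = np.array([list(range(1, 11)), list(range(11, 21)), list(range(21, 31))])

U, s, Vt = np.linalg.svd(A)
T = U[:, :2] @ np.diag(s[:2])               # manual projection

result = TruncatedSVD(n_components=2).fit_transform(A)

# columns can differ by a sign flip, so compare absolute values
print(np.allclose(np.abs(T), np.abs(result)))  # True up to numerical error
```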