
PCA using sklearn

I have a large input matrix of shape (20, 20000) and am trying to perform PCA using the sklearn Python package. Here, 20 refers to 20 subjects (samples), and 20,000 refers to 20,000 features. Below is sample code:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(1)
X = rng.randn(20, 20000)  # 20 subjects (rows) x 20,000 features (columns)
print(X.shape)

>> (20, 20000)

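# Request one more component than there are samples.
# Note: the output below is from a scikit-learn release that silently capped
# n_components; newer releases raise a ValueError for this value instead.
pca = PCA(n_components=21)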
pca.fit(X)
X_pca = pca.transform(X)
print("Original shape: ", X.shape)
print("Transformed shape: ", X_pca.shape)

>> Original shape: (20, 20000)
>> Transformed shape: (20, 20)

With PCA, am I unable to get back more components than my number of x values (that is, the 20 samples)? Why is the number of PCA components limited by the number of samples?

The PCA implementation performs a singular value decomposition in order to identify the singular values associated with the principal directions. In your case the singular value matrix is a 20x20000 rectangular diagonal matrix, so it holds at most 20 singular values and you can have at most 20 components.
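To make the shape argument concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not scikit-learn's actual code path: the explicit mean-centering step and the names `X_centered`, `U`, `S`, and `Vt` are chosen here for clarity and do not come from the question.

```python
import numpy as np

rng = np.random.RandomState(1)
X = rng.randn(20, 20000)            # 20 samples, 20,000 features

# PCA operates on mean-centered data: subtract each feature's mean.
X_centered = X - X.mean(axis=0)

# Economy-size SVD: X_centered = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

print(U.shape, S.shape, Vt.shape)   # (20, 20) (20,) (20, 20000)

# Each row of Vt is one candidate principal direction, so there are only 20.
# Centering removes one degree of freedom, so the smallest singular value
# is numerically zero relative to the largest.
print(S[-1] / S[0])
```

Only 20 singular values exist for a 20-row matrix, which is why no more than 20 components can be returned regardless of how many features there are.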

This has more to do with the PCA implementation than with sklearn, but:

if n_samples <= n_features:
    maxn_pc = n_samples - 1
else:
    maxn_pc = n_features

Namely, if your number of samples (n) is less than or equal to the number of features (f), the greatest number of non-trivial components you can extract is n-1, because mean-centering the data removes one degree of freedom. Otherwise, the greatest number of non-trivial components is f.
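The rule can be checked numerically. The sketch below assumes random Gaussian data and uses two hypothetical helpers, `max_nontrivial_components` and `centered_rank`, which are named here for illustration and are not part of sklearn. It compares the rule with the numerical rank of the mean-centered matrix, since that rank is the number of directions with non-zero variance.

```python
import numpy as np

def max_nontrivial_components(n_samples, n_features):
    """Hypothetical helper restating the rule from the answer above."""
    if n_samples <= n_features:
        return n_samples - 1
    return n_features

def centered_rank(n_samples, n_features, seed=0):
    """Numerical rank of a mean-centered random Gaussian matrix."""
    rng = np.random.RandomState(seed)
    X = rng.randn(n_samples, n_features)
    return int(np.linalg.matrix_rank(X - X.mean(axis=0)))

# Fewer samples than features, more samples than features, and the square case.
for shape in [(20, 200), (200, 20), (20, 20)]:
    print(shape, max_nontrivial_components(*shape), centered_rank(*shape))
```

For generic data like this, the two numbers printed for each shape should agree. Real data with duplicated or linearly dependent features can have an even lower rank.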
