
Different results for PCA, truncated_svd and svds on numpy and sklearn

In sklearn and numpy there are different ways to compute the first principal component. I obtain a different result for each method. Why?

import matplotlib.pyplot as pl
from sklearn import decomposition
import scipy as sp
import scipy.sparse.linalg # makes sp.sparse.linalg.svds available explicitly
import sklearn.preprocessing
import numpy as np
import sklearn as sk

def gen_data_3_1():
    #### generate the data 3.1
    m=1000 # number of samples
    n=10 # number of variables
    d1=np.random.normal(loc=0,scale=100,size=(m,1))
    d2=np.random.normal(loc=0,scale=121,size=(m,1))
    d3=-0.2*d1+0.9*d2
    z=np.zeros(shape=(m,1))

    for i in range(4):
        z=np.hstack([z,d1+np.random.normal(size=(m,1))])

    for i in range(4):
        z=np.hstack([z,d2+np.random.normal(size=(m,1))])
    for i in range(2):
        z=np.hstack([z,d3+np.random.normal(size=(m,1))])
    z=z[:,1:11]  
    z=sk.preprocessing.scale(z,axis=0)
    return z

x=gen_data_3_1() #generate the sample dataset

x=sk.preprocessing.scale(x) #normalize the data
pca=sk.decomposition.PCA().fit(x) #compute the PCA of x and print the first princ comp.
print("first pca components=",pca.components_[:,0])
u,s,v=sp.sparse.linalg.svds(x) # the first column of v.T is the first princ comp
print("first svd components=",v.T[:,0])

trsvd=sk.decomposition.TruncatedSVD(n_components=3).fit(x) #the first component is the first princ comp
print("first component TruncatedSVD=",trsvd.components_[0,])

-- -

    first pca components= [-0.04201262  0.49555992  0.53885401 -0.67007959  0.0217131  -0.02535204
      0.03105254 -0.07313795 -0.07640555 -0.00442718]
    first svd components= [ 0.02535204 -0.1317925   0.12071112 -0.0323422   0.20165568 -0.25104996
     -0.0278177   0.17856688 -0.69344318  0.59089451]
    first component TruncatedSVD= [-0.04201262 -0.04230353 -0.04213402 -0.04221069  0.4058159   0.40584108
      0.40581564  0.40584842  0.40872029  0.40870925]

Because PCA, SVD, and truncated SVD are not the same method. PCA calls SVD, but it centers the data first. Truncated SVD computes only the leading `n_components` singular vectors and does not center the data. `svds` is a different routine from the dense `svd`: it is an iterative solver aimed at sparse matrices, and it returns only `k` singular triplets (6 by default).
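
As a rough check of these differences, the sketch below compares the three routines on data built like the question's. Several things in it are assumptions that do not appear in the original post: the fixed seed, `k=3` for `svds`, the hypothetical helper `align_sign` (singular vectors are only defined up to sign), and the constant shift of 5.0 used to show the effect of centering. It also assumes that indexing plays a part in the mismatch: sklearn stores one component per row of `components_`, so the first component is `components_[0]` rather than `components_[:, 0]`, and `svds` is not relied on to list the largest singular value first, so the row is picked with `np.argmax(s)`. The printed output in the question is consistent with this reading: the first PCA value, -0.04201262, equals the first entry of the TruncatedSVD component, and the first svds value, 0.02535204, is the negative of the sixth PCA value.

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.preprocessing import scale

rng = np.random.RandomState(0)  # assumption: fixed seed for repeatability

# Data shaped like the question's: 4 noisy copies of d1, 4 of d2, 2 of d3.
m = 1000
d1 = rng.normal(scale=100, size=(m, 1))
d2 = rng.normal(scale=121, size=(m, 1))
d3 = -0.2 * d1 + 0.9 * d2
sources = [d1] * 4 + [d2] * 4 + [d3] * 2
x = scale(np.hstack([d + rng.normal(size=(m, 1)) for d in sources]))

def align_sign(v, ref):
    """Hypothetical helper: flip v if it points opposite to ref."""
    return v if np.dot(v, ref) >= 0 else -v

# PCA: row 0 of components_ is the first principal component.
pc_pca = PCA().fit(x).components_[0]

# svds: take the row of vt paired with the largest singular value.
u, s, vt = svds(x, k=3)
pc_svds = align_sign(vt[np.argmax(s)], pc_pca)

# TruncatedSVD: row 0 again; no centering, but x already has zero column means.
pc_tsvd = align_sign(TruncatedSVD(n_components=3).fit(x).components_[0], pc_pca)

print("PCA vs svds agree:        ", np.allclose(pc_pca, pc_svds, atol=1e-6))
print("PCA vs TruncatedSVD agree:", np.allclose(pc_pca, pc_tsvd, atol=1e-6))

# Centering only matters once the columns have a non-zero mean.
x_shifted = x + 5.0
pc_pca_shifted = align_sign(PCA().fit(x_shifted).components_[0], pc_pca)
pc_tsvd_shifted = align_sign(
    TruncatedSVD(n_components=3).fit(x_shifted).components_[0], pc_pca)
print("PCA unchanged by shift:   ", np.allclose(pc_pca, pc_pca_shifted, atol=1e-6))
print("TruncatedSVD changed:     ", not np.allclose(pc_pca, pc_tsvd_shifted, atol=1e-3))
```

On standardized data the three first components should coincide up to sign once the same row is selected from each result. The last two checks illustrate the centering difference: shifting every column leaves PCA untouched but pulls the leading TruncatedSVD direction toward the column means.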
