
Matlab and Python produce different results for PCA

I am using PCA, and I found that PCA in sklearn in Python and pca() in Matlab produce different results. Here is the test matrix I am using.

import numpy as np

a = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

For Python sklearn, I got

from sklearn.decomposition import PCA

p = PCA()
print(p.fit_transform(a))

[[-1.38340578  0.2935787 ]
[-2.22189802 -0.25133484]
[-3.6053038   0.04224385]
[ 1.38340578 -0.2935787 ]
[ 2.22189802  0.25133484]
[ 3.6053038  -0.04224385]]

For Matlab, I got

pca(a', 'Centered', false)

[0.2196    0.5340
0.3526   -0.4571
0.5722    0.0768
-0.2196   -0.5340
-0.3526    0.4571
-0.5722   -0.0768]

Why is such a difference observed?


Thanks for Dan's answer. The results look quite reasonable now. However, if I test with a random matrix, it seems that Matlab and Python produce results that are not scalar multiples of each other. Why does this happen?

Test matrix a:

[[ 0.36671885  0.77268624  0.94687497]
[ 0.75741855  0.63457672  0.88671836]
[ 0.20818031  0.709373    0.45114135]
[ 0.24488718  0.87400025  0.89382836]
[ 0.16554686  0.74684393  0.08551401]
[ 0.07371664  0.1632872   0.84217978]]

Python results:

p = PCA()
print(p.fit_transform(a))

[[ 0.25305509 -0.10189215 -0.11661895]
[ 0.36137036 -0.20480169  0.27455458]
[-0.25638649 -0.02923213 -0.01619661]
[ 0.14741593 -0.12777308 -0.2434731 ]
[-0.6122582  -0.08568121  0.06790961]
[ 0.10680331  0.54938026  0.03382447]]

Matlab results:

pca(a', 'Centered', false)

0.504156973865138   -0.0808159771243340 -0.107296852182663
0.502756555190181   -0.174432053627297  0.818826939851221
0.329948209311847   0.315668718703861   -0.138813345638127
0.499181592718705   0.0755364557146097  -0.383301081533716
0.232039797509016   0.694464307249012   -0.0436361728092353
0.284905319274925   -0.612706345940607  -0.387190971583757

Thanks for Dan's help all through this. In fact, I found it was a misuse of a Matlab function: Matlab returns the principal component coefficients by default. Using [~, score] = pca(a, 'Centered', true) gives the same results as Python.
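The distinction can also be checked without either toolbox: sklearn's fit_transform returns the scores (the mean-centered data projected onto the principal axes), which correspond to the second output of Matlab's pca, not the first. A minimal NumPy sketch of that computation, using the test matrix from the question:

```python
import numpy as np

a = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]], dtype=float)

# Both sklearn's fit_transform and MATLAB's score output are computed
# from the mean-centered data.
centered = a - a.mean(axis=0)

# SVD of the centered matrix: the scores are U * S, and the rows of Vt
# are the principal axes (MATLAB's coeff, sklearn's components_).
u, s, vt = np.linalg.svd(centered, full_matrices=False)
scores = u * s  # matches p.fit_transform(a) up to column signs
```

Up to the sign of each column (which is arbitrary in PCA), these scores match the sklearn output quoted in the question.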

PCA works off eigenvectors. So long as the vectors are parallel, the magnitude is irrelevant (it is just a different normalization).
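To see why the normalization doesn't matter, note that any nonzero scalar multiple of an eigenvector satisfies the same eigenvector equation. A quick NumPy check (the matrix here is the Gram matrix a'*a of the first test matrix; -6.3 is an arbitrary scale factor chosen for illustration):

```python
import numpy as np

gram = np.array([[28.0, 18.0], [18.0, 12.0]])  # a.T @ a for the first test matrix

w, v = np.linalg.eigh(gram)
v1 = v[:, -1]           # unit-norm leading eigenvector
v_scaled = -6.3 * v1    # arbitrary nonzero scalar multiple

# Both vectors satisfy gram @ x == lambda * x for the same eigenvalue,
# so both are equally valid principal directions.
print(np.allclose(gram @ v1, w[-1] * v1))
print(np.allclose(gram @ v_scaled, w[-1] * v_scaled))
```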

In your case, the two are scalar multiples of each other. Try (in MATLAB)

Python = [-1.38340578  0.2935787 
          -2.22189802 -0.25133484
          -3.6053038  0.04224385
          1.38340578 -0.2935787 
          2.22189802  0.25133484
          3.6053038  -0.04224385]

Matlab = [ 0.2196    0.5340
           0.3526   -0.4571
           0.5722    0.0768
          -0.2196   -0.5340
          -0.3526    0.4571
          -0.5722   -0.0768]

Now notice that Matlab(:,1)*-6.2997 is basically equal to Python(:,1). Or, put another way,

Python(:,n)./Matlab(:,n)

gives you (roughly) the same number for each row. This means the two vectors have the same direction (i.e. they are just scalar multiples of each other), and so you are getting the same principal components.
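The same ratio check can be done in NumPy with the two outputs quoted above; each column of the element-wise quotient comes out (nearly) constant:

```python
import numpy as np

# The two outputs quoted above.
python_scores = np.array([[-1.38340578,  0.2935787 ],
                          [-2.22189802, -0.25133484],
                          [-3.6053038 ,  0.04224385],
                          [ 1.38340578, -0.2935787 ],
                          [ 2.22189802,  0.25133484],
                          [ 3.6053038 , -0.04224385]])

matlab_coeff = np.array([[ 0.2196,  0.5340],
                         [ 0.3526, -0.4571],
                         [ 0.5722,  0.0768],
                         [-0.2196, -0.5340],
                         [-0.3526,  0.4571],
                         [-0.5722, -0.0768]])

ratios = python_scores / matlab_coeff
# Each column is roughly constant (about -6.30 and 0.55 here),
# i.e. the corresponding columns are parallel.
print(ratios)
```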

See here for another example: https://math.stackexchange.com/a/1183707/118848
