Computing k-means after PCA

Question

I'm new to R and I want to run k-means clustering on the results of a PCA. I did it like this (using the iris dataset as an example):


library(tidyverse)

library(FactoMineR)

library(factoextra)

df <- iris %>%
  select(- Species)

# compute PCA

res.pca <- PCA(df, 
               scale.unit = TRUE, 
               graph = FALSE)

summary(res.pca)

# k-means clustering

kc <- kmeans(res.pca, 3)

Then I get an error: Error in storage.mode(x) <- "double" : (list) object cannot be coerced to type 'double'

The output of the PCA is:

> res.pca
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 150 individuals, described by 4 variables
*The results are available in the following objects:

   name               description                          
1  "$eig"             "eigenvalues"                        
2  "$var"             "results for the variables"          
3  "$var$coord"       "coord. for the variables"           
4  "$var$cor"         "correlations variables - dimensions"
5  "$var$cos2"        "cos2 for the variables"             
6  "$var$contrib"     "contributions of the variables"     
7  "$ind"             "results for the individuals"        
8  "$ind$coord"       "coord. for the individuals"         
9  "$ind$cos2"        "cos2 for the individuals"           
10 "$ind$contrib"     "contributions of the individuals"   
11 "$call"            "summary statistics"                 
12 "$call$centre"     "mean of the variables"              
13 "$call$ecart.type" "standard error of the variables"    
14 "$call$row.w"      "weights for the individuals"        
15 "$call$col.w"      "weights for the variables"          
> 

> summary(res.pca)

Call:
PCA(X = df, scale.unit = TRUE, graph = FALSE) 


Eigenvalues
                       Dim.1   Dim.2   Dim.3   Dim.4
Variance               2.918   0.914   0.147   0.021
% of var.             72.962  22.851   3.669   0.518
Cumulative % of var.  72.962  95.813  99.482 100.000

Individuals (the 10 first)
                 Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr   cos2  
1            |  2.319 | -2.265  1.172  0.954 |  0.480  0.168  0.043 | -0.128  0.074  0.003 |
2            |  2.202 | -2.081  0.989  0.893 | -0.674  0.331  0.094 | -0.235  0.250  0.011 |
3            |  2.389 | -2.364  1.277  0.979 | -0.342  0.085  0.020 |  0.044  0.009  0.000 |
4            |  2.378 | -2.299  1.208  0.935 | -0.597  0.260  0.063 |  0.091  0.038  0.001 |
5            |  2.476 | -2.390  1.305  0.932 |  0.647  0.305  0.068 |  0.016  0.001  0.000 |
6            |  2.555 | -2.076  0.984  0.660 |  1.489  1.617  0.340 |  0.027  0.003  0.000 |
7            |  2.468 | -2.444  1.364  0.981 |  0.048  0.002  0.000 |  0.335  0.511  0.018 |
8            |  2.246 | -2.233  1.139  0.988 |  0.223  0.036  0.010 | -0.089  0.036  0.002 |
9            |  2.592 | -2.335  1.245  0.812 | -1.115  0.907  0.185 |  0.145  0.096  0.003 |
10           |  2.249 | -2.184  1.090  0.943 | -0.469  0.160  0.043 | -0.254  0.293  0.013 |

Variables
                Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr   cos2  
Sepal.Length |  0.890 27.151  0.792 |  0.361 14.244  0.130 | -0.276 51.778  0.076 |
Sepal.Width  | -0.460  7.255  0.212 |  0.883 85.247  0.779 |  0.094  5.972  0.009 |
Petal.Length |  0.992 33.688  0.983 |  0.023  0.060  0.001 |  0.054  2.020  0.003 |
Petal.Width  |  0.965 31.906  0.931 |  0.064  0.448  0.004 |  0.243 40.230  0.059 |

Can anyone help me with this? What should I pass to kmeans() instead of res.pca? I don't know which part of the PCA result I should extract to use in kmeans().

Thank you in advance.

Answer 1

The principal component scores are stored in res.pca$ind$coord, and that is what you want to run kmeans on.

So we can do this:

kc <- kmeans(res.pca$ind$coord, 3)
plot(res.pca$ind$coord[,1:2],col=factor(kc$cluster))
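
As a slightly fuller sketch of the same idea, the snippet below fixes the random seed and uses several random starts so the clustering is repeatable, and it clusters on only the first two dimensions. The seed value, nstart = 25, and the choice of two dimensions are assumptions made for illustration; they are not part of the answer above. Note that PCA() keeps at most ncp = 5 dimensions by default, so with the four iris variables res.pca$ind$coord already holds all four components.

```r
library(FactoMineR)

# Same data and PCA call as in the question
df <- iris[, 1:4]
res.pca <- PCA(df, scale.unit = TRUE, graph = FALSE)

# Individual coordinates (scores); keep the first two dimensions only
# (assumption: two dimensions are enough for this illustration)
scores <- res.pca$ind$coord[, 1:2]

# kmeans() starts from random centers, so fix the seed and
# try several starts to get a stable result
set.seed(42)
kc <- kmeans(scores, centers = 3, nstart = 25)

# Cross-tabulate the clusters against the known species
table(cluster = kc$cluster, species = iris$Species)
```

Cluster labels are arbitrary, so cluster 1 need not correspond to the first species in the table.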

Answer 2

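It seems that kmeans() needs a numeric matrix as input, but you gave it res.pca, which is a list. That is why you get an error saying a list object cannot be converted to double. "double" is the type R uses for matrices or vectors of plain numbers.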

I'm not sure what the PCA function outputs, so you will have to find a way to extract the PCA values from it, turn them into a matrix, and then run kmeans.
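
One way to follow that advice without knowing the internals of a particular PCA function is to inspect the result and pull out a numeric matrix of scores. The sketch below uses base R's prcomp() rather than FactoMineR's PCA(); this is a swapped-in function, chosen only because it ships with R, and its scores live in the x element. The seed and nstart values are arbitrary.

```r
# Base R PCA on the four numeric iris columns, standardized
df <- iris[, 1:4]
pca <- prcomp(df, center = TRUE, scale. = TRUE)

# The result is a list, not a numeric matrix
is.list(pca)
names(pca)   # shows the components; "x" holds the scores

# Extract the scores and confirm they form a numeric matrix
scores <- pca$x
is.matrix(scores) && is.numeric(scores)

# Now kmeans() has the input it expects
set.seed(1)
kc <- kmeans(scores, centers = 3, nstart = 25)
kc$size
```

The same inspection steps (is.list(), names(), str()) apply to the object returned by any other PCA function.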

Hope this helps.

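For future reference, though, there are a few things you can do to make your question easier to answer:

Provide a reproducible example (a df with a few rows)
Translate the error message into English
Say which package each function comes from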
