Plot PCA vs one dimension in R

I have a data set with 10 feature dimensions and 1 dimension holding the cluster number (11 dimensions in total). How can I plot the first principal component (PC1) of my data against the cluster number using R?

qplot(x = not_null_df$TSC_8125, y =  pca, data = subset(not_null_df, select = c (not_null_df$AVG_ERTEBAT,not_null_df$AVG_ROSHD,not_null_df$AVG_HOGHOGH,not_null_df$AVG_MM,not_null_df$AVG_MK,not_null_df$AVG_TM,not_null_df$AVG_VEJHE,not_null_df$AVG_ANGIZEH,not_null_df$AVG_TAHOD)), main = "Loadings for PC1", xlab = "cluster number")

I wrote the code above and got this error:

Don't know how to automatically pick scale for object of type princomp. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (564): x, y

summary(not_null_df)
     ï..QN           NAMECODE        GENDER      VAZEYATTAAHOL     TAHSILAT          SEN           SABEGHE     
 Min.   :  1.00   Min.   : 1.0   Min.   :1.000   Min.   :1.00   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.: 28.00   1st Qu.:11.0   1st Qu.:1.000   1st Qu.:1.75   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000  
 Median : 60.00   Median :13.0   Median :1.000   Median :2.00   Median :3.000   Median :1.000   Median :1.000  
 Mean   : 68.63   Mean   :11.7   Mean   :1.152   Mean   :1.75   Mean   :2.578   Mean   :1.394   Mean   :1.121  
 3rd Qu.:103.25   3rd Qu.:14.0   3rd Qu.:1.000   3rd Qu.:2.00   3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:1.000  
 Max.   :190.00   Max.   :16.0   Max.   :2.000   Max.   :2.00   Max.   :3.000   Max.   :3.000   Max.   :3.000  
  AVG_ERTEBAT       AVG_ROSHD       AVG_HOGHOGH         AVG_MM           AVG_MK           AVG_TM         AVG_VEJHE     
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 5.333   1st Qu.: 4.125   1st Qu.: 1.750   1st Qu.: 5.000   1st Qu.: 3.125   1st Qu.: 5.981   1st Qu.: 4.556  
 Median : 7.000   Median : 5.875   Median : 3.500   Median : 7.727   Median : 5.000   Median : 8.000   Median : 6.333  
 Mean   : 6.730   Mean   : 5.787   Mean   : 4.001   Mean   : 6.903   Mean   : 4.890   Mean   : 7.390   Mean   : 6.095  
 3rd Qu.: 8.425   3rd Qu.: 7.656   3rd Qu.: 6.000   3rd Qu.: 9.182   3rd Qu.: 6.688   3rd Qu.: 9.204   3rd Qu.: 7.778  
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
  AVG_ANGIZEH       AVG_TAHOD        AVG_SOALAT        TSC_8125          avg       
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :1.000   Min.   :0.000  
 1st Qu.: 5.000   1st Qu.: 5.833   1st Qu.: 4.000   1st Qu.:1.000   1st Qu.:4.788  
 Median : 7.000   Median : 7.667   Median : 7.000   Median :2.000   Median :6.301  
 Mean   : 6.549   Mean   : 7.171   Mean   : 6.025   Mean   :2.046   Mean   :6.154  
 3rd Qu.: 8.750   3rd Qu.: 9.000   3rd Qu.: 8.000   3rd Qu.:3.000   3rd Qu.:7.599  
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :3.000   Max.   :9.978  

I can compute the PCA with this code:

pca <- princomp(not_null_df, cor=TRUE, scores=TRUE)

summary(pca)
Importance of components:
                         Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7     Comp.8     Comp.9
Standard deviation     2.887437 1.28937443 1.12619079 1.08816449 0.98432226 0.91257779 0.90980017 0.82303807 0.74435256
Proportion of Variance 0.438805 0.08749929 0.06675293 0.06232116 0.05099423 0.04383149 0.04356507 0.03565219 0.02916109
Cumulative Proportion  0.438805 0.52630426 0.59305720 0.65537835 0.70637258 0.75020406 0.79376914 0.82942133 0.85858242
                          Comp.10    Comp.11    Comp.12    Comp.13    Comp.14    Comp.15   Comp.16    Comp.17     Comp.18
Standard deviation     0.70304085 0.67709130 0.62905993 0.59284646 0.50799135 0.48013732 0.4476952 0.39317004 0.378722707
Proportion of Variance 0.02601402 0.02412909 0.02082718 0.01849826 0.01358185 0.01213325 0.0105490 0.00813593 0.007548994
Cumulative Proportion  0.88459644 0.90872553 0.92955271 0.94805097 0.96163282 0.97376607 0.9843151 0.99245101 1.000000000
                            Comp.19
Standard deviation     1.838143e-08
Proportion of Variance 1.778301e-17
Cumulative Proportion  1.000000e+00

My goal is to plot the PCA scores (just Comp.1) against TSC_8125 (the cluster number).

The function princomp() returns a list of 7 elements: sdev, loadings, center, scale, n.obs, scores, and call. Each is described on the function's help page (type ?princomp). Depending on the purpose of your plot, the element of interest here is probably scores.

scores: the scores of the supplied data on the principal components.

loadings: the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).

The simplest way to access the elements of the list is the $ operator: pca$scores and pca$loadings return the scores and loadings, respectively. Both are stored as matrices in which each column corresponds to a principal component (the first column is the 1st principal component, and so on).
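As a minimal sketch of what these elements look like, the following runs princomp() on R's built-in USArrests data. That data set is only a stand-in chosen for illustration; it is not the data from the question, and the name pca_demo is hypothetical.

```r
# Illustration on the built-in USArrests data (an assumption; not the asker's data frame)
pca_demo <- princomp(USArrests, cor = TRUE, scores = TRUE)

names(pca_demo)              # the seven list elements
is.matrix(pca_demo$scores)   # scores are stored as a matrix
dim(pca_demo$scores)         # one row per observation, one column per component
dim(pca_demo$loadings)       # one row per variable, one column per component
head(pca_demo$scores[, 1])   # first few scores on the 1st principal component
```

The scores matrix has one row per observation, which is why a single column of it can be paired with a per-observation variable such as a cluster number.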

So, to access the 1st principal component scores, you can use

comp.1 <- pca$scores[,1]

To plot this against the cluster number, you can use

plot(comp.1 ~ not_null_df$TSC_8125)

Or, if you prefer, plot it with qplot from ggplot2:

library(ggplot2)  # provides qplot
qplot(x = not_null_df$TSC_8125, y = comp.1, main = "Scores for PC1", xlab = "cluster number")
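For a version that can be run end to end, here is a hedged sketch on simulated data. The data frame features, the vector cluster, and the names pca_demo and pc1 are all made up for illustration. It makes two choices that differ from the question's code and may or may not suit the real data: the PCA is run on the feature columns only, so the cluster label does not enter the components, and the cluster number is converted to a factor so that plot() draws one box per cluster.

```r
set.seed(1)

# Simulated stand-in for the real data: 3 clusters, 4 feature columns (all hypothetical)
n <- 90
cluster <- rep(1:3, each = n / 3)
features <- data.frame(
  f1 = rnorm(n, mean = cluster),
  f2 = rnorm(n, mean = 2 * cluster),
  f3 = rnorm(n),
  f4 = rnorm(n, mean = -cluster)
)

# PCA on the feature columns only, leaving the cluster label out
pca_demo <- princomp(features, cor = TRUE, scores = TRUE)
pc1 <- pca_demo$scores[, 1]

# With a factor on the right-hand side, plot() draws a boxplot per cluster
plot(pc1 ~ factor(cluster),
     xlab = "cluster number", ylab = "PC1 score",
     main = "Scores for PC1")
```

With the question's data, the same pattern would apply to a subset of not_null_df holding only the feature columns, with not_null_df$TSC_8125 as the grouping variable.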
