Check which features differentiate between clusters, using a boxplot

I applied UMAP dimensionality reduction to my data and then clustered the result. I got three distinct clusters:

[Image: UMAP plot showing the three clusters]

I have data specifying which cluster each sample belongs to, along with the sample names. Here is a subsample of it, which we'll call df_cluster:

structure(list(X1 = c(17.6942795910888, 16.5328416912875, 15.0031683863395, 
16.3550118351627, 17.6931159161312, 16.9869249394253, 16.3790173297882, 
15.8964870189374, 17.1055608092973, 16.4568632337052), X2 = c(-1.64953541728691, 
0.185674946464158, -1.38521677790428, -0.448487127519734, -1.63670327964466, 
-0.456667476792068, -0.091689040488956, -1.77486494294163, -1.86407675524967, 
0.14666260432486), cluster = c(1L, 2L, 2L, 1L, 2L, 1L, 3L, 3L, 
1L, 3L)), row.names = c("Patient1", "Patient13", "Patient2", "Patient99", 
"Patient10", "Patient43", "Patient167", "Patient8", "Patient17", "Patient16"
), class = "data.frame")

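The samples in df_cluster are the same as those in data, the original data I used for clustering. It is simply a matrix with samples as rows and features as columns, and it looks like this: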

structure(c(-0.0741098696855045, -0.094401270881699, 0.0410284948786532, 
-0.163302950330185, -0.0942478217207681, -0.167314411991775, 
-0.118272811489486, -0.0366277340916379, -0.0349008907108641, 
-0.167823357941815, -0.178835447722468, -0.253897294559596, -0.0372301980787381, 
-0.230579110769457, -0.224125346052727, -0.196933050675633, -0.344608041139497, 
-0.0550538743643369, -0.157003425700701, -0.162295446209879, 
-0.0384421660291032, -0.0275306107582565, 0.186447606591857, 
-0.124972070102036, -0.15348122673842, -0.106812144494277, -0.104757782473888, 
0.0686746776877563, -0.0662055287009653, 0.00388752358937872), dim = c(10L, 
3L), dimnames = list(c("Patient1", "Patient13", "Patient2", "Patient99", 
"Patient10", "Patient43", "Patient167", "Patient8", "Patient17", "Patient16"
), c("Feature1", "Feature2", 
"Feature3")))

I just want to look at each feature (each column of data) within each cluster, using a box plot or a violin plot, so that I can compare the clusters.

So the X axis will show clusters 1, 2 and 3, and the Y axis will show the values. Each feature gets its own plot. To make it clearer, I hand-drew an example:

[Image: hand-drawn sketch of the desired plots, one per feature]

You can use facets.

But first you need to pivot the data frame into long format.
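To illustrate what the pivot step does on its own, here is a minimal sketch on a toy table. The names `wide` and `long` and the values in them are made up for illustration and are not part of the question's data.

```r
library(tidyr)

# Toy wide table (hypothetical values): one row per patient, one column per feature
wide <- data.frame(
  Patient  = c("P1", "P2"),
  cluster  = c(1L, 2L),
  Feature1 = c(0.1, 0.2),
  Feature2 = c(0.3, 0.4)
)

# Pivot every column except the identifiers into name/value pairs
long <- pivot_longer(wide, -c(Patient, cluster))

long
```

Each patient now has one row per feature, with the feature name in `name` and its value in `value` (the default column names of `pivot_longer`). That long shape is what lets ggplot2 split the plot into one facet per feature.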

df_cluster <- structure(list(X1 = c(17.6942795910888, 16.5328416912875, 15.0031683863395, 
16.3550118351627, 17.6931159161312, 16.9869249394253, 16.3790173297882, 
15.8964870189374, 17.1055608092973, 16.4568632337052), X2 = c(-1.64953541728691, 
0.185674946464158, -1.38521677790428, -0.448487127519734, -1.63670327964466, 
-0.456667476792068, -0.091689040488956, -1.77486494294163, -1.86407675524967, 
0.14666260432486), cluster = c(1L, 2L, 2L, 1L, 2L, 1L, 3L, 3L, 
1L, 3L)), row.names = c("Patient1", "Patient13", "Patient2", "Patient99", 
"Patient10", "Patient43", "Patient167", "Patient8", "Patient17", "Patient16"
), class = "data.frame")

data <- structure(c(-0.0741098696855045, -0.094401270881699, 0.0410284948786532, 
-0.163302950330185, -0.0942478217207681, -0.167314411991775, 
-0.118272811489486, -0.0366277340916379, -0.0349008907108641, 
-0.167823357941815, -0.178835447722468, -0.253897294559596, -0.0372301980787381, 
-0.230579110769457, -0.224125346052727, -0.196933050675633, -0.344608041139497, 
-0.0550538743643369, -0.157003425700701, -0.162295446209879, 
-0.0384421660291032, -0.0275306107582565, 0.186447606591857, 
-0.124972070102036, -0.15348122673842, -0.106812144494277, -0.104757782473888, 
0.0686746776877563, -0.0662055287009653, 0.00388752358937872), dim = c(10L, 
3L), dimnames = list(c("Patient1", "Patient13", "Patient2", "Patient99", 
"Patient10", "Patient43", "Patient167", "Patient8", "Patient17", "Patient16"
), c("Feature1", "Feature2", 
"Feature3")))



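library(tidyverse)

data %>% 
  as.data.frame() %>%                      # matrix -> data frame
  rownames_to_column("Patient") %>%        # keep the patient IDs as a column
  left_join(df_cluster %>% rownames_to_column("Patient") %>% select(Patient, cluster),
            by = "Patient") %>%            # attach the cluster label to each patient
  pivot_longer(-c(cluster, Patient)) %>%   # long format: one row per patient and feature
  ggplot(aes(as.factor(cluster), value)) +
  geom_boxplot() +
  facet_grid(~ name) +                     # one panel per feature
  labs(x = "cluster")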

[Image: resulting boxplots, one facet per feature]
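Since the question also mentions violin plots, the same reshaping can feed `geom_violin()` instead. The sketch below is only one possible variant: it runs on simulated stand-in data (`sim_data` and `sim_clusters` are hypothetical names, not the data from the question) and uses `facet_wrap()` with free y scales, on the assumption that features may live on different ranges.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Simulated stand-in data (NOT the data from the question):
# 30 patients, 3 features, 10 patients per cluster
set.seed(1)
ids <- paste0("Patient", 1:30)
sim_data <- matrix(rnorm(90), nrow = 30,
                   dimnames = list(ids, paste0("Feature", 1:3)))
sim_clusters <- data.frame(cluster = rep(1:3, each = 10), row.names = ids)

# Same reshaping as above: join the cluster labels, then pivot to long format
long <- sim_data %>%
  as.data.frame() %>%
  rownames_to_column("Patient") %>%
  left_join(sim_clusters %>% rownames_to_column("Patient"), by = "Patient") %>%
  pivot_longer(-c(Patient, cluster), names_to = "feature")

# Violin per cluster, with the individual samples drawn on top
p <- ggplot(long, aes(factor(cluster), value)) +
  geom_violin() +
  geom_jitter(width = 0.1, alpha = 0.5) +
  facet_wrap(~ feature, scales = "free_y") +  # each feature gets its own y axis
  labs(x = "cluster", y = "value")

p
```

With only three or four samples per cluster, as in the subsample shown in the question, a violin shape carries little information, so the boxplot or the raw points are probably the more honest choice there.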
