
How to label specific data points on a PCA plot in R using ggplot


I want to pick out 5 specific IDs and add labels to them so I can see where they are located on the PCA plot. I have used library(tidyverse). Thank you.

Without a minimal reproducible dataset it's difficult to know whether this approach will suit your purposes, but perhaps:

install.packages("tidyverse")
install.packages("factoextra")
install.packages("FactoMineR")
library(tidyverse)
library(factoextra)
library(FactoMineR)

data("iris")

# Create a 'label' for every point (NA)
iris$label <- NA
head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species label
#> 1          5.1         3.5          1.4         0.2  setosa    NA
#> 2          4.9         3.0          1.4         0.2  setosa    NA
#> 3          4.7         3.2          1.3         0.2  setosa    NA
#> 4          4.6         3.1          1.5         0.2  setosa    NA
#> 5          5.0         3.6          1.4         0.2  setosa    NA
#> 6          5.4         3.9          1.7         0.4  setosa    NA

# Then 'relabel' the points of interest
iris[2,]$label <- "Label_1"
iris[66,]$label <- "Label_2"
iris[144,]$label <- "Label_3"

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   label
#> 1          5.1         3.5          1.4         0.2  setosa    <NA>
#> 2          4.9         3.0          1.4         0.2  setosa Label_1
#> 3          4.7         3.2          1.3         0.2  setosa    <NA>
#> 4          4.6         3.1          1.5         0.2  setosa    <NA>
#> 5          5.0         3.6          1.4         0.2  setosa    <NA>
#> 6          5.4         3.9          1.7         0.4  setosa    <NA>

# Remove species column (5) and label column and scale the data
iris.pca <- PCA(iris[,-c(5,6)], graph = FALSE)

fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (but not "text")
             col.ind = iris$Species, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups") +
  geom_text(aes(label = iris$label))
#> Warning: Removed 147 rows containing missing values (geom_text).


# You can nudge the labels left/right or up/down
fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (but not "text")
             col.ind = iris$Species, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups") +
  geom_text(aes(label = iris$label), nudge_x = 0.5)
#> Warning: Removed 147 rows containing missing values (geom_text).


#  Or you can use ggrepel
library(ggrepel)

fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (but not "text")
             col.ind = iris$Species, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups") +
  geom_text_repel(aes(label = iris$label),
                  box.padding = 5)
#> Warning: Removed 147 rows containing missing values (geom_text_repel).

Created on 2022-07-06 by the reprex package (v2.0.1)

NB: the warning "#> Warning: Removed 147 rows containing missing values (geom_text)." appears because the 147 points whose label is NA are dropped from the text layer. The points themselves are still plotted, so you can safely ignore it.
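
If you would rather not carry a column of NAs (and the warning that comes with it), one alternative is to hand only the rows of interest to the label layer. Below is a minimal sketch of that idea, not a drop-in replacement for the code above: it assumes base R's prcomp() rather than FactoMineR's PCA(), and the names scores, ids_to_label and labelled are made up for illustration. Swap in your own IDs.

```r
library(ggplot2)

# PCA on the four numeric iris columns, centred and scaled
# (assumption: prcomp() is used here in place of FactoMineR::PCA())
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Collect the first two principal component scores in a data frame
scores <- as.data.frame(pca$x[, 1:2])
scores$Species <- iris$Species
scores$id <- seq_len(nrow(scores))  # row number used as the ID

# Hypothetical IDs to highlight; replace with your own
ids_to_label <- c(2, 66, 144)
labelled <- scores[scores$id %in% ids_to_label, ]

# Plot every point, but give the text layer only the selected rows
p <- ggplot(scores, aes(x = PC1, y = PC2, colour = Species)) +
  geom_point() +
  geom_text(data = labelled, aes(label = id),
            nudge_y = 0.2, show.legend = FALSE)
p
```

Because the text layer receives its own three-row data frame, no rows contain missing labels and nothing is removed. If the labels overlap the points, ggrepel's geom_text_repel() accepts the same data and aes() arguments.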
