
How to label specific data points on a PCA plot in R using ggplot

I want to pick out 5 specific IDs and label them so I can see where they fall on the PCA plot. I have loaded library(tidyverse). Thank you.

Without a minimal reproducible dataset it's difficult to know whether this approach will suit your purposes, but perhaps:

install.packages("tidyverse")
install.packages("factoextra")
install.packages("FactoMineR")
library(tidyverse)
library(factoextra)
library(FactoMineR)

data("iris")

# Create a 'label' for every point (NA)
iris$label <- NA
head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species label
#> 1          5.1         3.5          1.4         0.2  setosa    NA
#> 2          4.9         3.0          1.4         0.2  setosa    NA
#> 3          4.7         3.2          1.3         0.2  setosa    NA
#> 4          4.6         3.1          1.5         0.2  setosa    NA
#> 5          5.0         3.6          1.4         0.2  setosa    NA
#> 6          5.4         3.9          1.7         0.4  setosa    NA

# Then 'relabel' the points of interest
iris[2,]$label <- "Label_1"
iris[66,]$label <- "Label_2"
iris[144,]$label <- "Label_3"

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   label
#> 1          5.1         3.5          1.4         0.2  setosa    <NA>
#> 2          4.9         3.0          1.4         0.2  setosa Label_1
#> 3          4.7         3.2          1.3         0.2  setosa    <NA>
#> 4          4.6         3.1          1.5         0.2  setosa    <NA>
#> 5          5.0         3.6          1.4         0.2  setosa    <NA>
#> 6          5.4         3.9          1.7         0.4  setosa    <NA>

# Remove species column (5) and label column and scale the data
iris.pca <- PCA(iris[,-c(5,6)], graph = FALSE)

fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (but not "text")
             col.ind = iris$Species, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups") +
  geom_text(aes(label = iris$label))
#> Warning: Removed 147 rows containing missing values (geom_text).


# You can nudge the labels left/right or up/down
fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (but not "text")
             col.ind = iris$Species, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups") +
  geom_text(aes(label = iris$label), nudge_x = 0.5)
#> Warning: Removed 147 rows containing missing values (geom_text).


# Or you can use ggrepel to keep the labels from overlapping the points
library(ggrepel)

fviz_pca_ind(iris.pca,
             geom.ind = "point", # show points only (but not "text")
             col.ind = iris$Species, # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups") +
  geom_text_repel(aes(label = iris$label),
                  box.padding = 5)
#> Warning: Removed 147 rows containing missing values (geom_text_repel).

Created on 2022-07-06 by the reprex package (v2.0.1)

NB: The warning "Removed 147 rows containing missing values (geom_text)" appears because the 147 unlabelled points have NA as their label and are dropped when the text is drawn. You can safely ignore it.
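If the points of interest are identified by an ID rather than by row position, the same idea can be sketched by building a small data frame that holds only the rows to label and passing it to geom_text() as its own data. Because no NA labels reach the text layer, this should also avoid the warning. This is only a rough sketch under some assumptions: it swaps in base R's prcomp() for FactoMineR's PCA() (so the coordinates may differ slightly in scale or sign from the plots above), and the names ids_of_interest, id, scores and labelled are hypothetical, with the row number standing in for a real ID column.

```r
# Hypothetical IDs to highlight; in your own data these would come from your ID column
ids_of_interest <- c(2, 66, 144)

data("iris")
iris$id <- seq_len(nrow(iris))  # assumption: row number stands in for a real ID column

# PCA on the four numeric columns, centred and scaled (base R, not FactoMineR)
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Scores on the first two components, plus the grouping and ID columns
scores <- data.frame(pca$x[, 1:2], Species = iris$Species, id = iris$id)

# Keep only the rows to be labelled and build their label text
labelled <- scores[scores$id %in% ids_of_interest, ]
labelled$label <- paste0("ID_", labelled$id)

# Draw the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(scores, aes(x = PC1, y = PC2, colour = Species)) +
    geom_point() +
    # The text layer gets only the labelled rows, so there are no NA labels to drop
    geom_text(data = labelled, aes(label = label),
              colour = "black", nudge_y = 0.2, show.legend = FALSE)
  print(p)
}
```

For five IDs, ids_of_interest would simply hold five values. If the labels overlap, geom_text_repel() from ggrepel accepts the same data and aes arguments.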
