Need to regroup a categorical variable into 5 groups based on string values in R

Question

I have a categorical variable with over 1000 levels. I want to group levels together so that I can reduce the dimensionality and have just 5 general levels. I want to take the level names and group similar values together.

For example, I want to group all levels that contain the word "immune" into a new group called "immune group", all levels that contain the word "eyes" into a new group called "eye group", and so on.

I've tried str_detect and grepl in R with little success. Are there any other methods that could do this efficiently?

Answer 1

Maybe use case_when from dplyr together with str_detect. It would help to have a reproducible example, though.

Answer 2

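library(dplyr)
library(stringr)

# Example values standing in for the factor levels
x <- c("immune1", "immune2", "eyes1", "eyes2")

# The first matching condition wins; anything unmatched becomes NA
case_when(
  str_detect(x, "immune") ~ "immune group",
  str_detect(x, "eyes") ~ "eye group",
  TRUE ~ NA_character_
)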
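
If the variable is stored as a factor, it may be cheaper to remap the 1000+ levels once rather than test every row. The following is a minimal base R sketch of that idea, not something from the answers above: the sample data, the `patterns` lookup, the `regroup_levels` helper, and the "other" fallback label are all hypothetical names chosen for illustration. It relies on the fact that assigning duplicated labels to `levels()` merges those levels.

```r
# Hypothetical data: a few levels standing in for the 1000+ real ones
f <- factor(c("immune disorder", "Autoimmune", "dry eyes",
              "eyes - other", "broken arm"))

# Hypothetical lookup: pattern -> new group name; earlier entries take priority
patterns <- c("immune" = "immune group", "eyes" = "eye group")

regroup_levels <- function(f, patterns, other = "other") {
  old <- levels(f)
  new <- rep(other, length(old))
  # Walk the patterns in reverse so the first pattern wins on overlap
  for (p in rev(names(patterns))) {
    new[grepl(p, old, ignore.case = TRUE)] <- patterns[[p]]
  }
  levels(f) <- new  # duplicated labels merge the corresponding levels
  f
}

g <- regroup_levels(f, patterns)
table(g)
```

Matching is case-insensitive here, and any level that matches no pattern falls into "other" instead of NA; both are design choices that may or may not suit the real data.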
