
Need to regroup categorical variable into just 5 groups based on string value in R

I have a categorical variable with over 1000 levels. I want to group levels together to reduce the dimensionality and end up with just 5 general levels. I want to use the level names to group similar values together.

For example, I want to put all levels that contain the word "immune" into a new group called "immune group", all levels that contain the word "eyes" into a new group called "eye group", and so on.

I've tried str_detect and grepl in R with little success. Are there any other methods that could do this efficiently?

Maybe use case_when from dplyr together with str_detect from stringr. A reproducible example would help, though.

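library(dplyr)
library(stringr)

# Example levels standing in for the real variable
x <- c("immune1", "immune2", "eyes1", "eyes2")

# Conditions are checked in order; values matching none of them become NA
case_when(
  str_detect(x, "immune") ~ "immune group",
  str_detect(x, "eyes") ~ "eye group",
  TRUE ~ NA_character_
)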
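
Since the question is about a variable inside a data set, here is a minimal sketch of the same idea applied to a data frame column. The data frame `df`, the column names `condition` and `condition_group`, and the sample values are all made up for illustration; the case-insensitive matching and the "other" fallback are assumptions too, not something the question asked for.

```r
library(dplyr)
library(stringr)

# Hypothetical data: `condition` stands in for the 1000-level variable
df <- data.frame(
  condition = c("Immune disorder", "autoimmune response",
                "dry eyes", "Eyes, other", "fracture"),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(
    # Match each keyword regardless of case; the first matching rule wins
    condition_group = case_when(
      str_detect(condition, regex("immune", ignore_case = TRUE)) ~ "immune group",
      str_detect(condition, regex("eyes", ignore_case = TRUE)) ~ "eye group",
      TRUE ~ "other"  # assumed fallback for levels matching no keyword
    ),
    # Convert back to a factor so the result is a categorical variable again
    condition_group = factor(condition_group)
  )

# Quick check of how many rows landed in each group
table(df$condition_group)
```

Because case_when uses the first condition that matches, a level containing both keywords ends up in whichever group is listed first, so the order of the rules matters.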
