
How to collapse levels in a categorical variable in R

I have several categorical variables with more than 5 levels each, and I want a function that can collapse them into just two levels.

column1<- c("bad","good","nice","fair","great","bad","bad","good","nice",
            "fair","great","bad")
column2<- c("john","ben","cook","seth","brian","deph","omar","mary",
            "frank","boss","kate","sall")

df<- data.frame(column1,column2)

So for the data frame above, I want a function that keeps every "bad" in column1 as "bad" and converts all other levels to "others". I have no idea how to do that. Thanks.

Use ifelse or case_when:

library(dplyr)
df <- df %>% 
   mutate(column1 = case_when(column1 != "bad" ~ "others", TRUE ~ column1))
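The ifelse alternative is named but not shown. A minimal base R sketch, assuming column1 is a character vector (stringsAsFactors = FALSE is set explicitly here so the example does not depend on the R version), might look like this:

```r
# Rebuild the example data from the question
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2, stringsAsFactors = FALSE)

# Keep "bad" as it is and label every other value "others"
df$column1 <- ifelse(df$column1 == "bad", "bad", "others")

# Count how many rows ended up in each of the two groups
table(df$column1)
```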

Also, as there is only a single change, we can just do:

df$column1[df$column1 != "bad"] <- "others"
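Since the question asks for a function that works on several columns, the assignment above can be wrapped in a small helper. This is only a sketch: collapse_levels() is a hypothetical name, not part of base R or dplyr. It converts factors to character first, because assigning a label that is not an existing level of a factor produces NA with a warning.

```r
# collapse_levels() is a hypothetical helper, not an existing R function
collapse_levels <- function(x, keep, other = "others") {
  was_factor <- is.factor(x)
  x <- as.character(x)                      # plain strings accept any new label
  x[!is.na(x) & !(x %in% keep)] <- other    # missing values stay missing
  if (was_factor) factor(x) else x          # hand back the same type we received
}

df <- data.frame(
  column1 = c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
              "fair", "great", "bad"),
  column2 = c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
              "frank", "boss", "kate", "sall"),
  stringsAsFactors = FALSE
)

# List the columns to collapse; more names can be added to this vector
cols <- c("column1")
df[cols] <- lapply(df[cols], collapse_levels, keep = "bad")
```

The keep argument accepts more than one value, so collapse_levels(x, keep = c("bad", "fair")) would leave both of those labels untouched.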

A simple way to do this in base R is with indexing:

# the logical test is coerced to 0/1, so adding 1 picks 'others' (FALSE) or 'bad' (TRUE)
c('others', 'bad')[(df$column1 == 'bad') + 1]
#> [1] "bad"    "others" "others" "others" "others" "bad"    "bad"   
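#> [8] "others" "others" "others" "others" "bad"

Alternatively, convert the column to a factor and relabel its levels:

# as.factor() sorts the levels alphabetically: bad, fair, good, great, nice
df <- data.frame(factor = as.factor(column1), column2)
# keep the first level ("bad") and merge the other four into "others"
levels(df$factor) <- c("bad", rep("others", 4))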
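The factor relabelling above depends on there being exactly five levels with "bad" sorted first. A sketch that avoids counting levels, relying on the fact that assigning the same label to several levels of a factor merges them, could be:

```r
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
f <- as.factor(column1)

# Rename every level except "bad"; the duplicated labels collapse into one level
levels(f)[levels(f) != "bad"] <- "others"

levels(f)
table(f)
```

If the forcats package is available, fct_other(f, keep = "bad", other_level = "others") should give a comparable result.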

Here is a dplyr solution with grouping (it relies on the first row being "bad", since each group starts at a "bad" row):

library(dplyr)
df %>% 
  group_by(group = cumsum(column1=="bad")) %>% 
  mutate(column1 = ifelse(row_number()==1, "bad", "others")) %>% 
  ungroup() %>% 
  select(-group)

  column1 column2
   <chr>   <chr>  
 1 bad     john   
 2 others  ben    
 3 others  cook   
 4 others  seth   
 5 others  brian  
 6 bad     deph   
 7 bad     omar   
 8 others  mary   
 9 others  frank  
10 others  boss   
11 others  kate   
12 bad     sall   

