How to collapse levels of a categorical variable in R

Question

I have several categorical variables with more than 5 levels each, and I want a function that can collapse them into just two levels.

column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")

df <- data.frame(column1, column2)

So for the data frame above, in column1, I want a function that keeps every "bad" as "bad" and converts all other levels to "others". I have no idea how to do that. Thanks.

Answer 1

Use ifelse() or case_when():

library(dplyr)
df <- df %>% 
   mutate(column1 = case_when(column1 != "bad" ~ "others", TRUE ~ column1))

Also, since only a single change is needed, we can simply assign by index:

df$column1[df$column1 != "bad"] <- "others"
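
Since the question asks for a function that can be reused across several columns, the ifelse() idea mentioned above could be wrapped up roughly as follows. This is only a sketch in base R: the helper name collapse_levels and its arguments keep and other are made up for illustration and do not come from any package.

```r
# Example data from the question
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2)

# Hypothetical helper (not from the answers): keep the values listed in `keep`
# and relabel everything else as `other`. Missing values are left as NA.
collapse_levels <- function(x, keep, other = "others") {
  x <- as.character(x)  # accept factor or character input
  ifelse(is.na(x) | x %in% keep, x, other)
}

# Apply the helper to one or more columns selected by name
cols <- "column1"
df[cols] <- lapply(df[cols], collapse_levels, keep = "bad")
table(df$column1)
```

Converting to character first means the helper behaves the same whether the column is a factor or a character vector; wrap the result in factor() if a factor is needed afterwards.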

Answer 2

A simple way to do this in base R is with indexing:

c('others', 'bad')[(df$column1 == 'bad') + 1]
#> [1] "bad"    "others" "others" "others" "others" "bad"    "bad"   
#> [8] "others" "others" "others" "others" "bad"
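
The one-liner above relies on R coercing logical values to numbers: FALSE becomes 0 and TRUE becomes 1, so adding 1 gives a valid index into a two-element lookup vector. A step-by-step sketch on a shortened vector (the intermediate names is_bad, idx and labels are only for illustration):

```r
column1 <- c("bad", "good", "nice", "bad")

is_bad <- column1 == "bad"     # logical vector: TRUE where the value is "bad"
idx    <- is_bad + 1           # FALSE -> 1, TRUE -> 2
labels <- c("others", "bad")   # position 1 = "others", position 2 = "bad"

result <- labels[idx]
result
#> [1] "bad"    "others" "others" "bad"
```

Note that an NA in column1 gives an NA index, so the result is also NA at that position.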

Answer 3

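# Levels are sorted alphabetically ("bad", "fair", "good", "great", "nice"),
# so "bad" comes first and the remaining four collapse to "others".
df <- data.frame(factor = as.factor(column1), column2)
levels(df$factor) <- c("bad", rep("others", 4))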
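
The assignment above depends on "bad" being the first level and on there being exactly four other levels. A variant that avoids hard-coding positions, sketched here in base R, is to rename every level that is not "bad"; levels that end up with the same name are merged automatically. The vector name f is just for illustration.

```r
f <- factor(c("bad", "good", "nice", "fair", "great", "bad"))

# Rename every level except "bad"; identical level names are merged into one
levels(f)[levels(f) != "bad"] <- "others"

levels(f)
#> [1] "bad"    "others"
as.character(f)
#> [1] "bad"    "others" "others" "others" "others" "bad"
```

This works regardless of how many levels there are or where "bad" falls in the alphabetical order.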

Answer 4

Here is a dplyr solution with grouping:

library(dplyr)
df %>% 
  group_by(group = cumsum(column1=="bad")) %>% 
  mutate(column1 = ifelse(row_number()==1, "bad", "others")) %>% 
  ungroup() %>% 
  select(-group)

  column1 column2
   <chr>   <chr>  
 1 bad     john   
 2 others  ben    
 3 others  cook   
 4 others  seth   
 5 others  brian  
 6 bad     deph   
 7 bad     omar   
 8 others  mary   
 9 others  frank  
10 others  boss   
11 others  kate   
12 bad     sall
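
The grouping trick works because cumsum(column1 == "bad") starts a new group at every "bad" row, so the first row of each group is a "bad" row. The same logic can be sketched in base R, and doing so shows one caveat: if the data does not start with "bad", the leading rows form their own group and its first row is mislabeled. The short vector below is invented to show that case.

```r
column1 <- c("good", "bad", "nice", "bad")   # note: does NOT start with "bad"

group <- cumsum(column1 == "bad")            # 0 1 1 2: new group at each "bad"
first_in_group <- !duplicated(group)         # TRUE for the first row of each group

by_group <- ifelse(first_in_group, "bad", "others")
by_group
#> [1] "bad"    "bad"    "others" "bad"

# Direct comparison gives the intended answer for the first row as well
direct <- ifelse(column1 == "bad", "bad", "others")
direct
#> [1] "others" "bad"    "others" "bad"
```

For the example data in the question the first row is "bad", so both approaches agree there.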

4 answers

Answer 1: 3 votes, accepted, 2022-12-13 17:46:48
Answer 2: 3 votes, 2022-12-13 17:47:16
Answer 3: 2 votes, 2022-12-13 17:50:06
Answer 4: 1 vote, 2022-12-13 17:51:57
