How to collapse levels in a categorical variable in R
I have various categorical variables with more than 5 levels each, and I want a function that can collapse them into just two levels.
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2)
So for the data frame above, in column1, I want to keep every "bad" as "bad" and convert all the other levels to "others" with a function. I have no idea how to do that. Thanks
Use ifelse or case_when:
library(dplyr)
df <- df %>%
  mutate(column1 = case_when(column1 != "bad" ~ "others", TRUE ~ column1))
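The ifelse variant mentioned above is not spelled out, so here is a minimal sketch of what it might look like. It assumes column1 is stored as character (hence the explicit stringsAsFactors = FALSE, which is already the default in R 4.0 and later).

```r
# Rebuild the example data from the question
column1 <- c("bad", "good", "nice", "fair", "great", "bad",
             "bad", "good", "nice", "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph",
             "omar", "mary", "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2, stringsAsFactors = FALSE)

# Keep "bad" as is and relabel every other value as "others"
df$column1 <- ifelse(df$column1 == "bad", "bad", "others")
```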
Also, as there is only a single change, we can just do
df$column1[df$column1 != "bad"] <- "others"
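Since the question asks for a function that can be reused on several variables, the same replacement could be wrapped up roughly as follows. This is only a sketch: collapse_to_two, its keep and other arguments, and the column3 example column are hypothetical names chosen for illustration, not part of any package or of the original data.

```r
# Hypothetical helper: keep one value and relabel everything else
collapse_to_two <- function(x, keep, other = "others") {
  x <- as.character(x)               # accepts factor or character input
  x[!is.na(x) & x != keep] <- other  # NA values are left untouched
  x
}

# Small illustrative data frame (column3 is made up for this example)
df <- data.frame(
  column1 = c("bad", "good", "nice", "fair", "great", "bad"),
  column3 = c("low", "high", "mid", "low", "top", "mid"),
  stringsAsFactors = FALSE
)

df$column1 <- collapse_to_two(df$column1, keep = "bad")
df$column3 <- collapse_to_two(df$column3, keep = "low")
```

The helper returns a character vector; wrap the result in factor() if a factor is needed downstream.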
A simple way to do this in base R is with indexing:
c('others', 'bad')[(df$column1 == 'bad') + 1]
#> [1] "bad" "others" "others" "others" "others" "bad" "bad"
#> [8] "others" "others" "others" "others" "bad"
# Factor levels sort alphabetically (bad, fair, good, great, nice), so "bad" comes first
df <- data.frame(factor = as.factor(column1), column2)
levels(df$factor) <- c("bad", rep("other", 4))
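The levels approach above depends on knowing how many levels there are and where "bad" falls among them. A variant that avoids counting, sketched here in base R under the assumption that the variable is (or is converted to) a factor, renames every level other than "bad" in one step; R merges levels that end up with the same name.

```r
column1 <- c("bad", "good", "nice", "fair", "great", "bad",
             "bad", "good", "nice", "fair", "great", "bad")
f <- factor(column1)

# Rename every level except "bad"; duplicated level names are merged into one
levels(f)[levels(f) != "bad"] <- "other"
```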
Here is a dplyr solution with grouping. It relies on the first row of column1 being "bad", as it is here:
library(dplyr)
df %>%
  group_by(group = cumsum(column1 == "bad")) %>%
  mutate(column1 = ifelse(row_number() == 1, "bad", "others")) %>%
  ungroup() %>%
  select(-group)
   column1 column2
   <chr>   <chr>
 1 bad     john
 2 others  ben
 3 others  cook
 4 others  seth
 5 others  brian
 6 bad     deph
 7 bad     omar
 8 others  mary
 9 others  frank
10 others  boss
11 others  kate
12 bad     sall