How to collapse levels in a categorical variable in R
I have various categorical variables, each with more than 5 levels, and I want a function that can collapse them into two levels.
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2)
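So, for the data frame above, I want a function that keeps every "bad" in column1 as "bad" and converts all the other levels to "others". I don't know how to do this. Thanks.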
Use ifelse or case_when:
library(dplyr)
# values that are not "bad" become "others"; TRUE ~ column1 leaves the rest unchanged
df <- df %>%
  mutate(column1 = case_when(column1 != "bad" ~ "others", TRUE ~ column1))
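The answer also suggests ifelse(). With the question's data it could look like the following; this is a sketch, not the answerer's original code, and the as.character() call is an added safeguard in case column1 was created as a factor (the data.frame() default before R 4.0.0).

```r
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2)

# ifelse() is vectorised: "bad" where the test is TRUE, "others" elsewhere.
# as.character() makes the comparison behave the same for character or factor input.
df$column1 <- ifelse(as.character(df$column1) == "bad", "bad", "others")

print(df)
```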
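Also, since only one kind of value changes, we can use plain subset assignment:
# assumes column1 is character; on a factor, "others" is not a valid level and would become NA
df$column1[df$column1 != "bad"] <- "others"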
A simple way to do this in base R is with indexing:
c('others', 'bad')[(df$column1 == 'bad') + 1]
#> [1] "bad" "others" "others" "others" "others" "bad" "bad"
#> [8] "others" "others" "others" "others" "bad"
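The indexing trick works because arithmetic coerces TRUE/FALSE to 1/0, so adding 1 yields position 2 ("bad") or 1 ("others"). A short sketch that spells out the intermediate index and writes the result back into the data frame, since the one-liner above only returns a vector:

```r
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2)

is_bad <- df$column1 == "bad"           # logical: TRUE where the value is "bad"
idx <- is_bad + 1                       # TRUE/FALSE become 1/0, so idx is 2 or 1
df$column1 <- c("others", "bad")[idx]   # position 1 -> "others", position 2 -> "bad"

print(idx)
print(df$column1)
```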
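Alternatively, store the column as a factor and rename its levels; levels given the same name are merged into one:
df <- data.frame(factor = as.factor(column1), column2)
# as.factor() sorts levels alphabetically: "bad" "fair" "good" "great" "nice",
# so "bad" is kept and the other four levels are merged into "other"
levels(df$factor) <- c("bad", rep("other", 4))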
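Renaming by position depends on "bad" being the first level, which is true here only because of alphabetical sorting. A sketch that builds the new names from the current level names instead, so it works wherever "bad" sits in levels():

```r
column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(factor = as.factor(column1), column2)

# Rename every level that is not "bad" to "other", whatever its position.
lv <- levels(df$factor)
lv[lv != "bad"] <- "other"

# Assigning duplicate names through levels<- merges those levels into one.
levels(df$factor) <- lv

print(levels(df$factor))
print(table(df$factor))
```

If the forcats package is installed, fct_other(df$factor, keep = "bad", other_level = "other") should produce the same two levels.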
Here is a dplyr solution with grouping:
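library(dplyr)
df %>%
  # cumsum() increments at every "bad", so each "bad" starts a new group
  group_by(group = cumsum(column1 == "bad")) %>%
  # the first row of each group > 0 is the "bad" that started it;
  # group 0 (rows before the first "bad") contains no "bad" at all
  mutate(column1 = ifelse(row_number() == 1 & group > 0, "bad", "others")) %>%
  ungroup() %>%
  select(-group)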
   column1 column2
   <chr>   <chr>
 1 bad     john
 2 others  ben
 3 others  cook
 4 others  seth
 5 others  brian
 6 bad     deph
 7 bad     omar
 8 others  mary
 9 others  frank
10 others  boss
11 others  kate
12 bad     sall
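The question asks for a function that can be reused across several categorical variables. A minimal sketch of such a wrapper follows; collapse_two() and to_keep are hypothetical names introduced here, and the sketch assumes each variable has exactly one level to keep.

```r
# Hypothetical helper: keep one level, collapse every other non-missing
# value into `other`, and return a factor with exactly those two levels.
collapse_two <- function(x, keep, other = "others") {
  stopifnot(is.character(keep), length(keep) == 1)
  x <- as.character(x)                    # accept character or factor input
  out <- ifelse(x == keep, keep, other)   # NA in x stays NA in out
  factor(out, levels = c(keep, other))
}

column1 <- c("bad", "good", "nice", "fair", "great", "bad", "bad", "good", "nice",
             "fair", "great", "bad")
column2 <- c("john", "ben", "cook", "seth", "brian", "deph", "omar", "mary",
             "frank", "boss", "kate", "sall")
df <- data.frame(column1, column2)

# Hypothetical lookup: column name -> level to keep.
# Add one entry per categorical column that should be collapsed.
to_keep <- c(column1 = "bad")
for (col in names(to_keep)) {
  df[[col]] <- collapse_two(df[[col]], keep = to_keep[[col]])
}

print(table(df$column1))
```

Setting levels explicitly in factor() keeps both levels even if a column happens to contain no "others" values.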