How to group values based on NA vs. alphabet
I have a column, phase, of LETTER values in alphabetical order, partly interspersed with NA:
df1 <- data.frame(
phase = c(NA, "A", "B", "D", NA, "A", "B", "C", "E", "A", "B", "D")
)
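The LETTER values form groups: everything from an A up to the next NA or the next A is one group. I want to create a new column that makes these groups explicit.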
The expected result looks like this:
df1 <- data.frame(
phase = c(NA, "A", "B", "D", NA, "A", "B", "C", "E", "A", "B", "D"),
group = c(NA,"group1","group1","group1",NA, "group2","group2","group2","group2","group3","group3","group3")
)
How can I create this column? I'd appreciate any suggestions, dplyr-based or otherwise.
What I have tried so far is only partly successful: the third group, which is not separated from the second by an NA, is missed:
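library(dplyr)
df1 %>%
  # counts NA rows only, so a new "A" without a preceding NA does not start a new group
  mutate(group = cumsum(is.na(phase)),
         group = ifelse(is.na(phase), NA, paste("group", group, sep = "")))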
phase group
1 <NA> <NA>
2 A group1
3 B group1
4 D group1
5 <NA> <NA>
6 A group2
7 B group2
8 C group2
9 E group2
10 A group2
11 B group2
12 D group2
Move to the next group whenever phase is "A". Then replace the group label with NA wherever phase is NA.
library(dplyr)
df1 %>%
mutate(group = cumsum(phase == "A" & !is.na(phase)) %>%
paste0("group", .) %>%
replace(is.na(phase), NA))
# phase group
# 1 <NA> <NA>
# 2 A group1
# 3 B group1
# 4 D group1
# 5 <NA> <NA>
# 6 A group2
# 7 B group2
# 8 C group2
# 9 E group2
# 10 A group3
# 11 B group3
# 12 D group3
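To make the steps in this answer easier to follow, here is a minimal base R sketch of the same idea with each intermediate value kept in its own variable. The names starts, idx and group are only illustrative, and it uses ifelse() in place of the pipe and replace() from the answer.

```r
# Same example data as in the question.
phase <- c(NA, "A", "B", "D", NA, "A", "B", "C", "E", "A", "B", "D")

# TRUE wherever a new group starts; %in% never returns NA,
# so no separate !is.na() check is needed here.
starts <- phase %in% "A"

# Running count of group starts: one integer per row.
idx <- cumsum(starts)

# Build the labels, then blank out the rows where phase itself is NA.
group <- ifelse(is.na(phase), NA_character_, paste0("group", idx))
group
```

The count only increases on an "A", so the NA in row 5 keeps the index of the group before it until the final ifelse() masks it out.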
We can also do
library(dplyr)
library(stringr)
df1 %>%
mutate(group = str_c('group', cumsum(phase %in% 'A') * NA^is.na(phase)))
# phase group
#1 <NA> <NA>
#2 A group1
#3 B group1
#4 D group1
#5 <NA> <NA>
#6 A group2
#7 B group2
#8 C group2
#9 E group2
#10 A group3
#11 B group3
#12 D group3
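The NA^is.na(phase) part may look cryptic. The small sketch below, using made-up values, shows what it appears to rely on: in R, NA to the power 0 evaluates to 1, while NA to the power 1 stays NA, so multiplying by the result works as a mask.

```r
# FALSE is treated as 0 and TRUE as 1 in arithmetic:
# NA^0 is 1, NA^1 is NA.
mask <- NA^c(FALSE, TRUE)
mask

# Multiplying by the mask keeps a value where the flag was FALSE
# and turns it into NA where the flag was TRUE.
masked <- c(5, 7) * mask
masked

# Base paste0() turns NA into the text "NA" rather than propagating it.
label <- paste0("group", NA)
label
```

This is also why the answer uses str_c() rather than paste0(): stringr documents missing values as contagious in str_c(), so the masked rows come out as NA and not as the string "groupNA".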