
How to group values based on NA vs. alphabet

I have a column (phase) of letter values in alphabetical order, partly interspersed with NA:

df1 <- data.frame(
  phase = c(NA, "A", "B", "D", NA, "A", "B", "C", "E", "A", "B", "D")
)

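The letter values form groups: everything from an A up to the next NA or the next A, whichever comes first, is one group. I want to create a new column that makes these groups explicit.

The expected result is this: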

df1 <- data.frame(
  phase = c(NA, "A", "B", "D", NA, "A", "B", "C", "E", "A", "B", "D"),
  group = c(NA,"group1","group1","group1",NA, "group2","group2","group2","group2","group3","group3","group3")
)

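How can I create this column? I'd appreciate any advice, dplyr-based or otherwise.

What I have tried so far is only partly successful: the third group, which is not separated from the second by an NA, is missed.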

library(dplyr)

df1 %>% 
  mutate(group = cumsum(is.na(phase)),
         group = ifelse(is.na(phase), NA, paste("group", group, sep = "")))

   phase  group
1   <NA>   <NA>
2      A group1
3      B group1
4      D group1
5   <NA>   <NA>
6      A group2
7      B group2
8      C group2
9      E group2
10     A group2
11     B group2
12     D group2

Start a new group whenever phase is "A". Then replace the group label with NA wherever phase is NA.

library(dplyr)

df1 %>%
  mutate(group = cumsum(phase == "A" & !is.na(phase)) %>%
                 paste0("group", .) %>% 
                 replace(is.na(phase), NA))

#    phase  group
# 1   <NA>   <NA>
# 2      A group1
# 3      B group1
# 4      D group1
# 5   <NA>   <NA>
# 6      A group2
# 7      B group2
# 8      C group2
# 9      E group2
# 10     A group3
# 11     B group3
# 12     D group3
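
Since the question also allows non-dplyr answers, the same idea can be sketched in base R. This is only a minimal illustration under the same assumption as above (every group starts with "A"); the intermediate names starts and group_id are made up for this example.

```r
# Same input as in the question.
df1 <- data.frame(
  phase = c(NA, "A", "B", "D", NA, "A", "B", "C", "E", "A", "B", "D")
)

# %in% returns FALSE for NA, so no separate is.na() guard is needed here.
starts <- df1$phase %in% "A"

# The running count of "A" rows gives a group number for every row.
group_id <- cumsum(starts)

# Build the labels, leaving NA wherever phase itself is NA.
df1$group <- ifelse(is.na(df1$phase), NA_character_, paste0("group", group_id))

print(df1)
```

The NA rows still receive a running count internally; they are only masked in the final step, which is why the order of operations matters.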

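We can also do the following. Here phase %in% 'A' is FALSE for NA values, NA^is.na(phase) evaluates to 1 where phase is present and to NA where it is missing, and str_c() returns NA whenever one of its inputs is NA: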

library(dplyr)
library(stringr)
df1 %>% 
   mutate(group = str_c('group', cumsum(phase %in% 'A') * NA^is.na(phase)))
#    phase  group
# 1   <NA>   <NA>
# 2      A group1
# 3      B group1
# 4      D group1
# 5   <NA>   <NA>
# 6      A group2
# 7      B group2
# 8      C group2
# 9      E group2
# 10     A group3
# 11     B group3
# 12     D group3
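
Both answers count the "A" rows, so any non-NA rows that come before the first "A" would end up labelled "group0". That case does not occur in the question's data, but if it matters, a small wrapper could guard against it. The helper below is hypothetical (label_runs and its start and prefix arguments are not from either answer) and is only a sketch in base R.

```r
# Hypothetical helper: label runs that begin at `start` and end at the
# next `start` or the next NA.
label_runs <- function(x, start = "A", prefix = "group") {
  id <- cumsum(x %in% start)       # %in% treats NA as "not a start"
  out <- paste0(prefix, id)
  out[is.na(x) | id == 0] <- NA    # NA rows and rows before the first start get no label
  out
}

phase <- c(NA, "A", "B", "D", NA, "A", "B", "C", "E", "A", "B", "D")
result <- label_runs(phase)
print(data.frame(phase = phase, group = result))
```

Inside dplyr this could be used as mutate(group = label_runs(phase)), assuming the helper is defined first.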
