Filtering a specific combination or sequence
Perhaps something like the following (it would help if you provided data that can be copied, rather than an image):
mydf <- data.frame(ID = c(804, rep(805, 6), rep(806, 3)),
                   Application = c(3, 2, rep(3, 4), 4, 2, 3, 3),
                   Decision = c(LETTERS[1:5], "A", LETTERS[1:3], "E"))
library(dplyr)
library(tidyr)
library(stringr)
mydf |>
  group_by(ID, Application) |>
  summarize(Decision = paste(Decision, collapse = ",")) |>
  ungroup() |>
  filter(str_detect(Decision, "C,D,E")) |>
  separate_rows(Decision, sep = ",") |>
  filter(Decision %in% c("C", "D", "E"))
# A tibble: 3 × 3
     ID Application Decision
  <dbl>       <dbl> <chr>
1   805           3 C
2   805           3 D
3   805           3 E
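One caveat with the final `filter()` above: it keeps every C, D, or E row in a matching group, including any that fall outside the matched run. Below is a minimal sketch of a stricter variant, assuming the goal is to keep only the rows that actually form the consecutive run. `find_run()` is a hypothetical helper written for this illustration, not a dplyr function.

```r
library(dplyr)

mydf <- data.frame(ID = c(804, rep(805, 6), rep(806, 3)),
                   Application = c(3, 2, rep(3, 4), 4, 2, 3, 3),
                   Decision = c(LETTERS[1:5], "A", LETTERS[1:3], "E"))

# Hypothetical helper: returns a logical vector marking the elements of x
# that belong to a consecutive occurrence of `pattern`.
find_run <- function(x, pattern) {
  n <- length(x)
  k <- length(pattern)
  keep <- rep(FALSE, n)
  if (n < k) return(keep)
  for (start in seq_len(n - k + 1)) {
    idx <- start:(start + k - 1)
    # isTRUE() guards against NA values in x
    if (isTRUE(all(x[idx] == pattern))) keep[idx] <- TRUE
  }
  keep
}

result <- mydf |>
  group_by(ID, Application) |>
  filter(find_run(Decision, c("C", "D", "E"))) |>
  ungroup()

result
```

Because the helper compares whole elements rather than a pasted string, it should also work when the decision codes are longer than one character.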
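You can use nest() from the tidyverse to group by ID and Application. You then only need map_lgl() to work through the list of data frames inside the column that nest() creates. It may help to run the code below incrementally, adding one line at a time, to see how nest() works: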
library(tidyverse)
df <- tibble(ID = c(804,
                    rep(805, 6),
                    rep(806, 3)),
             Application = c(3, 2, 3, 3, 3, 3, 4, 2, 3, 3),
             Decision = c('A', 'B', 'C', 'D', 'E', 'A', 'A', 'B', 'C', 'E'))
df %>%
  nest(Decision = Decision) %>%
  filter(Decision %>%
           map_lgl(~ grepl("CDE", paste(.$Decision, collapse = "")))) %>%
  unnest(Decision) %>%
  filter(Decision %in% c("C", "D", "E"))
#> # A tibble: 3 × 3
#>      ID Application Decision
#>   <dbl>       <dbl> <chr>
#> 1   805           3 C
#> 2   805           3 D
#> 3   805           3 E
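To make the intermediate steps visible, here is a small sketch that stops after nest() and runs the map_lgl() test on its own. The names `nested` and `has_cde` are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- tibble(ID = c(804, rep(805, 6), rep(806, 3)),
             Application = c(3, 2, 3, 3, 3, 3, 4, 2, 3, 3),
             Decision = c("A", "B", "C", "D", "E", "A", "A", "B", "C", "E"))

# Step 1: nest the Decision column. The remaining columns (ID, Application)
# define the groups, so there is one row per ID/Application pair and
# Decision becomes a list-column of one-column tibbles.
nested <- df %>% nest(Decision = Decision)
nested

# Step 2: for each nested tibble, paste its decisions into one string
# and test whether that string contains "CDE".
has_cde <- map_lgl(nested$Decision,
                   ~ grepl("CDE", paste(.x$Decision, collapse = "")))

# Step 3: keep only the ID/Application pairs that passed the test.
nested[has_cde, c("ID", "Application")]
```

Pasting with an empty separator is only safe here because every decision code is a single character; with longer codes a separator would be needed to avoid false matches.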
You can also do this with data.table, as follows.
require(data.table)
data <- data.table(
  ID = c(804, 805, 805, 805, 805, 805, 805, 806, 806, 806),
  Application = c(3, 2, 3, 3, 3, 3, 4, 2, 3, 3),
  Decision = c("A", "B", "C", "D", "E", "A", "A", "B", "C", "E")
)
# Filter Condition on Decision
cond.Decision <- c("C", "D", "E")
data[
  i = ID %in% data[
    j = .(flg = all(cond.Decision %in% Decision)),
    by = ID
  ][i = flg == TRUE, j = ID] & Decision %in% cond.Decision
]
    ID Application Decision
1: 805           3        C
2: 805           3        D
3: 805           3        E
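Note that the code above checks whether C, D, and E all occur somewhere within an ID; it does not check that they are consecutive or that they belong to the same Application. If the consecutive sequence within each ID/Application pair is what matters, a possible variant is sketched below. It assumes single-character decision codes, so the codes can be pasted together without a separator; `target` and `result` are names introduced here for illustration.

```r
library(data.table)

data <- data.table(
  ID = c(804, 805, 805, 805, 805, 805, 805, 806, 806, 806),
  Application = c(3, 2, 3, 3, 3, 3, 4, 2, 3, 3),
  Decision = c("A", "B", "C", "D", "E", "A", "A", "B", "C", "E")
)

cond.Decision <- c("C", "D", "E")
# The sequence to look for, as one string: "CDE"
target <- paste(cond.Decision, collapse = "")

# For each ID/Application group, return the C/D/E rows only when the
# group's decisions, read in order, contain the run "CDE".
# Groups that fail the test return NULL and are dropped.
result <- data[,
  if (grepl(target, paste(Decision, collapse = ""), fixed = TRUE)) {
    .(Decision = Decision[Decision %in% cond.Decision])
  },
  by = .(ID, Application)
]

result
```

On the sample data this should select the same three rows for ID 805, Application 3.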