Using R, I am trying to filter the IDs that have a specific "Decision" sequence, but the decisions have to be in the same "Application" number. The "Decision" order needs to be C, D, E. The result I am looking for is the rows for ID "805",
because ID "805" has the "Decision" sequence C, D, E and they are in the same "Application" number.
I tried using for loops, if/else and filter, but nothing worked for me. Thank you in advance for your help.
Maybe the following (it would be helpful if you provided the data needed to reproduce the problem, rather than an image):
mydf <- data.frame(ID = c(804, rep(805, 6), rep(806, 3)),
                   Application = c(3, 2, rep(3, 4), 4, 2, 3, 3),
                   Decision = c(LETTERS[1:5], "A", LETTERS[1:3], "E"))
library(dplyr)
library(tidyr)
library(stringr)
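mydf |>
  group_by(ID, Application) |>
  # Collapse each group's decisions into one string, e.g. "C,D,E,A"
  summarize(Decision = paste(Decision, collapse = ",")) |>
  ungroup() |>
  # Keep groups whose decisions contain C, D, E consecutively
  filter(str_detect(Decision, "C,D,E")) |>
  # Back to one row per decision, keeping only C, D and E
  separate_rows(Decision, sep = ",") |>
  filter(Decision %in% c("C", "D", "E"))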
# A tibble: 3 × 3
     ID Application Decision
  <dbl>       <dbl> <chr>
1   805           3 C
2   805           3 D
3   805           3 E
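The string match above works because every decision code is a single letter. If codes could be longer (for example "AC"), the pattern "C,D,E" would also match inside "AC,D,E". A minimal base R sketch that compares the vector elements directly instead; has_run() is a hypothetical helper introduced here for illustration, not a function from any package:

```r
# Hypothetical helper: TRUE if `target` occurs as a consecutive run in `x`.
has_run <- function(x, target) {
  n <- length(x)
  k <- length(target)
  if (k == 0L) return(TRUE)
  if (n < k) return(FALSE)
  # Try every possible start position and compare element by element.
  for (start in seq_len(n - k + 1L)) {
    if (isTRUE(all(x[start:(start + k - 1L)] == target))) return(TRUE)
  }
  FALSE
}

mydf <- data.frame(ID = c(804, rep(805, 6), rep(806, 3)),
                   Application = c(3, 2, rep(3, 4), 4, 2, 3, 3),
                   Decision = c(LETTERS[1:5], "A", LETTERS[1:3], "E"))

# Split the decisions by ID/Application pair and test each group.
groups <- split(mydf$Decision, list(mydf$ID, mydf$Application), drop = TRUE)
hits <- vapply(groups, has_run, logical(1), target = c("C", "D", "E"))

# Names have the form "ID.Application"
names(hits)[hits]
# expected: "805.3"
```

The matching ID/Application pairs can then be used to subset mydf in whatever way suits the rest of the analysis.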
You could use nest() from the tidyverse to group by ID and Application. You just need to use map_lgl() to work with the list of data frames inside the column that nest() creates. It might help to run the following code iteratively, adding one line at a time, to see how nest() works:
library(tidyverse)
df = tibble(ID = c(804,
                   rep(805, 6),
                   rep(806, 3)),
            Application = c(3, 2, 3, 3, 3, 3, 4, 2, 3, 3),
            Decision = c('A', 'B', 'C', 'D', 'E', 'A', 'A', 'B', 'C', 'E'))
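df %>%
  # One row per ID/Application pair; Decision becomes a list column
  nest(Decision = Decision) %>%
  # Keep pairs whose decisions, pasted together, contain "CDE"
  filter(Decision %>%
           map_lgl(~ grepl("CDE", paste(.$Decision, collapse = "")))) %>%
  unnest(Decision) %>%
  filter(Decision %in% c("C", "D", "E"))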
#> # A tibble: 3 × 3
#>      ID Application Decision
#>   <dbl>       <dbl> <chr>
#> 1   805           3 C
#> 2   805           3 D
#> 3   805           3 E
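To see what the filter step operates on, it can help to stop right after nest() and inspect the list column. A small sketch on the same data; the names nested and has_cde are introduced here for illustration only:

```r
library(tidyverse)

df <- tibble(ID = c(804, rep(805, 6), rep(806, 3)),
             Application = c(3, 2, 3, 3, 3, 3, 4, 2, 3, 3),
             Decision = c("A", "B", "C", "D", "E", "A", "A", "B", "C", "E"))

# One row per ID/Application pair; Decision is now a list column of tibbles.
nested <- df %>% nest(Decision = Decision)

# Collapse each nested tibble into a single string and test for "CDE".
has_cde <- map_lgl(
  nested$Decision,
  ~ grepl("CDE", paste(.x$Decision, collapse = ""), fixed = TRUE)
)

# Show the flag next to each ID/Application pair.
nested %>% mutate(has_cde = has_cde)
```

Only the pair that is flagged TRUE survives the filter() in the pipeline above; unnest() then expands it back to one row per decision.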
You can achieve this with data.table as follows.
library(data.table)
data <- data.table(
  ID = c(804, 805, 805, 805, 805, 805, 805, 806, 806, 806),
  Application = c(3, 2, 3, 3, 3, 3, 4, 2, 3, 3),
  Decision = c("A", "B", "C", "D", "E", "A", "A", "B", "C", "E")
)
# Filter condition on Decision
cond.Decision <- c("C", "D", "E")
# Inner query: per ID, flag whether all of C, D and E occur.
# Outer query: keep the C/D/E rows of the flagged IDs.
data[
  i = ID %in% data[
    j = .(flg = all(cond.Decision %in% Decision)),
    by = ID
  ][i = flg == TRUE, j = ID] & Decision %in% cond.Decision
]
    ID Application Decision
1: 805           3        C
2: 805           3        D
3: 805           3        E
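Note that the query above tests, per ID, only whether C, D and E all occur; it does not look at Application or at the order of the decisions. On this sample data the result is the same. A variant that groups by both ID and Application and checks for the consecutive run might look like the sketch below. The names pattern and result are introduced here for illustration, and the string-collapse test assumes single-letter decision codes:

```r
library(data.table)

data <- data.table(
  ID = c(804, 805, 805, 805, 805, 805, 805, 806, 806, 806),
  Application = c(3, 2, 3, 3, 3, 3, 4, 2, 3, 3),
  Decision = c("A", "B", "C", "D", "E", "A", "A", "B", "C", "E")
)

cond.Decision <- c("C", "D", "E")
pattern <- paste(cond.Decision, collapse = "")  # "CDE"

# For each ID/Application group, return the C/D/E decisions only if the
# decisions, read in row order, contain "CDE" as a consecutive run.
# Groups where the condition fails return NULL and are dropped.
result <- data[,
  if (grepl(pattern, paste(Decision, collapse = ""), fixed = TRUE)) {
    .(Decision = Decision[Decision %in% cond.Decision])
  },
  by = .(ID, Application)
]
result
```

This relies on the rows already being in the intended decision order within each group; if they are not, the table would need to be sorted first.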