简体   繁体   中英

Filtering a specific combination or sequence

过滤决策

Using R, I am trying to filter the ID's that have a specific "Decision" sequence, but the have to be in the same "Application" number. The "Decision" order needs to be C,D,E. So What I am looking for in here, is to get

最终过滤器

Because The ID "805" has the "Decision" sequence C,D,E and they are in the same "Application" number.

I tried using, for loops, if else, filter but nothing worked for me. Thank you in advance for your help.

Maybe the following (it would be helpful if you provide the data to reproduce, as opposed to an image):

mydf <- data.frame(ID = c(804, rep(805, 6), rep(806, 3)),
           Application = c(3, 2, rep(3, 4), 4, 2, 3, 3),
           Decision = c(LETTERS[1:5], "A", LETTERS[1:3], "E"))

library(dplyr)
library(tidyr)
library(stringr)

mydf |> 
  group_by(ID, Application) |> 
  summarize(Decision = paste(Decision, collapse = ",")) |> 
  ungroup() |> 
  filter(str_detect(Decision, "C,D,E")) |> 
  separate_rows(Decision, sep = ",") |> 
  filter(Decision %in% c("C", "D", "E"))

# A tibble: 3 × 3
     ID Application Decision
  <dbl>       <dbl> <chr>   
1   805           3 C       
2   805           3 D       
3   805           3 E  

You could use nest() from the tidyverse to group by ID and Application . You just need to use map_lgl() to work with the list of dataframes inside the column that nest() creates. It might help to run the following code iteratively, adding one line at a time to see how nest() works:

library(tidyverse)
df = tibble(ID = c(804,
                   rep(805,6),
                   rep(806,3)),
            Application = c(3,2,3,3,3,3,4,2,3,3),
            Decision = c('A','B','C','D','E','A','A','B','C','E'))


df %>% 
  nest(Decision=Decision) %>% 
  filter(Decision %>% 
           map_lgl(~grepl("CDE", paste(.$Decision, collapse="")))) %>% 
  unnest(Decision) %>% 
  filter(Decision %in% c("C","D","E"))

#> # A tibble: 3 × 3
#>      ID Application Decision
#>   <dbl>       <dbl> <chr>   
#> 1   805           3 C       
#> 2   805           3 D       
#> 3   805           3 E

You can achieve it using below data.table way.

require(data.table)

data <- data.table(
  ID = c(804, 805, 805, 805, 805, 805, 805, 806, 806, 806),
  Application = c(3, 2, 3, 3, 3, 3, 4, 2, 3, 3),
  Decision = c("A", "B", "C", "D", "E", "A", "A", "B", "C", "E")
)

# Filter Condition on Decision
cond.Decision <- c("C", "D", "E")

data[
  i = ID %in% data[
    j = .(flg = all(cond.Decision %in% Decision)),
    by = ID
  ][i = flg == TRUE, j = ID] & Decision %in% cond.Decision
]
    ID Application Decision
1: 805           3        C
2: 805           3        D
3: 805           3        E

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2025 STACKOOM.COM