
R: How to remove rows matching a condition within group_by groups only when the group's row count is > 1

I have the following sample dataset:

df <- structure(list(Vno = c(1111, 1111, 2222, 3333, 3333, 4444, 5555, 
5555), ID = c("A001", "X011", "B002", "C003", "Y033", "D004", 
"E005", "X055"), Name = c("John", "S/O JJJ", "S/O LLL", "Jane", 
"D/O MMM", "S/O ZZZ", "Nicole", "D/O ZZZ")), row.names = c(NA, 
-8L), class = c("tbl_df", "tbl", "data.frame"))

Output:

> df
# A tibble: 8 x 3
    Vno ID    Name   
  <dbl> <chr> <chr>  
1  1111 A001  John   
2  1111 X011  S/O JJJ
3  2222 B002  S/O LLL
4  3333 C003  Jane   
5  3333 Y033  D/O MMM
6  4444 D004  S/O ZZZ
7  5555 E005  Nicole 
8  5555 X055  D/O ZZZ

I want to filter out rows whose Name starts with 'S/O' or 'D/O', but only when the Vno group has more than one row. My attempt below, however, also removes groups that consist of a single 'S/O' or 'D/O' row:

pt_byVno <- df %>%
  group_by(Vno) %>%
  filter(!grepl('S/O|D/O',Name)) %>%
  print
    Vno ID    Name  
  <dbl> <chr> <chr> 
1  1111 A001  John  
2  3333 C003  Jane  
3  5555 E005  Nicole

The desired output should be:

# A tibble: 5 x 3
    Vno ID    Name   
  <dbl> <chr> <chr>  
1  1111 A001  John   
2  2222 B002  S/O LLL
3  3333 C003  Jane   
4  4444 D004  S/O ZZZ
5  5555 E005  Nicole 

I would appreciate any help from R experts. Thanks!

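You can keep a row if it is the only row in its group, or if its Name does not match 'S/O|D/O'. Inside a grouped filter(), n() returns the size of the current group, so single-row groups pass regardless of Name.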

library(dplyr)
df %>% group_by(Vno) %>% filter(n() == 1 | !grepl('S/O|D/O', Name))

#    Vno ID    Name   
#  <dbl> <chr> <chr>  
#1  1111 A001  John   
#2  2222 B002  S/O LLL
#3  3333 C003  Jane   
#4  4444 D004  S/O ZZZ
#5  5555 E005  Nicole 
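
The question asks about names that start with 'S/O' or 'D/O', whereas the pattern 'S/O|D/O' matches those strings anywhere in Name. A minimal sketch of a stricter variant follows, assuming the prefix really should be anchored at the start of the string. The '^' anchor, the ungroup() call and the name `result` are additions for illustration and are not part of the original answer.

```r
library(dplyr)

# Sample data from the question, rebuilt with tibble() for a self-contained example
df <- tibble(
  Vno  = c(1111, 1111, 2222, 3333, 3333, 4444, 5555, 5555),
  ID   = c("A001", "X011", "B002", "C003", "Y033", "D004", "E005", "X055"),
  Name = c("John", "S/O JJJ", "S/O LLL", "Jane", "D/O MMM", "S/O ZZZ", "Nicole", "D/O ZZZ")
)

result <- df %>%
  group_by(Vno) %>%
  # n() == 1 keeps single-row groups untouched;
  # otherwise drop rows whose Name begins with S/O or D/O
  filter(n() == 1 | !grepl("^(S/O|D/O)", Name)) %>%
  # drop the grouping so later steps operate on the whole table
  ungroup()

print(result)
```

On the sample data this keeps the same five rows as the unanchored pattern, since every 'S/O' or 'D/O' occurs at the start of Name. The two versions would differ only for a name that contains one of these strings in the middle.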
