Subset all rows with the same value in one column, grouped by another column, where at least one row of a third column contains a specific letter in R
I am gathering data from a database where I have two different identifiers (ID1, Nr). I want to collect all the rows that have a duplicated ID1, grouped by "Nr", where at least one record in Names contains the letter "a".
library(tibble)  # data_frame() is deprecated; tibble() is its replacement
df <- tibble(ID1 = c('100', '100', '100', '100', '100', '100', '100', '100', '100'),
             Nr = c('1', '1', '1', '2', '2', '2', '2', '3', '4'),
             Names = c('aaa bb', 'aa bbb', 'ccc', 'ccc', 'ccc', 'ddd', 'ccc', 'ccc', 'add'))
So, the desired output would be:
output <- tibble(ID1 = c('100', '100', '100', '100'),
                 Nr = c('1', '1', '1', '4'),
                 Names = c('aaa bb', 'aa bbb', 'ccc', 'add'))
Thank you in advance!
You can group_by the Nr column and use grepl to keep only the groups in which any Names value contains "a":
library(dplyr)
df %>% group_by(Nr) %>% filter(any(grepl('a', Names)))
# ID1 Nr Names
# <chr> <chr> <chr>
#1 100 1 aaa bb
#2 100 1 aa bbb
#3 100 1 ccc
#4 100 4 add
The same logic can be implemented in base R:
subset(df, ave(grepl('a', Names), Nr, FUN = any))
as well as in data.table:
library(data.table)
setDT(df)[, .SD[any(grepl('a', Names))], Nr]
In the original dataset, if you have more than one ID1 value, you might want to include ID1 in group_by as well.
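As a minimal sketch of that last point, the example below groups by both identifiers. The data frame df2 and its values are hypothetical, made up only to show a case where two ID1 values share the same Nr; they are not from the question.

```r
library(dplyr)

# Hypothetical data: ID1 '100' and '200' both use Nr '1'
df2 <- data.frame(
  ID1   = c('100', '100', '200', '200', '200'),
  Nr    = c('1', '1', '1', '1', '2'),
  Names = c('aaa', 'ccc', 'ccc', 'ddd', 'add'),
  stringsAsFactors = FALSE
)

# Group by both columns so each (ID1, Nr) pair is tested on its own
res <- df2 %>%
  group_by(ID1, Nr) %>%
  filter(any(grepl('a', Names))) %>%
  ungroup()

res
```

Here the rows for ID1 '200', Nr '1' are dropped because none of their Names contain "a". Grouping by Nr alone would have kept them, since Nr '1' also holds 'aaa' from ID1 '100'.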