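Subset all rows that have the same value in one column, grouped by another column, where at least one row in a third column contains a specific letter, in R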

Question

I am gathering data from a database where I have two different identifiers (ID1, Nr). I want to collect all the rows that have a duplicated ID1, grouped by "Nr", where at least one record in Names contains the letter "a".

library(dplyr)  # provides tibble(); data_frame() is deprecated

df <- tibble(ID1 = c('100', '100', '100', '100', '100', '100', '100', '100', '100'),
             Nr = c('1', '1', '1', '2', '2', '2', '2', '3', '4'),
             Names = c('aaa bb', 'aa bbb', 'ccc', 'ccc', 'ccc', 'ddd', 'ccc', 'ccc', 'add'))

So, the desired output would be:

output <- tibble(ID1 = c('100', '100', '100', '100'),
                 Nr = c('1', '1', '1', '4'),
                 Names = c('aaa bb', 'aa bbb', 'ccc', 'add'))

Thank you in advance!

Answer 1

You can group_by the Nr column and keep each group in which grepl finds an "a" in at least one value of Names:

library(dplyr)
df %>% group_by(Nr) %>% filter(any(grepl('a', Names)))

#  ID1   Nr    Names 
# <chr> <chr> <chr> 
#1 100   1     aaa bb
#2 100   1     aa bbb
#3 100   1     ccc   
#4 100   4     add

The same logic can be implemented in base R:

subset(df, ave(grepl('a', Names), Nr, FUN = any))

as well as in data.table:

library(data.table)
setDT(df)[, .SD[any(grepl('a', Names))], Nr]

In the original dataset, if you have more than one ID1 value, you might want to include ID1 in group_by as well, so that the same Nr under different IDs is treated as a separate group.
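
A minimal sketch of that last point, assuming a made-up data frame df2 with the same columns as the question plus a second ID1 value ('200'). The data and the names df2 and result are illustrative only and do not come from the original post:

```r
library(dplyr)

# Hypothetical data: same layout as the question, with two ID1 values
df2 <- tibble(
  ID1   = c('100', '100', '100', '200', '200'),
  Nr    = c('1',   '1',   '2',   '1',   '1'),
  Names = c('aaa', 'ccc', 'ccc', 'ccc', 'ddd')
)

# Group by both identifiers, so Nr '1' under ID1 '100' and Nr '1' under
# ID1 '200' are checked separately for a Names value containing "a"
result <- df2 %>%
  group_by(ID1, Nr) %>%
  filter(any(grepl('a', Names, fixed = TRUE))) %>%
  ungroup()

print(result)
```

Here only the two rows with ID1 '100' and Nr '1' are kept. Grouping by Nr alone would also keep the two ID1 '200' rows, because they share Nr '1' with the row containing 'aaa'.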

Answer 1: accepted, 2020-06-18 11:26:11
