Subset all rows with the same value in one column, grouped by another column, where at least one row of a third column contains a specific letter in R

I am gathering data from a database in which I have two different identifiers (ID1, Nr). I want to collect all rows that share the same ID1, grouped by Nr, where at least one record in Names within the group contains the letter "a".

# data_frame() is deprecated; tibble::tibble() is its replacement
df <- tibble::tibble(ID1 = c('100', '100', '100', '100', '100', '100', '100', '100', '100'),
                     Nr = c('1', '1', '1', '2', '2', '2', '2', '3', '4'),
                     Names = c('aaa bb', 'aa bbb', 'ccc', 'ccc', 'ccc', 'ddd', 'ccc', 'ccc', 'add'))

So, the desired output would be:

output <- tibble::tibble(ID1 = c('100', '100', '100', '100'),
                         Nr = c('1', '1', '1', '4'),
                         Names = c('aaa bb', 'aa bbb', 'ccc', 'add'))

Thank you in advance!

You can group_by the Nr column and use grepl with any to keep the groups in which at least one Names value contains "a":

library(dplyr)
df %>% group_by(Nr) %>% filter(any(grepl('a', Names)))

#  ID1   Nr    Names 
# <chr> <chr> <chr> 
#1 100   1     aaa bb
#2 100   1     aa bbb
#3 100   1     ccc   
#4 100   4     add   
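To see why this keeps the 'ccc' row in group 1, it may help to spell out the intermediate steps. The following is only a minimal base R sketch of the same idea, using plain vectors copied from the question; the names has_a, group_has_a and keep are made up for illustration.

```r
# Columns from the question, as plain character vectors
Nr    <- c('1', '1', '1', '2', '2', '2', '2', '3', '4')
Names <- c('aaa bb', 'aa bbb', 'ccc', 'ccc', 'ccc', 'ddd', 'ccc', 'ccc', 'add')

# Step 1: one TRUE/FALSE per row, TRUE where Names contains "a"
has_a <- grepl('a', Names)

# Step 2: one TRUE/FALSE per group, TRUE if any row of the group matched
group_has_a <- tapply(has_a, Nr, any)

# Step 3: expand the per-group flag back to one value per row
keep <- as.vector(group_has_a[Nr])

# Rows that survive the filter
Names[keep]
```

Inside filter(), the single value returned by any() is recycled over every row of the group, which is what step 3 does by hand here.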

The same logic can be implemented in base R:

subset(df, ave(grepl('a', Names), Nr, FUN = any))

as well as data.table:

library(data.table)
setDT(df)[, .SD[any(grepl('a', Names))], Nr]

If your original dataset contains more than one ID1 value, you might want to include ID1 in group_by as well.
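As a rough illustration of why that matters, here is a small base R sketch on a made-up data frame (df2 and its values are hypothetical, not from the question) in which the same Nr appears under two different ID1 values. It uses ave with two grouping variables; the dplyr equivalent would be group_by(ID1, Nr) followed by the same filter call.

```r
# Hypothetical data: Nr '1' occurs under both ID1 '100' and ID1 '200'
df2 <- data.frame(ID1   = c('100', '100', '200', '200'),
                  Nr    = c('1', '1', '1', '2'),
                  Names = c('aaa', 'ccc', 'ccc', 'add'),
                  stringsAsFactors = FALSE)

# Grouping by Nr only: the 'ccc' row of ID1 '200' is kept,
# because another ID1 has an "a" under the same Nr
by_nr <- subset(df2, ave(grepl('a', Names), Nr, FUN = any))

# Grouping by ID1 and Nr: each (ID1, Nr) pair is checked on its own
by_both <- subset(df2, ave(grepl('a', Names), ID1, Nr, FUN = any))

by_nr
by_both
```

With Nr alone all four rows are returned; with both columns the row for ID1 '200', Nr '1' is dropped.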


