
Group data and filter groups by two columns (dplyr)

I have a question about using dplyr to filter a dataset.

I want to group the data by RestaurantID and then filter() to keep all groups where Wage >= 5 in 1992 (Year == 92 in the data).

For example:

I have:

 RestaurantID     Year        Wage
     1             92          6
     1             93          4
     2             92          3
     2             93          4
     3             92          5
     3             93          5

The dataset I want (it keeps every group whose wage in 1992 was >= 5):

 RestaurantID     Year        Wage
     1             92          6
     1             93          4
     3             92          5
     3             93          5

I tried:

data %>% group_by("RestaurantID") %>% filter(any(Wage>= '5', Year =='92')) 

But this gives me all rows where wage is >= 5.

We could do this without grouping, using filter:

library(dplyr)
df1 %>% 
    filter(RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
#   RestaurantID Year Wage
#1            1   92    6
#2            1   93    4
#3            3   92    5
#4            3   93    5

or the same logic with base R:

subset(df1, RestaurantID %in% RestaurantID[Year==92 & Wage>= 5])
#   RestaurantID Year Wage
#1            1   92    6
#2            1   93    4
#5            3   92    5
#6            3   93    5
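
To make the logic above easier to follow, here is a minimal base R sketch that rebuilds the example data and splits the one-liner into two steps. The names `ids_to_keep` and `result` are my own, and the column types are assumed to be numeric:

```r
# Rebuild the example data from the question (types assumed numeric)
df1 <- data.frame(
  RestaurantID = c(1, 1, 2, 2, 3, 3),
  Year         = c(92, 93, 92, 93, 92, 93),
  Wage         = c(6, 4, 3, 4, 5, 5)
)

# Step 1: IDs that have a row with Year 92 and Wage >= 5
ids_to_keep <- with(df1, RestaurantID[Year == 92 & Wage >= 5])

# Step 2: keep every row whose ID is in that set
result <- df1[df1$RestaurantID %in% ids_to_keep, ]

print(ids_to_keep)
print(result)
```

Because the lookup is done on the whole column at once, no grouping is needed.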

It's fine to return a single TRUE value per ID if you want all rows of that group returned. In that case, the TRUE value is recycled to the length of the group, so all of its rows are kept.

df %>% group_by(RestaurantID) %>% filter(Wage[Year == 92] >= 5)
## A tibble: 4 x 3
## Groups:   RestaurantID [2]
#  RestaurantID  Year  Wage
#         <int> <int> <int>
#1            1    92     6
#2            1    93     4
#3            3    92     5
#4            3    93     5
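
The recycling idea can also be sketched in base R, as an illustration only and not what dplyr does internally. Here `ave()` computes one logical per RestaurantID and spreads it over every row of that group; `hit` and `keep` are hypothetical names:

```r
# Example data from the question (types assumed numeric)
df <- data.frame(
  RestaurantID = c(1, 1, 2, 2, 3, 3),
  Year         = c(92, 93, 92, 93, 92, 93),
  Wage         = c(6, 4, 3, 4, 5, 5)
)

# Row-level test: is this the Year 92 row with Wage >= 5?
hit <- df$Wage >= 5 & df$Year == 92

# One value per group (any hit?), recycled to every row of that group
keep <- as.logical(ave(hit, df$RestaurantID, FUN = any))

print(df[keep, ])
```

Unlike `Wage[Year == 92] >= 5`, this version does not assume that each group has exactly one Year 92 row.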

Please note that when comparing numbers, you shouldn't put them in quotes like '5', because that turns the numbers into character strings.
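
A small sketch of why the quotes matter: when one side of a comparison is a character string, R coerces the other side to character too and compares them as text. With the single-digit wages in this example the result happens to look right, but it can go wrong as soon as the values have different numbers of digits. The variable names are just for illustration:

```r
# Numeric comparison: 10 is greater than 5
num_cmp <- 10 >= 5

# Character comparison: 10 becomes "10", and "10" sorts before "5"
chr_cmp <- 10 >= '5'

# Single digits compare the same either way, which hides the problem
single_digit_cmp <- 6 >= '5'

print(c(num_cmp, chr_cmp, single_digit_cmp))
```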

Alternatively, you could modify your original approach to:

df %>% group_by(RestaurantID) %>% filter(any(Wage >= 5 & Year == 92))

which also returns the correct subset.
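
The difference between the comma and `&` inside any() is easy to check on a single group. A minimal sketch using restaurant 2 from the example, which should be dropped (`wage` and `year` are stand-in vectors for that group's columns):

```r
# Restaurant 2: wage 3 in 92 and wage 4 in 93
wage <- c(3, 4)
year <- c(92, 93)

# Comma: any() pools both vectors, so one TRUE in either is enough
pooled <- any(wage >= 5, year == 92)

# Ampersand: both conditions must hold on the same row
same_row <- any(wage >= 5 & year == 92)

print(c(pooled = pooled, same_row = same_row))
```

With the comma, the group passes just because it has a Year 92 row, even though its wage in that year is below 5.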

