Filter rows within groups based on multiple conditions
I have a data set where I would like to filter rows within different groups.
Given this data frame:
group = as.factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3))
fruit = as.factor(c("apples", "apples", "apples", "oranges",
                    "oranges", "apples", "oranges",
                    "bananas", "bananas", "oranges", "bananas"))
hit = c(1, 0, 1, 1,
        0, 1, 1,
        1, 0, 0, 1)
dt = data.frame(group, fruit, hit)
dt
group fruit hit
1 apples 1
1 apples 0
1 apples 1
1 oranges 1
2 oranges 0
2 apples 1
2 oranges 1
3 bananas 1
3 bananas 0
3 oranges 0
3 bananas 1
I would like to use the first occurrence of fruit within each group to filter that group. There is a second condition: of the rows with that fruit, I only want to keep those where hit is equal to 1.
So, for group 1, apples is the first occurrence, and it has a positive hit twice, so I would like to keep those two rows.
The result would look like this:
group fruit hit
1 apples 1
1 apples 1
2 oranges 1
3 bananas 1
3 bananas 1
I know you can filter with dplyr, but I am not sure how to achieve this.
We can use dplyr. After grouping by 'group', filter the rows where 'hit' is not equal to 0 and (&) 'fruit' equals the first element of 'fruit' within that group.
library(dplyr)
dt %>%
group_by(group) %>%
filter(hit!=0 & fruit == first(fruit))
# group fruit hit
# <fctr> <fctr> <dbl>
#1 1 apples 1
#2 1 apples 1
#3 2 oranges 1
#4 3 bananas 1
#5 3 bananas 1
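If dplyr is not available, the same two conditions can probably be expressed in base R as well. The following is only a minimal sketch, not part of the original answer: it rebuilds the example data, uses ave() to look up each group's first fruit, and then subsets with the same logical test. The names first_fruit and res are introduced here purely for illustration.

```r
# Rebuild the example data from the question so the sketch is self-contained.
dt <- data.frame(
  group = as.factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
  fruit = as.factor(c("apples", "apples", "apples", "oranges",
                      "oranges", "apples", "oranges",
                      "bananas", "bananas", "oranges", "bananas")),
  hit = c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1)
)

# first_fruit is an illustrative name, not from the original answer.
# ave() applies the function within each group and recycles the single
# returned value across all rows of that group.
first_fruit <- ave(as.character(dt$fruit), dt$group, FUN = function(x) x[1])

# Keep rows with a non-zero hit whose fruit matches the group's first fruit.
res <- dt[dt$hit != 0 & as.character(dt$fruit) == first_fruit, ]
res
```

Converting fruit to character before comparing avoids depending on factor levels; the rows kept should match the dplyr result above.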