
Using dplyr with group_by and filter with categorical variable

I'm very new to R. I have a large text-based data frame on which I would like to perform some checks. I want to find which values in one column ('colour') occur together with both of two distinct values ('a' and 'b') in another column ('letter'). This should be an AND query, not an OR query. The data frame looks like this:

Data:

df <- structure(list(colour = c("blue", "blue", "red", "red", "red", 
"purple", "purple"), letter = c("a", "c", "a", "m", "b", "a", 
"b")), class = "data.frame", row.names = c(NA, -7L))

colour letter
blue   a
blue   c
red    a
red    m
red    b
purple a
purple b

I think the best way to do this is by subsetting, so that I get a new data frame ('df2') containing only the relevant rows. It should look like this:

colour letter
red    a
red    b
purple a
purple b

I tried the following dplyr commands, but I don't get the right result ('blue a' is still there):

df2 <- df %>% group_by(colour) %>% filter(letter %in% c('a', 'b'))

I'd appreciate any help I can get!

letter %in% c('a', 'b') checks each letter individually to see whether it is in the set {a, b} (that is, it returns TRUE for every letter that is a or b) and keeps those rows. That is why 'blue a' survives: the test never looks at the other rows in the group. What you want to do is check that the group contains both an a and a b:

df %>% 
  group_by(colour) %>% 
  filter('a' %in% letter & 'b' %in% letter)

## or, if you have more than a couple letters (maybe a vector of letters)
df %>% 
  group_by(colour) %>% 
  filter(all(c('a', 'b') %in% letter))
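A self-contained run of the group-level filter on the question's data may help show what it does. This sketch assumes dplyr is installed and rebuilds df from the question; the name kept is only illustrative. Because all(...) yields a single TRUE or FALSE per colour group, every row of a qualifying group is kept:

```r
library(dplyr)

# The data from the question
df <- data.frame(
  colour = c("blue", "blue", "red", "red", "red", "purple", "purple"),
  letter = c("a", "c", "a", "m", "b", "a", "b")
)

# all(...) is evaluated once per colour group, so either every row
# of a group is kept or none is
kept <- df %>%
  group_by(colour) %>%
  filter(all(c("a", "b") %in% letter)) %>%
  ungroup()

kept
```

This keeps five rows: all three red rows (including red m) and both purple rows. The blue group is dropped because it has no b.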

It's not clear from your text what should happen if a group contains a, b, and another letter, say c (although your expected df2, which drops 'red m', suggests the extra rows should go). The code above keeps the whole group as long as it contains an a and a b.

If you want to keep only the a and b rows of each qualifying group (in case it has other letters too), keep the filter condition you had as well:

df %>% group_by(colour) %>%
  filter(all(c('a', 'b') %in% letter), letter %in% c('a', 'b'))
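Applied to the question's data, the combined condition reproduces the df2 shown in the question, because the row-level test now removes red m. A minimal sketch, again assuming dplyr is installed:

```r
library(dplyr)

# The data from the question
df <- data.frame(
  colour = c("blue", "blue", "red", "red", "red", "purple", "purple"),
  letter = c("a", "c", "a", "m", "b", "a", "b")
)

df2 <- df %>%
  group_by(colour) %>%
  filter(
    all(c("a", "b") %in% letter),  # group level: the colour has both a and b
    letter %in% c("a", "b")        # row level: keep only the a and b rows
  ) %>%
  ungroup()

df2
```

The result is the four rows red a, red b, purple a, purple b. filter() combines comma-separated conditions with AND, so this is equivalent to joining the two conditions with &.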

If you want to keep only groups that have both a and b and no other letters, I would do this:

df %>% group_by(colour) %>%
  filter(all(c('a', 'b') %in% letter) & n_distinct(letter) == 2)
## another alternative
df %>% group_by(colour) %>%
  filter(all(c('a', 'b') %in% letter) & all(letter %in% c('a', 'b')))
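A quick sketch comparing the two exact-set variants on the question's data, assuming dplyr is installed (v1 and v2 are illustrative names). Only purple should survive: red has a third letter (m), and blue has no b.

```r
library(dplyr)

# The data from the question
df <- data.frame(
  colour = c("blue", "blue", "red", "red", "red", "purple", "purple"),
  letter = c("a", "c", "a", "m", "b", "a", "b")
)

# Variant 1: both letters present and exactly two distinct letters
v1 <- df %>%
  group_by(colour) %>%
  filter(all(c("a", "b") %in% letter) & n_distinct(letter) == 2) %>%
  ungroup()

# Variant 2: both letters present and no letter outside {a, b}
v2 <- df %>%
  group_by(colour) %>%
  filter(all(c("a", "b") %in% letter) & all(letter %in% c("a", "b"))) %>%
  ungroup()

v1
v2
```

Both variants return the two purple rows. The n_distinct() version hard-codes the count 2, so if the target letters come from a vector, the all(letter %in% ...) form needs no further change.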

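Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.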

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM