Extracting rows from a data frame depending on the combination of values in two columns
I have a data frame:
a <- c('A','A','A','A','B','B','C','C')
b <- c(1,2,1,3,1,3,1,6)
c <- c('K','K','H','H','K','K','H','H')
frame <- data.frame(a,b,c)
> frame
a b c
1 A 1 K
2 A 2 K
3 A 1 H
4 A 3 H
5 B 1 K
6 B 3 K
7 C 1 H
8 C 6 H
Now I want to extract data in the following way: if a string in 'a' occurs in a row with 'K' AND in a row with 'H', the rows with 'K' should be left out. In the end it should look like this:
> frame
a b c
1 A 1 H
2 A 3 H
3 B 1 K
4 B 3 K
5 C 1 H
6 C 6 H
Do you have any ideas? Thank you!
You can use intersect to find the strings in a that occur with both H and K in column c, and then exclude the rows for those strings where column c holds a K.
frame[!(frame$a %in% intersect(frame$a[frame$c=="K"],
frame$a[frame$c=="H"]) & frame$c=="K"),]
# a b c
#3 A 1 H
#4 A 3 H
#5 B 1 K
#6 B 3 K
#7 C 1 H
#8 C 6 H
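The one-liner above can be easier to follow when split into steps. The following is a minimal sketch of the same logic; the intermediate names (with_k, with_h, both, drop, result) are introduced here for illustration only, and stringsAsFactors = FALSE is set as an assumption so the columns are plain character vectors regardless of R version.

```r
# Rebuild the example data from the question
frame <- data.frame(
  a = c('A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'),
  b = c(1, 2, 1, 3, 1, 3, 1, 6),
  c = c('K', 'K', 'H', 'H', 'K', 'K', 'H', 'H'),
  stringsAsFactors = FALSE  # assumption: keep columns as character
)

# Values of `a` seen in a 'K' row, and values of `a` seen in an 'H' row
with_k <- frame$a[frame$c == "K"]
with_h <- frame$a[frame$c == "H"]

# Values of `a` that occur with both 'K' and 'H'
both <- intersect(with_k, with_h)

# Rows to drop: a 'K' row whose `a` value also has an 'H' row
drop <- frame$a %in% both & frame$c == "K"

# Keep everything else
result <- frame[!drop, ]
result
```

The result keeps the original row names (3 to 8). If the rows should be renumbered from 1 as in the desired output, rownames(result) <- NULL resets them.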
We could use a grouped filter (group_by followed by filter):
library(dplyr)
frame %>%
group_by(a) %>%
filter((all(c('K', 'H') %in% c) & c != 'K') | n_distinct(c) == 1)
# A tibble: 6 x 3
# Groups: a [3]
# a b c
# <fct> <dbl> <fct>
#1 A 1 H
#2 A 3 H
#3 B 1 K
#4 B 3 K
#5 C 1 H
#6 C 6 H
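For comparison, a similar per-group check can be sketched without dplyr, using base R's ave() in place of group_by. This is a different technique from the answer above, not a rewrite of it: it assumes column c contains only 'K' and 'H', and the names has_h, keep and result are chosen here for illustration.

```r
# Rebuild the example data from the question
frame <- data.frame(
  a = c('A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'),
  b = c(1, 2, 1, 3, 1, 3, 1, 6),
  c = c('K', 'K', 'H', 'H', 'K', 'K', 'H', 'H'),
  stringsAsFactors = FALSE  # assumption: keep columns as character
)

# For every row, does its group (same value of `a`) contain any 'H'?
# ave() applies FUN per group and returns a vector aligned with the rows.
has_h <- ave(frame$c == "H", frame$a, FUN = any)

# Drop a row only if it is a 'K' row in a group that also has an 'H'
keep <- !(frame$c == "K" & has_h)

result <- frame[keep, ]
result
```

With the example data this keeps the two 'H' rows of A, both 'K' rows of B, and both 'H' rows of C. If c could hold other values, the condition would need to be adjusted to match the intended rule.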