Keep only `Groups` where at least 2 elements in a column are present within a list in R
Group by and keep only the groups that contain every element in a list
I have a df such as:
Groups COL1
G1 SP1-3
G1 SP2s
G1 SP4_09
G1 SP7z
G3 SP1_OK
G3 SP1-9
G4 SP1_3
G4 SP2_3
G5 SP3_5
I want to keep only the groups whose COL1 strings contain all of the elements in list=c('SP1','SP2').
Here I should get:
Groups COL1
G1 SP1-3
G1 SP2s
G1 SP4_09
G1 SP7z
G4 SP1_3
G4 SP2_3
I keep only G1 and G4 because their strings contain both SP1 and SP2. The other groups do not contain both.
Data
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L,
3L, 4L), .Label = c("G1", "G3", "G4", "G5"), class = "factor"),
COL1 = structure(c(3L, 6L, 8L, 9L, 2L, 4L, 1L, 5L, 7L), .Label = c("SP1_3",
"SP1_OK", "SP1-3", "SP1-9", "SP2_3", "SP2s", "SP3_5", "SP4_09",
"SP7z"), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))
The following approach should work.
library(dplyr)
library(stringr)
data <- structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L),
.Label = c("G1", "G3", "G4", "G5"),
class = "factor"),
COL1 = structure(c(3L, 6L, 8L, 9L, 2L, 4L, 1L, 5L, 7L),
.Label = c("SP1_3", "SP1_OK", "SP1-3",
"SP1-9", "SP2_3", "SP2s", "SP3_5",
"SP4_09", "SP7z"),
class = "factor")),
class = "data.frame",
row.names = c(NA, -9L))
# Keep a group only if at least one COL1 value matches "SP1"
# and at least one COL1 value matches "SP2"
data %>%
group_by(Groups) %>%
filter(any(str_detect(COL1, "SP1")) &
any(str_detect(COL1, "SP2")))
#> # A tibble: 6 x 2
#> # Groups: Groups [2]
#> Groups COL1
#> <fct> <fct>
#> 1 G1 SP1-3
#> 2 G1 SP2s
#> 3 G1 SP4_09
#> 4 G1 SP7z
#> 5 G4 SP1_3
#> 6 G4 SP2_3
Created on 2020-06-10 by the reprex package (v0.3.0)
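If the list of required strings grows beyond two, writing one `any(str_detect(...))` per element gets tedious. The following is a minimal sketch of the same idea generalized to a vector of patterns. The helper name `has_all` and the vector name `patterns` are hypothetical and not part of the original answer, and the data is retyped here as plain character columns rather than factors. Note that matching is by substring, so `"SP1"` would also match a value such as `"SP10"`.

```r
library(dplyr)
library(stringr)

# Copy of the question's data, using character columns for simplicity
data <- data.frame(
  Groups = c("G1", "G1", "G1", "G1", "G3", "G3", "G4", "G4", "G5"),
  COL1   = c("SP1-3", "SP2s", "SP4_09", "SP7z", "SP1_OK", "SP1-9",
             "SP1_3", "SP2_3", "SP3_5"),
  stringsAsFactors = FALSE
)

# Strings that every kept group must contain (taken from the question)
patterns <- c("SP1", "SP2")

# Hypothetical helper: TRUE if each pattern matches at least one value in x.
# fixed() treats each pattern as a literal string, not a regular expression.
has_all <- function(x, patterns) {
  all(vapply(patterns,
             function(p) any(str_detect(x, fixed(p))),
             logical(1)))
}

result <- data %>%
  group_by(Groups) %>%
  filter(has_all(COL1, patterns)) %>%
  ungroup()

print(result)
```

With the data above, this should keep the same groups as the two-pattern version (G1 and G4). Adding a third required string only means extending `patterns`.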