Partial string match over two columns in R
I have a large df (the example here has only 2 columns):
CancerVar<-c("CancerVar:9#Tier_II_potential","CancerVar:2#Tier_IV_benign","CancerVar:11#Tier_I_strong","CancerVar:2#Tier_IV_benign","CancerVar:2#Tier_IV_benign")
driver_mut_prediction<-c("not protein-affecting","TIER 1","passenger","TIER 2","passenger")
df<-data.frame(CancerVar,driver_mut_prediction)
df
                      CancerVar driver_mut_prediction
1 CancerVar:9#Tier_II_potential not protein-affecting
2    CancerVar:2#Tier_IV_benign                TIER 1
3    CancerVar:11#Tier_I_strong             passenger
4    CancerVar:2#Tier_IV_benign                TIER 2
5    CancerVar:2#Tier_IV_benign             passenger
I want to select rows using partial (and different) string matches on two columns. I want the rows where EITHER (CancerVar contains Tier I or Tier II) OR (driver_mut_prediction contains TIER 1 or TIER 2).
I tried:
df_sub<-df[with(df, grepl("TIER|Tier_I|Tier_II", paste(driver_mut_prediction, CancerVar,ignore.case=FALSE))),]
The result still contains the last row (so the two conditions are not working).
I also tried:
df %>% select(contains("Tier_I|Tier_II|TIER 1|TIER 2"))
which returns a data frame with 0 columns and 5000 rows.
Please help!
This approach should work:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
CancerVar<-c("CancerVar:9#Tier_II_potential","CancerVar:2#Tier_IV_benign","CancerVar:11#Tier_I_strong","CancerVar:2#Tier_IV_benign","CancerVar:2#Tier_IV_benign")
driver_mut_prediction<-c("not protein-affecting","TIER 1","passenger","TIER 2","passenger")
df<-data.frame(CancerVar,driver_mut_prediction)
df %>%
  filter(
    grepl("Tier_I_|Tier_II_", CancerVar) |
      grepl("TIER 1|TIER 2", driver_mut_prediction)
  )
#>                       CancerVar driver_mut_prediction
#> 1 CancerVar:9#Tier_II_potential not protein-affecting
#> 2    CancerVar:2#Tier_IV_benign                TIER 1
#> 3    CancerVar:11#Tier_I_strong             passenger
#> 4    CancerVar:2#Tier_IV_benign                TIER 2
Created on 2022-04-06 by the reprex package (v2.0.1)
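A likely reason the pattern in the question kept the last row: "Tier_I" is a prefix of both "Tier_II" and "Tier_IV", so it matches every CancerVar value, whereas the trailing underscore in "Tier_I_" rules out "Tier_IV". In addition, ignore.case=FALSE was placed inside paste(), which simply pastes it in as text, and select(contains(...)) picks columns by name rather than filtering rows. A minimal base R sketch of the first two points (the variable names cancer_var, loose, strict and pasted are illustrative only):

```r
# Sample values taken from the question's CancerVar column.
cancer_var <- c("CancerVar:9#Tier_II_potential",
                "CancerVar:2#Tier_IV_benign",
                "CancerVar:11#Tier_I_strong")

# "Tier_I" is a prefix of "Tier_II" and "Tier_IV", so it matches every value.
loose <- grepl("Tier_I", cancer_var)

# The trailing underscore stops the pattern from matching "Tier_IV".
strict <- grepl("Tier_I_|Tier_II_", cancer_var)

# paste() has no ignore.case argument; the value is appended as the text "FALSE".
pasted <- paste("TIER 1", "CancerVar:2#Tier_IV_benign", ignore.case = FALSE)

print(loose)
print(strict)
print(pasted)
```

If case-insensitive matching is wanted, ignore.case belongs in the grepl() call itself.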
Or, using base R:
CancerVar<-c("CancerVar:9#Tier_II_potential","CancerVar:2#Tier_IV_benign","CancerVar:11#Tier_I_strong","CancerVar:2#Tier_IV_benign","CancerVar:2#Tier_IV_benign")
driver_mut_prediction<-c("not protein-affecting","TIER 1","passenger","TIER 2","passenger")
df<-data.frame(CancerVar,driver_mut_prediction)
df[grepl("Tier_I_|Tier_II_", df$CancerVar) | grepl("TIER 1|TIER 2", df$driver_mut_prediction),]
#>                       CancerVar driver_mut_prediction
#> 1 CancerVar:9#Tier_II_potential not protein-affecting
#> 2    CancerVar:2#Tier_IV_benign                TIER 1
#> 3    CancerVar:11#Tier_I_strong             passenger
#> 4    CancerVar:2#Tier_IV_benign                TIER 2
Created on 2022-04-06 by the reprex package (v2.0.1)
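The same base R filter can also be written with more compact patterns. The sketch below is one possible variation, not part of the original answer: it groups the alternatives as "Tier_(I|II)_" and uses a character class "TIER [12]". It assumes values such as "TIER 10" do not occur in driver_mut_prediction, since "TIER [12]" would match those too. The names keep and result are illustrative.

```r
# Rebuild the example data from the question.
df <- data.frame(
  CancerVar = c("CancerVar:9#Tier_II_potential", "CancerVar:2#Tier_IV_benign",
                "CancerVar:11#Tier_I_strong", "CancerVar:2#Tier_IV_benign",
                "CancerVar:2#Tier_IV_benign"),
  driver_mut_prediction = c("not protein-affecting", "TIER 1", "passenger",
                            "TIER 2", "passenger")
)

# One logical value per row: TRUE if either column matches its pattern.
keep <- grepl("Tier_(I|II)_", df$CancerVar) |
  grepl("TIER [12]", df$driver_mut_prediction)

# Keep only the matching rows.
result <- df[keep, ]
print(result)
```

Storing the logical vector in keep first makes it easy to inspect which rows matched before subsetting.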
You can use str_detect:
library(tidyverse)
df %>%
  filter(str_detect(CancerVar, "Tier_I_|Tier_II_") |
           str_detect(driver_mut_prediction, "TIER 1|TIER 2"))
Output
                      CancerVar driver_mut_prediction
1 CancerVar:9#Tier_II_potential not protein-affecting
2    CancerVar:2#Tier_IV_benign                TIER 1
3    CancerVar:11#Tier_I_strong             passenger
4    CancerVar:2#Tier_IV_benign                TIER 2
Data
df <- structure(list(CancerVar = c("CancerVar:9#Tier_II_potential",
"CancerVar:2#Tier_IV_benign", "CancerVar:11#Tier_I_strong", "CancerVar:2#Tier_IV_benign",
"CancerVar:2#Tier_IV_benign"), driver_mut_prediction = c("not protein-affecting",
"TIER 1", "passenger", "TIER 2", "passenger")), class = "data.frame", row.names = c(NA,
-5L))