pattern matching in R using grepl
I have a data frame dat like this:
dat
  P            pedigree cas
1 M           rs2745406   T
2 M           rs6939431   A
3 M   SNP_DPB1_33156641   G
4 M SNP_DPB1_33156664_G   P
5 M SNP_DPB1_33156664_A   A
6 M SNP_DPB1_33156664_T   A
I want to exclude all rows where the pedigree column starts with SNP_ and ends with an underscore followed by G, C, T, or A (_[GCTA]). In this case, that would be rows 4, 5, and 6.
How can I achieve this in R? I have tried:
multisnp <- which(grepl("^SNP_*_[GCTA]$", dat$pedigree)=="TRUE")
new_dat <- dat[-multisnp,]
My multisnp vector is empty, but I can't figure out how to fix the pattern so that it matches what I want. I think my use of the wildcard * is wrong.
You can use the following, with .*? (match everything in a non-greedy way):
multisnp <- which(grepl("^SNP_.*?_[GCTA]$", dat$pedigree))
                              ^^^
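To see why the original pattern returns nothing, here is a small sketch that rebuilds the data frame from the values shown in the question and compares the patterns. The variable names broken, fixed, and lazy are only illustrative. In a regular expression, * is a quantifier rather than a wildcard, so _* means "zero or more underscores", while .* means "any run of characters".

```r
# Rebuild the example data (values copied from the question).
dat <- data.frame(
  P = rep("M", 6),
  pedigree = c("rs2745406", "rs6939431", "SNP_DPB1_33156641",
               "SNP_DPB1_33156664_G", "SNP_DPB1_33156664_A",
               "SNP_DPB1_33156664_T"),
  cas = c("T", "A", "G", "P", "A", "A"),
  stringsAsFactors = FALSE
)

# "_*" only allows underscores between "SNP" and the final "_[GCTA]",
# so none of the pedigree values can match.
broken <- grepl("^SNP_*_[GCTA]$", dat$pedigree)

# ".*" allows any characters in between, which is what was intended.
fixed <- grepl("^SNP_.*_[GCTA]$", dat$pedigree)

# The non-greedy form selects the same rows here, because the pattern
# is anchored at both ends with ^ and $.
lazy <- grepl("^SNP_.*?_[GCTA]$", dat$pedigree)

which(broken)  # integer(0)
which(fixed)   # 4 5 6
```

Since grepl only reports whether a match exists, greedy and non-greedy quantifiers give the same answer for this anchored pattern; the difference matters mainly when extracting or replacing text.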
You can subset dat like this:
new_dat <- dat[!grepl("^SNP_.*_[GCTA]$", dat$pedigree), ]
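One reason to prefer the logical form over dat[-which(...), ] is how the two behave when nothing matches. The sketch below uses a made-up two-row data frame and a hypothetical pattern ^XYZ_ that matches no rows, purely to illustrate the difference.

```r
# Tiny illustrative data frame (not the full data from the question).
dat <- data.frame(
  pedigree = c("rs2745406", "SNP_DPB1_33156664_G"),
  stringsAsFactors = FALSE
)

# A pattern with no matches yields an empty index vector.
no_hits <- which(grepl("^XYZ_", dat$pedigree))

# Negating an empty index still gives an empty index, so this selects
# zero rows instead of keeping everything.
dropped_all <- dat[-no_hits, , drop = FALSE]

# The logical form keeps every row when there is no match.
kept_all <- dat[!grepl("^XYZ_", dat$pedigree), , drop = FALSE]

nrow(dropped_all)  # 0
nrow(kept_all)     # 2
```

This is why an empty multisnp vector combined with dat[-multisnp, ] would silently return an empty data frame rather than the original one.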
Regarding the code that you've tried, I'm not sure that grepl("^SNP_*_[GCTA]$") will complete without an error, since you aren't passing an x vector to grepl. See ?grepl for more info.