R: subset a data frame based on a column entry in multiple rows
I have a data frame with information on several genes, in a format similar to:
chr start end Gene Region
1 100 110 Bat Exon
1 120 130 Bat Intron
1 500 550 Ball Upstream, Downstream
1 590 600 Ball Intron, Upstream
1 900 980 Mit Promoter, Upstream
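A reproducible version of this sample, assuming Region is stored as a plain character column with comma-separated labels. The name Table follows the subset() call in the question; the answers refer to the same data as df.

```r
# Sample data from the question; Region holds comma-separated labels.
Table <- data.frame(
  chr    = c(1, 1, 1, 1, 1),
  start  = c(100, 120, 500, 590, 900),
  end    = c(110, 130, 550, 600, 980),
  Gene   = c("Bat", "Bat", "Ball", "Ball", "Mit"),
  Region = c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream"),
  stringsAsFactors = FALSE  # keep Gene/Region as character (the default since R 4.0)
)
str(Table)
```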
I would like to subset the data to remove all rows for any gene that has "Exon" or "Promoter" in the Region column. I had been using:
Regions <- subset(Table, Region == "Intron" | Region == "DownStream" | Region == "Upstream" | Region == "DownStream,Upstream")
However, this gives me:
chr start end Gene Region
1 120 130 Bat Intron
1 500 550 Ball Upstream, Downstream
1 590 600 Ball Intron, Upstream
What I want is:
chr start end Gene Region
1 500 550 Ball Upstream, Downstream
1 590 600 Ball Intron, Upstream
Try this using grepl:
# drop every row whose Region mentions Exon or Promoter
df[!grepl("Exon|Promoter", df$Region), ]
# chr start end Gene Region
#2 1 120 130 Bat Intron
#3 1 500 550 Ball Upstream, Downstream
#4 1 590 600 Ball Intron, Upstream
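grepl() returns one logical value per row, so this filter decides row by row. That is why row 2 (Bat, Intron) is kept even though the other Bat row is an exon. A small sketch on the question's sample data, rebuilt here as df:

```r
# Rebuild the sample data from the question as `df`.
df <- data.frame(
  chr    = 1,  # recycled to every row
  start  = c(100, 120, 500, 590, 900),
  end    = c(110, 130, 550, 600, 980),
  Gene   = c("Bat", "Bat", "Ball", "Ball", "Mit"),
  Region = c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: does Region mention Exon or Promoter?
hit <- grepl("Exon|Promoter", df$Region)
print(hit)  # TRUE FALSE FALSE FALSE TRUE

# Row-level filter: only the matching rows themselves are dropped,
# so the Intron row of gene Bat survives.
row_level <- df[!hit, ]
print(row_level$Gene)  # "Bat" "Ball" "Ball"
```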
It's not clear to me why you want row 2 ("Intron") removed as well. Please explain.
I think I understand now; try this instead:
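# genes that have at least one Exon or Promoter row
temp <- df$Gene[grepl("Exon|Promoter", df$Region)]
# drop every row belonging to one of those genes
df[!df$Gene %in% temp, ]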
# chr start end Gene Region
#3 1 500 550 Ball Upstream, Downstream
#4 1 590 600 Ball Intron, Upstream
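As an alternative to the %in% approach, the same gene-level filter can be written with base R's ave(), which applies any() within each gene and spreads the result back to every row of that gene. A sketch on the question's sample data, rebuilt here as df:

```r
# Rebuild the sample data from the question as `df`.
df <- data.frame(
  chr    = 1,
  start  = c(100, 120, 500, 590, 900),
  end    = c(110, 130, 550, 600, 980),
  Gene   = c("Bat", "Bat", "Ball", "Ball", "Mit"),
  Region = c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream"),
  stringsAsFactors = FALSE
)

# Row-level test, as in the answer.
hit <- grepl("Exon|Promoter", df$Region)

# ave() splits `hit` by Gene, applies any() to each group and writes the
# group result back into every position of that group, so the vector stays
# aligned with the rows: TRUE for all rows of a gene with any Exon/Promoter.
gene_hit <- ave(hit, df$Gene, FUN = any)

result <- df[!gene_hit, ]
print(result)  # only the two Ball rows remain
```

Both versions give the same rows here; ave() avoids the intermediate vector of gene names, while the %in% version is arguably easier to read.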