R: Subsetting a data frame based on column entries across multiple rows

Question

I have a data frame with information on several genes, in a format similar to:

chr    start    end    Gene    Region
1    100    110    Bat     Exon
1    120    130    Bat     Intron
1    500    550    Ball    Upstream, Downstream
1    590    600    Ball    Intron, Upstream
1    900    980    Mit     Promoter, Upstream

I would like to subset the data to remove every row belonging to a gene that has "Exon" or "Promoter" in the Region column of any of its rows. I had been using:

Regions <- subset(Table, Region == "Intron" | Region== "DownStream" | Region =="Upstream" | Region=="DownStream,Upstream")

However, this gives me:

chr    start    end    Gene    Region
1    120    130    Bat     Intron
1    500    550    Ball    Upstream, Downstream
1    590    600    Ball    Intron, Upstream

What I want is:

chr    start    end    Gene    Region
1    500    550    Ball    Upstream, Downstream
1    590    600    Ball    Intron, Upstream

Answer 1

Try this using grepl:

df[!grepl("Exon|Promoter", df$Region),]
#  chr start end Gene               Region
#2   1   120 130  Bat               Intron
#3   1   500 550 Ball Upstream, Downstream
#4   1   590 600 Ball     Intron, Upstream
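
To see what this filter is doing, it may help to look at what grepl returns on its own. A minimal sketch, with the Region values from the question copied into a standalone vector (the names `region` and `hits` are only for illustration):

```r
region <- c("Exon", "Intron", "Upstream, Downstream",
            "Intron, Upstream", "Promoter, Upstream")

# One TRUE/FALSE per row: does the string contain "Exon" or "Promoter"?
hits <- grepl("Exon|Promoter", region)
hits
# [1]  TRUE FALSE FALSE FALSE  TRUE

# Negating the vector selects the rows that are kept
which(!hits)
# [1] 2 3 4
```

The test is made row by row, with no reference to the Gene column.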

It's not clear to me why you want row 2, with "Intron", removed as well. Please explain that.

Edit:

I think I understand now; try this instead:

temp <- df$Gene[grepl("Exon|Promoter", df$Region)]
df[!df$Gene %in% temp,]
#  chr start end Gene               Region
#3   1   500 550 Ball Upstream, Downstream
#4   1   590 600 Ball     Intron, Upstream
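
For anyone who wants to run this end to end, here is a self-contained sketch. It assumes the sample table from the question is rebuilt by hand as `df`; `stringsAsFactors = FALSE` and the names `bad_genes`, `result`, `gene_hit` and `result2` are my own choices. The `ave` variant at the end is an alternative way to get the same rows, not part of the approach above:

```r
# Sample data copied from the question
df <- data.frame(
  chr    = c(1, 1, 1, 1, 1),
  start  = c(100, 120, 500, 590, 900),
  end    = c(110, 130, 550, 600, 980),
  Gene   = c("Bat", "Bat", "Ball", "Ball", "Mit"),
  Region = c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream"),
  stringsAsFactors = FALSE
)

# Genes with at least one row mentioning Exon or Promoter
bad_genes <- unique(df$Gene[grepl("Exon|Promoter", df$Region)])

# Drop every row that belongs to one of those genes
result <- df[!df$Gene %in% bad_genes, ]

# Alternative without the temporary vector: flag each row, then
# spread the flag to all rows of the same gene with ave()
flag     <- grepl("Exon|Promoter", df$Region)
gene_hit <- ave(flag, df$Gene, FUN = any)
result2  <- df[!gene_hit, ]
```

Both `result` and `result2` should contain only the two Ball rows. The `%in%` version is probably easier to read; the `ave` version keeps everything as a per-row logical vector, which can be handy if the flag is needed as a column later.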

Answer 1 · 2 · accepted 2014-11-04 15:16:10
