Select rows in one table that fall within a range defined by two columns in another table using R
I need to keep the rows from "map" only when their position falls within an interval from the "ref" table.
Here is an example of the "map" table:
map<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr1 4 chr1-4 2 2 4
chr2 5 chr2-5 2 2 5
chr2 1 chr2-1 2 2 6
chr2 2 chr2-2 3 2 4
chr2 3 chr2-3 3 2 3
chr2 4 chr2-4 3 2 2
chr2 5 chr2-5 3 2 1
chr2 6 chr2-6 3 2 7
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
map<-read.table(text=map,header=T)
And I have a reference table like this example:
ref<-"chr start end
chr1 1 2
chr1 2 3
chr1 5 6
chr2 7 9"
ref<-read.table(text=ref,header=T)
And I need a final table like this:
final<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
final<-read.table(text=final,header=T)
As this was tagged with the data.table tag, here's a simple data.table::foverlaps solution:
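library(data.table)
setDT(map)[, end := start]  # turn each position into a closed interval [start, start]
setkey(setDT(ref))          # key ref on all of its columns: chr, start, end
indx <- unique(foverlaps(map, ref, which = TRUE, nomatch = 0L)$xid)
map[indx]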
# chr start tag depth BCV State end
# 1: chr1 1 chr1-1 1 2 1 1
# 2: chr1 2 chr1-2 1 3 2 2
# 3: chr1 3 chr1-3 1 2 3 3
# 4: chr2 7 chr2-7 3 2 9 7
# 5: chr2 8 chr2-8 2 2 2 8
# 6: chr2 9 chr2-9 2 2 1 9
This basically adds an end column to map in order to close the intervals, and keys the ref data set in order to define the matching intervals for foverlaps, with chr also included in the key. Then it runs foverlaps while removing the unmatched values (nomatch = 0L) and selecting the unique overlaps in case the intervals in ref overlap each other. Finally, it subsets map according to the index.
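As a cross-check that does not depend on data.table, the same containment test can be sketched in base R. This is not the foverlaps method itself, just a plain row-by-row check on the question's data; the names in_ref and res are introduced here for illustration only.

```r
# Data copied from the question.
map <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr1 4 chr1-4 2 2 4
chr2 5 chr2-5 2 2 5
chr2 1 chr2-1 2 2 6
chr2 2 chr2-2 3 2 4
chr2 3 chr2-3 3 2 3
chr2 4 chr2-4 3 2 2
chr2 5 chr2-5 3 2 1
chr2 6 chr2-6 3 2 7
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1")

ref <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "chr start end
chr1 1 2
chr1 2 3
chr1 5 6
chr2 7 9")

# For each row of map, test whether at least one ref interval on the same
# chr contains its start position (both interval ends are inclusive).
in_ref <- mapply(function(chr, pos) {
  any(ref$chr == chr & ref$start <= pos & pos <= ref$end)
}, map$chr, map$start, USE.NAMES = FALSE)

# any() collapses multiple matching intervals, so no row is duplicated.
res <- map[in_ref, ]
print(res)
```

This compares every map row against every ref row, so it is only meant for small inputs; an interval join such as foverlaps is the better fit for large tables.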
First, you need to expand the intervals:
L <- lapply(split(ref, ref$chr), function(d) unique(unlist(mapply(seq, d$start, d$end, SIMPLIFY = FALSE))))
which will give you:
#$chr1
#[1] 1 2 3 5 6
#$chr2
#[1] 7 8 9
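The unique() call matters because the intervals in ref can overlap. A minimal sketch using only the two overlapping chr1 intervals from the question (the names ref_chr1 and expanded are illustrative):

```r
# Two overlapping intervals from the question's ref table: 1-2 and 2-3.
ref_chr1 <- data.frame(start = c(1L, 2L), end = c(2L, 3L))

# Expand each interval into its integer positions and flatten the list.
expanded <- unlist(mapply(seq, ref_chr1$start, ref_chr1$end, SIMPLIFY = FALSE))

print(expanded)          # position 2 appears twice, once per interval
print(unique(expanded))  # each position appears once
```

Without unique(), the repeated position would match the same map row twice in the later merge and duplicate it. Note that this expansion approach assumes integer positions and intervals short enough to enumerate.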
And then you can merge:
ref2 <- setNames(stack(L),c('start','chr'))
merge(map,ref2)
Final output:
# chr start tag depth BCV State
#1 chr1 1 chr1-1 1 2 1
#2 chr1 2 chr1-2 1 3 2
#3 chr1 3 chr1-3 1 2 3
#4 chr2 7 chr2-7 3 2 9
#5 chr2 8 chr2-8 2 2 2
#6 chr2 9 chr2-9 2 2 1