Using R to select rows of one table that fall within ranges defined by two columns of another table

Question

I need to keep rows from "map" only when their position falls inside one of the intervals in the "ref" table.

Here is an example of the "map" table:

map<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr1 4 chr1-4 2 2 4
chr2 5 chr2-5 2 2 5
chr2 1 chr2-1 2 2 6
chr2 2 chr2-2 3 2 4
chr2 3 chr2-3 3 2 3
chr2 4 chr2-4 3 2 2
chr2 5 chr2-5 3 2 1
chr2 6 chr2-6 3 2 7
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
map<-read.table(text=map,header=T)

And I have a reference map like this example:

ref<-"chr start end
chr1 1 2 
chr1 2 3 
chr1 5 6 
chr2 7 9" 
ref<-read.table(text=ref,header=T)

And I need a final table like this:

final<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
final<-read.table(text=final,header=T)

Answer 1

As this was tagged with the data.table tag, here's a simple data.table::foverlaps solution:

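library(data.table)
setDT(map)[, end := start]   # turn each position into a one-point interval [start, start]
setkey(setDT(ref))           # key ref on all of its columns: chr, start, end
indx <- unique(foverlaps(map, ref, which = TRUE, nomatch = 0L)$xid)
map[indx]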
#     chr start    tag depth BCV State end
# 1: chr1     1 chr1-1     1   2     1   1
# 2: chr1     2 chr1-2     1   3     2   2
# 3: chr1     3 chr1-3     1   2     3   3
# 4: chr2     7 chr2-7     3   2     9   7
# 5: chr2     8 chr2-8     2   2     2   8
# 6: chr2     9 chr2-9     2   2     1   9

This basically adds an end column to map in order to close the intervals, then keys the ref data set in order to define the matching intervals for foverlaps, with chr included in the key as well. It then runs foverlaps while dropping the unmatched rows and keeps only the unique indices, in case the intervals in ref overlap. Finally, it subsets map according to that index.
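The same interval test can also be written as a non-equi join, which may read more directly than adding an end column. This is only a sketch, not part of the original answer: it assumes a data.table version that supports non-equi conditions in on (1.9.8 or later), and it rebuilds a reduced copy of the question's data (only the chr, start and tag columns) so that it runs on its own. The names idx and res are made up for the illustration.

```r
library(data.table)

# Reduced copy of the question's data (assumption: only chr, start, tag are kept).
map <- data.table(
  chr   = c(rep("chr1", 4), rep("chr2", 10)),
  start = c(1:4, 5L, 1:9)
)
map[, tag := paste(chr, start, sep = "-")]
ref <- data.table(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(1L, 2L, 5L, 7L),
  end   = c(2L, 3L, 6L, 9L)
)

# Non-equi join: a map row matches a ref row when chr is equal and
# ref$start <= map$start <= ref$end. In on, the left-hand names refer
# to map (x) and the right-hand names to ref (i).
idx <- map[ref, on = .(chr, start >= start, start <= end),
           which = TRUE, nomatch = 0L]

# Overlapping ref intervals can hit the same map row more than once,
# so deduplicate and restore the original row order.
res <- map[sort(unique(idx))]
```

As with the foverlaps version, unique() is what protects against overlapping intervals in ref; here chr1 position 2 is covered by both [1, 2] and [2, 3].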

Answer 2

First, you need to expand the intervals:

L <- lapply(split(ref,ref$chr), function(d) unique(unlist(mapply(seq,d$start,d$end,SIMPLIFY = F))))

which will give you:

#$chr1
#[1] 1 2 3 5 6

#$chr2
#[1] 7 8 9

And then you can merge:

ref2 <- setNames(stack(L),c('start','chr'))
merge(map,ref2)

Final output:

#   chr start    tag depth BCV State
#1 chr1     1 chr1-1     1   2     1
#2 chr1     2 chr1-2     1   3     2
#3 chr1     3 chr1-3     1   2     3
#4 chr2     7 chr2-7     3   2     9
#5 chr2     8 chr2-8     2   2     2
#6 chr2     9 chr2-9     2   2     1
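Expanding every interval creates one value per covered position, which can grow large when the ranges are wide. A possible base R variant, sketched here under the assumption that ref has relatively few rows per chromosome, pairs rows on chr and filters by the interval bounds instead. It is not part of the original answer; the data is a reduced copy of the question's tables, and pairs, inside and res2 are illustrative names.

```r
# Reduced copy of the question's data (assumption: only chr, start, tag are kept).
map <- data.frame(
  chr   = c(rep("chr1", 4), rep("chr2", 10)),
  start = c(1:4, 5L, 1:9),
  stringsAsFactors = FALSE
)
map$tag <- paste(map$chr, map$start, sep = "-")
ref <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(1L, 2L, 5L, 7L),
  end   = c(2L, 3L, 6L, 9L),
  stringsAsFactors = FALSE
)

# Pair every map row with every ref interval on the same chromosome.
# The clashing "start" column from ref is renamed to "start.ref".
pairs <- merge(map, ref, by = "chr", suffixes = c("", ".ref"))

# Keep pairs whose position lies inside the interval (both ends inclusive).
inside <- pairs$start >= pairs$start.ref & pairs$start <= pairs$end

# Drop the helper columns and the duplicates caused by overlapping intervals.
res2 <- unique(pairs[inside, names(map)])
```

Two caveats: the intermediate pairs table has one row per map row and ref interval on the same chromosome, so this trades wide intervals for a larger join; and unique() would also collapse map rows that are identical in every column, which a row-id column added beforehand would avoid.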
