
Select rows in one table that fall within a range defined by two columns in another table using R

I need to keep the rows from "map" only when they fall within an interval from the "ref" table:

Here is an example of the "map" table:

map<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr1 4 chr1-4 2 2 4
chr2 5 chr2-5 2 2 5
chr2 1 chr2-1 2 2 6
chr2 2 chr2-2 3 2 4
chr2 3 chr2-3 3 2 3
chr2 4 chr2-4 3 2 2
chr2 5 chr2-5 3 2 1
chr2 6 chr2-6 3 2 7
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
map<-read.table(text=map,header=T)

And I have a reference table like this example:

ref<-"chr start end
chr1 1 2 
chr1 2 3 
chr1 5 6 
chr2 7 9" 
ref<-read.table(text=ref,header=T)

And I need a final table like this:

final<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
final<-read.table(text=final,header=T)

As this was tagged with the data.table tag, here's a simple data.table::foverlaps solution:

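library(data.table)
setDT(map)[, end := start]   # close each interval: a position becomes [start, start]
setkey(setDT(ref))           # key ref on all of its columns: chr, start, end
indx <- unique(foverlaps(map, ref, which = TRUE, nomatch = 0L)$xid)
map[indx]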
#     chr start    tag depth BCV State end
# 1: chr1     1 chr1-1     1   2     1   1
# 2: chr1     2 chr1-2     1   3     2   2
# 3: chr1     3 chr1-3     1   2     3   3
# 4: chr2     7 chr2-7     3   2     9   7
# 5: chr2     8 chr2-8     2   2     2   8
# 6: chr2     9 chr2-9     2   2     1   9

This basically adds an end column to map in order to close the intervals, and keys the ref data set in order to define the matching intervals for foverlaps, with chr included in the key. It then runs foverlaps while removing the unmatched values and selects the unique indices in case the intervals in ref overlap. Finally, it subsets map according to the index.
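To make the matching rule explicit, here is a minimal base R sketch of the same containment check, written row by row. It is not the foverlaps implementation, only an illustration of the condition being tested. The tables are a reduced copy of the question's data, and in_any_interval is a hypothetical helper name introduced for this example.

```r
# A reduced copy of the question's tables (only the columns the check needs).
map <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
  start = c(1, 2, 3, 4, 5, 7, 9),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(1, 2, 5, 7),
  end   = c(2, 3, 6, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if `pos` on chromosome `chr` lies inside at
# least one closed interval [start, end] of `ref` on the same chromosome.
in_any_interval <- function(chr, pos, ref) {
  any(ref$chr == chr & ref$start <= pos & pos <= ref$end)
}

# One logical per row of map; any() collapses multiple matching intervals,
# which plays the role of unique() in the foverlaps version.
keep <- mapply(in_any_interval, map$chr, map$start,
               MoreArgs = list(ref = ref), USE.NAMES = FALSE)

result <- map[keep, ]
result$start  # positions 1, 2, 3 (chr1) and 7, 9 (chr2) remain
```

This loop compares every row of map against every row of ref, so it is probably only reasonable for small tables; the keyed foverlaps call is the better fit for large ones.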

First, you need to expand the intervals:

L <- lapply(split(ref,ref$chr), function(d) unique(unlist(mapply(seq,d$start,d$end,SIMPLIFY = F))))

which will give you:

#$chr1
#[1] 1 2 3 5 6

#$chr2
#[1] 7 8 9
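To see what the expansion does on its own, here is a small sketch for a single chromosome. The data frame d is a hypothetical stand-in for the chr1 piece of split(ref, ref$chr), and Map is used in place of mapply(..., SIMPLIFY = FALSE), which should behave the same way here.

```r
# d is a hypothetical stand-in for the chr1 piece of split(ref, ref$chr).
d <- data.frame(start = c(1, 2, 5), end = c(2, 3, 6))

# One sequence of positions per interval: 1:2, 2:3 and 5:6.
pieces <- Map(seq, d$start, d$end)

# Flatten the list, then drop the position (2) that the first two
# overlapping intervals share.
positions <- unique(unlist(pieces))
positions
```

Because every position inside every interval is materialised, this approach may use a lot of memory when the intervals are long.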

And then you can merge:

ref2 <- setNames(stack(L),c('start','chr'))
merge(map,ref2)

Final output:

#   chr start    tag depth BCV State
#1 chr1     1 chr1-1     1   2     1
#2 chr1     2 chr1-2     1   3     2
#3 chr1     3 chr1-3     1   2     3
#4 chr2     7 chr2-7     3   2     9
#5 chr2     8 chr2-8     2   2     2
#6 chr2     9 chr2-9     2   2     1
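One detail worth knowing is that merge() sorts the result by the join columns by default, so the rows may not come back in the order they had in map. If the original order matters, a possible variant is to filter with %in% on a combined key instead. The sketch below uses tiny made-up tables in place of map and ref2, and key is a hypothetical helper introduced only for this example.

```r
# Tiny stand-ins for map and the expanded ref2, deliberately unsorted.
map <- data.frame(chr = c("chr2", "chr1", "chr1", "chr2"),
                  start = c(7, 4, 1, 5), stringsAsFactors = FALSE)
ref2 <- data.frame(chr = c("chr1", "chr1", "chr2"),
                   start = c(1, 2, 7), stringsAsFactors = FALSE)

# Hypothetical helper: one string per row combining chr and start.
key <- function(d) paste(d$chr, d$start)

# Keep the rows of map whose (chr, start) pair occurs in ref2;
# logical subsetting leaves the rows in their original order.
kept <- map[key(map) %in% key(ref2), ]

# For comparison: merge() returns the same rows sorted by chr and start.
merged <- merge(map, ref2)
```

Here kept lists chr2 7 before chr1 1, as in map, while merged lists chr1 1 first.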
