Select rows in one table that fall within a range defined by two columns in another table using R

I need to keep only the rows of "map" whose positions fall within one of the intervals in the "ref" table.

Here is an example of the "map" table:

map<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr1 4 chr1-4 2 2 4
chr2 5 chr2-5 2 2 5
chr2 1 chr2-1 2 2 6
chr2 2 chr2-2 3 2 4
chr2 3 chr2-3 3 2 3
chr2 4 chr2-4 3 2 2
chr2 5 chr2-5 3 2 1
chr2 6 chr2-6 3 2 7
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
map<-read.table(text=map,header=T)

And I have a reference map like this example:

ref<-"chr start end
chr1 1 2 
chr1 2 3 
chr1 5 6 
chr2 7 9" 
ref<-read.table(text=ref,header=T)

And I need a final table like this:

final<-"chr start tag depth BCV State
chr1 1 chr1-1 1 2 1
chr1 2 chr1-2 1 3 2
chr1 3 chr1-3 1 2 3
chr2 7 chr2-7 3 2 9
chr2 8 chr2-8 2 2 2
chr2 9 chr2-9 2 2 1"
final<-read.table(text=final,header=T)

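As this was tagged with the data.table tag, here's a simple data.table::foverlaps solution:

library(data.table)

setDT(map)[, end := start]   # turn each position into a closed interval [start, start]
setkey(setDT(ref))           # key ref on all of its columns: chr, start, end
# row indices of map (xid) that overlap at least one interval in ref
indx <- unique(foverlaps(map, ref, which = TRUE, nomatch = 0L)$xid)
map[indx]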
#     chr start    tag depth BCV State end
# 1: chr1     1 chr1-1     1   2     1   1
# 2: chr1     2 chr1-2     1   3     2   2
# 3: chr1     3 chr1-3     1   2     3   3
# 4: chr2     7 chr2-7     3   2     9   7
# 5: chr2     8 chr2-8     2   2     2   8
# 6: chr2     9 chr2-9     2   2     1   9

This adds an end column to map so that each position becomes a closed interval, then keys ref (on chr, start and end) to define the intervals that foverlaps matches against, with chr included in the match. foverlaps is then run with unmatched rows dropped (nomatch = 0L), and unique() removes the repeated indices that appear when intervals in ref overlap each other. Finally, map is subset by that index.
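For comparison, the same closed-interval test can be written in base R without data.table. The following is only a minimal sketch and not part of the answer above: the helper name in_any_interval and the small map_demo / ref_demo data frames are made up for illustration and mirror just part of the question's data.

```r
# Toy data mirroring part of the question's tables (names are illustrative).
map_demo <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
  start = c(1, 2, 3, 4, 6, 7, 9),
  stringsAsFactors = FALSE
)
ref_demo <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(1, 2, 7),
  end   = c(2, 3, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if position `pos` on chromosome `ch` lies inside
# at least one closed interval [start, end] of `ref` on the same chromosome.
in_any_interval <- function(ch, pos, ref) {
  any(ref$chr == ch & ref$start <= pos & pos <= ref$end)
}

# Test every row of map_demo against all intervals in ref_demo.
keep <- mapply(in_any_interval, map_demo$chr, map_demo$start,
               MoreArgs = list(ref = ref_demo), USE.NAMES = FALSE)
result_demo <- map_demo[keep, ]
```

This scans ref once per row of map, so it is probably only practical for small tables; foverlaps is the better fit when both tables are large.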

Alternatively, in base R: first, you need to expand the intervals in ref into individual positions per chromosome:

L <- lapply(split(ref, ref$chr), function(d) unique(unlist(mapply(seq, d$start, d$end, SIMPLIFY = FALSE))))

which will give you:

#$chr1
#[1] 1 2 3 5 6

#$chr2
#[1] 7 8 9

And then you can merge:

ref2 <- setNames(stack(L),c('start','chr'))
merge(map,ref2)

Final output:

#   chr start    tag depth BCV State
#1 chr1     1 chr1-1     1   2     1
#2 chr1     2 chr1-2     1   3     2
#3 chr1     3 chr1-3     1   2     3
#4 chr2     7 chr2-7     3   2     9
#5 chr2     8 chr2-8     2   2     2
#6 chr2     9 chr2-9     2   2     1
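Note that merge() sorts its result by the join columns by default. If the original row order of map matters, one alternative is to filter on a combined key instead of merging. This is a hedged sketch, not the answerer's method: map_demo, ref_long (standing in for ref2 after expansion) and the key_* names are assumptions introduced here.

```r
# Toy stand-ins: a few map rows in arbitrary order, and the expanded
# reference positions (one row per chr/position pair).
map_demo <- data.frame(
  chr   = c("chr2", "chr1", "chr1", "chr2"),
  start = c(8L, 1L, 4L, 7L),
  tag   = c("chr2-8", "chr1-1", "chr1-4", "chr2-7"),
  stringsAsFactors = FALSE
)
ref_long <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
  start = c(1L, 2L, 3L, 7L, 8L, 9L),
  stringsAsFactors = FALSE
)

# Build one string key per row from chr and position, then keep the rows
# of map_demo whose key also occurs in the expanded reference.
key_map <- paste(map_demo$chr, map_demo$start)
key_ref <- paste(ref_long$chr, ref_long$start)
filtered_demo <- map_demo[key_map %in% key_ref, ]
```

Because %in% only tests membership, each row of map_demo is kept at most once and stays in its original position.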
