R, if number in one table belongs to range in another

I have 2 tables. The 1st one looks like:

V1 V2
128 1.0000  
139 0.9375
141 1.0000

The 2nd one looks like:

V1 V2 V3
gene    90  100
mRNA    120 129
CDS 130 139
CDS 140 150

V2 and V3 in the 2nd table are the borders of a range (e.g. 90:100, 120:129, etc.). I need to check whether each number in V1 of the 1st table falls within any of these ranges. If it does, I need to bind the matching rows together side by side, so the result would look like:

V1.1 V2.1 V1.2 V2.2 V3.2
128 1.0000 mRNA 120 129
139 0.9375 CDS 130 139
141 1.0000 CDS 140 150

Or something similar.

The problem is that these tables are really big (~5 GB each).

Thank you in advance.

Considering the size of your data sets, I would suggest foverlaps in the data.table package:

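library(data.table)

# Convert both data.frames to data.tables by reference
setDT(d1)
setDT(d2)

# Rename the range table's columns and key it on the range borders;
# foverlaps() needs y to be keyed on its interval columns
setnames(d2, c("V1.y", "V2.y", "V3.y"))
setkeyv(d2, c("V2.y", "V3.y"))

# Rename the point table's columns and duplicate the position, so that
# each point becomes a zero-width interval [V1.x, V11]
setnames(d1, c("V1.x", "V2.x"))
d1[, V11 := V1.x]

# Overlap join: match each point to the range(s) of d2 that contain it
Merged <- foverlaps(
  x = d1, y = d2,
  by.x = c("V1.x", "V11"),
  type = "within")
# Drop the temporary duplicate column
Merged[, V11 := NULL]
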
R> Merged
   V1.y V2.y V3.y V1.x   V2.x
1: mRNA  120  129  128 1.0000
2:  CDS  130  139  139 0.9375
3:  CDS  140  150  141 1.0000

where I appended .x and .y just for clarity. foverlaps is primarily intended for joining over two ranges, one in each of the tables used, so it requires that by.x specifies two (different) columns in the x object. The only way I know of to get around this in situations like this one, where we want a single column of x to fall within the range defined by two columns of y, is to create a temporary duplicate column. This is the purpose of d1[,V11:=V1.x], which is removed afterwards.
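As a possible alternative to the duplicate-column trick, a reasonably recent version of data.table also supports non-equi joins, which can express "position between two borders" directly. The sketch below is not part of the answer above: the table names (scores, features) and column names (pos, score, feature, range_start, range_end) are made up for illustration and simply mirror the example data from the question. Whether this is faster or slower than foverlaps on ~5 GB tables is not something this sketch tests.

```r
library(data.table)

# Hypothetical stand-ins for the two tables in the question
scores <- data.table(pos = c(128L, 139L, 141L),
                     score = c(1, 0.9375, 1))
features <- data.table(
  feature     = c("gene", "mRNA", "CDS", "CDS"),
  range_start = c(90L, 120L, 130L, 140L),
  range_end   = c(100L, 129L, 139L, 150L))

# Non-equi join: for every row of scores (i), find the rows of
# features (x) with range_start <= pos and range_end >= pos.
# The x. and i. prefixes pick each table's original column values;
# nomatch = NULL drops positions that fall in no range.
merged <- features[scores,
  .(pos = i.pos, score = i.score, feature = x.feature,
    range_start = x.range_start, range_end = x.range_end),
  on = .(range_start <= pos, range_end >= pos),
  nomatch = NULL]

print(merged)
```

Naming the output columns explicitly in j avoids a common source of confusion with non-equi joins, where the join columns in the default result carry the names from one table but the values from the other.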

Data:

d1 <- read.table(
  text="V1 V2
128 1.0000  
139 0.9375
141 1.0000",
  header=TRUE)
d2 <- read.table(
  text="V1 V2 V3
gene    90  100
mRNA    120 129
  CDS 130 139
  CDS 140 150",
  header=TRUE)

Personally, I wouldn't do this in R, although it's probably possible with one of the Bioconductor packages, such as GenomicRanges (which provides the GRanges class). Instead, I would convert the files to BED format, sort them (with sort -k1,1 -k2,2n), and use bedtools intersect, something like:

intersectBed -a tbl1.bed -b tbl2.bed -wa -wb -sorted > merged.bed
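The conversion to BED could be sketched in R roughly as follows. This is only an illustration under several assumptions: the tables in the question have no chromosome column, so a placeholder name ("chr1") is used; the positions are assumed to be 1-based and inclusive, so they are shifted to BED's 0-based, half-open convention; and the helper write_bed and the output paths are hypothetical. For files of several GB, a faster writer than write.table would likely be preferable.

```r
# Hypothetical stand-ins for the two tables in the question
d1 <- data.frame(V1 = c(128L, 139L, 141L), V2 = c(1, 0.9375, 1))
d2 <- data.frame(V1 = c("gene", "mRNA", "CDS", "CDS"),
                 V2 = c(90L, 120L, 130L, 140L),
                 V3 = c(100L, 129L, 139L, 150L))

chrom <- "chr1"  # assumption: placeholder, the source data has no chromosome

# BED is 0-based and half-open: a 1-based position p becomes [p - 1, p),
# and a 1-based inclusive range [s, e] becomes [s - 1, e)
bed1 <- data.frame(chrom = chrom, start = d1$V1 - 1L, end = d1$V1,
                   name = d1$V2)
bed2 <- data.frame(chrom = chrom, start = d2$V2 - 1L, end = d2$V3,
                   name = d2$V1)

# Sort by chromosome and start, as the -sorted option expects
bed1 <- bed1[order(bed1$chrom, bed1$start), ]
bed2 <- bed2[order(bed2$chrom, bed2$start), ]

# Hypothetical helper: tab-separated, no header, no quoting
write_bed <- function(df, path) {
  write.table(df, path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

bed1_path <- file.path(tempdir(), "tbl1.bed")
bed2_path <- file.path(tempdir(), "tbl2.bed")
write_bed(bed1, bed1_path)
write_bed(bed2, bed2_path)
```

The resulting files can then be passed to the command above as the -a and -b inputs.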
