I have two tables. The first one looks like this:
V1 V2
128 1.0000
139 0.9375
141 1.0000
The second one looks like this:
V1 V2 V3
gene 90 100
mRNA 120 129
CDS 130 139
CDS 140 150
V2 and V3 in the second table are the boundaries of a range (e.g. 90:100, 120:129, etc.). For each number in V1 of the first table, I need to check whether it falls within any of these ranges. If it does, I need to combine the matching rows into a single row, so the result would look like this:
V1.1 V2.1 V1.2 V2.2 V3.2
128 1.0000 mRNA 120 129
139 0.9375 CDS 130 139
141 1.0000 CDS 140 150
or something similar.
The problem is that these tables are really big (~5 GB each).
Thank you in advance.
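To pin down the intended semantics (a closed range, V2 <= V1 <= V3, with every containing range reported), here is a brute-force sketch in base R. It compares every value against every range, so it is only usable on small test subsets, not on the ~5 GB files; the helper name `naive_range_join` is hypothetical.

```r
## Hypothetical helper: brute-force reference join, O(nrow(d1) * nrow(d2)).
## Assumes d1 has columns V1, V2 and d2 has columns V1, V2, V3 as above,
## and that ranges are closed on both ends (V2 <= value <= V3).
## Returns NULL if nothing matches.
naive_range_join <- function(d1, d2) {
  rows <- list()
  for (i in seq_len(nrow(d1))) {
    ## indices of all ranges in d2 that contain d1$V1[i]
    hits <- which(d2$V2 <= d1$V1[i] & d1$V1[i] <= d2$V3)
    for (j in hits) {
      rows[[length(rows) + 1]] <- data.frame(
        V1.1 = d1$V1[i], V2.1 = d1$V2[i],
        V1.2 = d2$V1[j], V2.2 = d2$V2[j], V3.2 = d2$V3[j],
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, rows)
}

d1 <- data.frame(V1 = c(128, 139, 141), V2 = c(1.0000, 0.9375, 1.0000))
d2 <- data.frame(V1 = c("gene", "mRNA", "CDS", "CDS"),
                 V2 = c(90, 120, 130, 140), V3 = c(100, 129, 139, 150),
                 stringsAsFactors = FALSE)
res <- naive_range_join(d1, d2)
print(res)
```

Because it returns every containing range, this also covers nested features (e.g. a CDS inside an mRNA inside a gene), which real annotation files usually have even though the toy data does not.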
Considering the size of your data sets, I would suggest foverlaps from the data.table package:
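library(data.table)
## convert both data.frames to data.tables in place
setDT(d1)
setDT(d2)
## rename d2's columns and key it on the range bounds (start, end),
## which foverlaps requires of y
setnames(d2,c("V1.y","V2.y","V3.y"))
setkeyv(d2,c("V2.y","V3.y"))
## rename d1's columns and add a temporary copy of V1.x, so each value
## becomes a one-point interval [V1.x, V11]
setnames(d1,c("V1.x","V2.x"))
d1[,V11:=V1.x]
## match each row of d1 to the d2 ranges its interval lies within
## (rows of d1 with no match are kept, with NA in d2's columns)
Merged <- foverlaps(
  x=d1,y=d2,
  by.x=c("V1.x","V11"),
  type="within")
## drop the temporary column
Merged[,V11:=NULL]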
##
R> Merged
V1.y V2.y V3.y V1.x V2.x
1: mRNA 120 129 128 1.0000
2: CDS 130 139 139 0.9375
3: CDS 140 150 141 1.0000
where I appended .x and .y to the column names just for clarity. foverlaps is primarily intended for joining over two ranges, one in each of the tables, so it requires that by.x specify two (different) columns in the x object. The only way I know of to get around this in situations like this one, where we want one column of x to fall within the range given by two columns of y, is to create a temporary duplicate column. This is the purpose of d1[,V11:=V1.x], which is removed afterwards.
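To see the temporary-column trick in isolation, and what happens to a value that falls outside every range, here is a small sketch on made-up toy data (the tables `pts` and `rng` and their column names are hypothetical). By default foverlaps keeps unmatched rows of x with NA in y's columns; passing nomatch = 0L drops them, which is closer to the "if it does" requirement in the question.

```r
library(data.table)

## Hypothetical toy data: 110 falls in no range; 129 sits on a boundary
pts <- data.table(pos = c(95L, 110L, 129L), score = c(0.5, 0.8, 0.9))
rng <- data.table(type = c("gene", "mRNA"),
                  start = c(90L, 120L), end = c(100L, 129L))
## y must be keyed, with its last two key columns being start and end
setkey(rng, start, end)

## temporary duplicate column: each point becomes the interval [pos, pos]
pts[, pos2 := pos]

## default nomatch: the unmatched point is kept with NA in rng's columns
all_rows <- foverlaps(pts, rng, by.x = c("pos", "pos2"), type = "within")
## nomatch = 0L: the unmatched point is dropped
matched <- foverlaps(pts, rng, by.x = c("pos", "pos2"), type = "within",
                     nomatch = 0L)
matched[, pos2 := NULL]
print(matched)
```

Intervals are treated as closed, so 129 matches [120, 129].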
Data:
d1 <- read.table(
text="V1 V2
128 1.0000
139 0.9375
141 1.0000",
header=TRUE)
d2 <- read.table(
text="V1 V2 V3
gene 90 100
mRNA 120 129
CDS 130 139
CDS 140 150",
header=TRUE)
Personally, I wouldn't do this in R, although it's probably possible with one of the Bioconductor packages such as GenomicRanges (which provides the GRanges class). Instead, I would convert the files to BED format, sort them (with sort -k1,1 -k2,2n), and use bedtools intersect, something like:
intersectBed -a tbl1.bed -b tbl2.bed -wa -wb -sorted > merged.bed
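The BED conversion mentioned above is where off-by-one errors creep in: BED is 0-based and half-open, so a 1-based closed range [a, b] becomes start a-1, end b, and a single position p becomes [p-1, p). The sketch below shows that mapping in R only to make it concrete; for 5 GB files a streaming tool would be preferable. The helper `write_bed` and the chromosome name "chr1" are hypothetical placeholders, since the tables have no chromosome column, and 1-based input coordinates are an assumption.

```r
## Hypothetical helper: write intervals plus extra columns as a sorted BED file.
## Assumptions: every row is on one chromosome ("chr1" is a placeholder) and
## input coordinates are 1-based closed, so [a, b] becomes BED's [a-1, b).
write_bed <- function(start, end, extra, file, chrom = "chr1") {
  bed <- data.frame(chrom = chrom, start = start - 1, end = end, extra,
                    stringsAsFactors = FALSE)
  ## order by chromosome, then numeric start (mirrors sort -k1,1 -k2,2n)
  bed <- bed[order(bed$chrom, bed$start), ]
  write.table(bed, file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(bed)
}

d1 <- data.frame(V1 = c(128, 139, 141), V2 = c(1.0000, 0.9375, 1.0000))
d2 <- data.frame(V1 = c("gene", "mRNA", "CDS", "CDS"),
                 V2 = c(90, 120, 130, 140), V3 = c(100, 129, 139, 150),
                 stringsAsFactors = FALSE)
## a single position p is written as the one-base interval [p-1, p)
b1 <- write_bed(d1$V1, d1$V1, d1["V2"], "tbl1.bed")
b2 <- write_bed(d2$V2, d2$V3, d2["V1"], "tbl2.bed")
```

With this mapping, [p-1, p) overlaps [a-1, b) exactly when a <= p <= b, so bedtools' default overlap test reproduces the closed-range check from the question.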