Comparing and finding overlapping ranges in R
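I have two tables, each containing numeric ranges. One table is a subdivision of the other. I want to add a binary column to the first table showing which of its ranges overlap a range in the second table.
For example: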
df1:
start1 end1
1 6
6 8
9 12
13 15
15 19
19 20
df2:
start2 end2
2 4
9 11
14 18
Result: the first table with an added column indicating whether an overlap exists.
start1 end1 overlap
1 6 1
6 8 0
9 12 1
13 15 1
15 19 1
19 20 0
Thanks.
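You could also try foverlaps from data.table:
library(data.table)
# Convert both tables to keyed data.tables; the key columns define each interval
setkey(setDT(df1), start1, end1)
setkey(setDT(df2), start2, end2)
# which = TRUE returns matching row indices; yid is NA where a df1 range has no match.
# Note: this assumes each df1 range overlaps at most one df2 range, because the
# default mult = "all" returns one row per overlapping pair.
df1[, overlap := foverlaps(df1, df2, which = TRUE)[, !is.na(yid)] + 0]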
df1
# start1 end1 overlap
#1: 1 6 1
#2: 6 8 0
#3: 9 12 1
#4: 13 15 1
#5: 15 19 1
#6: 19 20 0
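For reference, the closed-interval overlap test that the result above reflects can be sketched in base R without any package. This is a minimal illustration, not how data.table implements it: two closed ranges [s1, e1] and [s2, e2] overlap when s1 <= e2 and e1 >= s2. The helper name `has_overlap` and the rebuilt example data frames are assumptions made for this sketch.

```r
# Example data from the question, rebuilt as plain data frames
df1 <- data.frame(start1 = c(1, 6, 9, 13, 15, 19),
                  end1   = c(6, 8, 12, 15, 19, 20))
df2 <- data.frame(start2 = c(2, 9, 14),
                  end2   = c(4, 11, 18))

# Hypothetical helper: for each closed range [s1, e1], report whether it
# overlaps at least one closed range [s2, e2]
has_overlap <- function(s1, e1, s2, e2) {
  # outer() builds a length(s1) x length(s2) logical matrix per condition
  hits <- outer(s1, e2, "<=") & outer(e1, s2, ">=")
  as.integer(rowSums(hits) > 0)
}

df1$overlap <- has_overlap(df1$start1, df1$end1, df2$start2, df2$end2)
df1$overlap
```

The comparison matrix has nrow(df1) x nrow(df2) entries, so this is probably only sensible for small tables; the indexed approaches in the answers are likely a better fit for large ones.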
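With IRanges:
library(IRanges)
# Build closed integer ranges from each table
ir1 = with(df1, IRanges(start1, end1))
ir2 = with(df2, IRanges(start2, end2))
# TRUE where a df1 range overlaps at least one df2 range; wrap in as.integer() for 0/1
df1$overlap = countOverlaps(ir1, ir2) != 0
If this is genomic data, the GenomicRanges package is the appropriate choice.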
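Here is an approach based on generating sequences:
# All values covered by any df2 range, enumerated with seq()
nums <- unlist(apply(df2, 1, Reduce, f = seq))
# Flag each df1 range whose enumerated values include at least one of them
df1$overlap <- as.integer(apply(df1, 1, function(x) any(seq(x[1], x[2]) %in% nums)))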
# start1 end1 overlap
# 1 1 6 1
# 2 6 8 0
# 3 9 12 1
# 4 13 15 1
# 5 15 19 1
# 6 19 20 0
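One caveat worth noting: because this approach compares enumerated values, it appears to rely on integer endpoints. The sketch below uses made-up fractional ranges (not from the question) to show the sequence test missing an overlap that a direct endpoint comparison finds.

```r
# Made-up fractional ranges, for illustration only
a <- c(start = 1.5, end = 2.5)
b <- c(start = 2.2, end = 3.0)

# Sequence-based test: seq() steps by 1, so it yields 1.5, 2.5 for a and 2.2 for b
seq_test <- any(seq(a[["start"]], a[["end"]]) %in% seq(b[["start"]], b[["end"]]))

# Direct endpoint comparison for closed ranges
endpoint_test <- a[["start"]] <= b[["end"]] && a[["end"]] >= b[["start"]]

seq_test       # FALSE
endpoint_test  # TRUE
```

With integer endpoints, as in the question, the two tests should agree.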
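You can use the ivs package, which is dedicated to interval vectors. iv_overlaps() returns a logical vector indicating whether each interval in the df1 column overlaps any interval in df2. Note that the intervals are right-open, [start, end), as the printed output shows.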
library(dplyr)
library(ivs)
df1 <- tribble(
  ~start1, ~end1,
  1, 6,
  6, 8,
  9, 12,
  13, 15,
  15, 19,
  19, 20
)
df2 <- tribble(
  ~start2, ~end2,
  2, 4,
  9, 11,
  14, 18
)
df1 <- mutate(df1, range1 = iv(start1, end1), .keep = "unused")
df2 <- mutate(df2, range2 = iv(start2, end2), .keep = "unused")
df1 %>%
  mutate(any_overlap = iv_overlaps(range1, df2$range2))
#> # A tibble: 6 × 2
#> range1 any_overlap
#> <iv<dbl>> <lgl>
#> 1 [1, 6) TRUE
#> 2 [6, 8) FALSE
#> 3 [9, 12) TRUE
#> 4 [13, 15) TRUE
#> 5 [15, 19) TRUE
#> 6 [19, 20) FALSE
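The output prints intervals as [start, end), i.e. right-open, so ranges that merely touch at an endpoint do not count as overlapping. A base R sketch of that convention next to the closed one follows; the function names are hypothetical, and this is only an illustration of the two rules, not how ivs is implemented.

```r
# Hypothetical helpers comparing two interval conventions
overlaps_closed    <- function(s1, e1, s2, e2) s1 <= e2 & e1 >= s2  # [s, e]
overlaps_half_open <- function(s1, e1, s2, e2) s1 <  e2 & e1 >  s2  # [s, e)

# Two ranges from df1 that share the endpoint 6
overlaps_closed(1, 6, 6, 8)     # TRUE: both ranges contain 6
overlaps_half_open(1, 6, 6, 8)  # FALSE: [1, 6) excludes 6

# A genuine overlap is detected under either convention
overlaps_closed(15, 19, 14, 18)
overlaps_half_open(15, 19, 14, 18)
```

For the data in the question both conventions give the same column, since no df1 range merely touches a df2 range at an endpoint.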