Mapping start positions in one vector to stop positions in another vector

Question

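I have derived all the start and stop positions within a DNA string. Now I would like to map each start position to a stop position (both sets of positions are vectors) and then use these positions to extract the corresponding substrings from the DNA string sequence. However, I am unable to loop through the two vectors efficiently to achieve this, especially since they are not the same length.

I have tried different versions of loops (for, ifelse), but I cannot quite wrap my head around a solution yet.

Here is an example of one of my several attempts at solving this problem.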

new = data.frame()
for (i in start_pos) {
  for (j in stop_pos) {
    while (j > i) {
      new[j, 1] = i
      new[j, 2] = j
    }
  }
}

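Here is an example of my desired result: start = c(1,5,7, 9, 15) stop = c(4, 13, 20, 30, 40, 50). Ideally, the result would be a two-column data frame that maps each start position to its stop position. I only want to add rows to the df where the stop value is greater than its corresponding start value (multiple start values can share the same stop value as long as this condition is met), as in the example below.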

i.e. first row df  = (1, 4)
     second row df = (5, 13)
     third row df  = (7, 13)
     fourth row df = (9, 13)
     fifth row df  = (15, 20)

Answer 1

Here is a possible tidyverse solution:

library(purrr)
library(plyr)
library(dplyr)

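map2 is used to map over the values of the two vectors (start and stop) in parallel. We then create a vector from each pair, unlist, and combine the results into a data.frame object.

Edit: with the updated condition, we can do the following: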

start1= c(118,220, 255) 
stop1 =c(115,210,260)
res<-purrr::map2(start1[1:length(stop1)],stop1,function(x,y) c(x,y[y>x]))
res[unlist(lapply(res,function(x) length(x)>1))]
   # [[1]]
   # [1] 255 260
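If a data frame is wanted rather than a list, the filtered pairs can be bound together row by row. The following is only a sketch in base R (it swaps `purrr::map2` for `Map` so that it runs without extra packages); the names `kept` and `pairs_df` are made up for illustration, and it assumes at least one pair survives the filter.

```r
start1 <- c(118, 220, 255)
stop1 <- c(115, 210, 260)

# Pair elements position by position; keep the stop only if it exceeds the start
res <- Map(function(x, y) c(x, y[y > x]), start1, stop1)

# Keep only the elements that still hold both a start and a stop
kept <- res[lengths(res) > 1]

# Stack the surviving pairs into a two-column data frame
# (assumes `kept` is not empty)
pairs_df <- as.data.frame(do.call(rbind, kept))
names(pairs_df) <- c("start", "stop")
```

With the vectors above, only the pair (255, 260) satisfies the condition, so `pairs_df` has a single row.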

Original:

plyr::ldply(purrr::map2(start[1:length(stop)],stop,function(x,y) c(x,y)),unlist) %>% 
   setNames(nm=c("start","stop")) %>% 
 mutate(newCol=paste0("(",start,",",stop,")"))
#  start stop  newCol
#1     1    4   (1,4)
#2     5   13  (5,13)
#3    15   20 (15,20)
#4    NA   30 (NA,30)
#5    NA   40 (NA,40)
#6    NA   50 (NA,50)
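The NA rows in the output above come from indexing the shorter vector past its end: R returns NA for positions that do not exist. A minimal illustration, using the question's name `start_pos` and a made-up name `padded`:

```r
start_pos <- c(1, 5, 15)   # three starts, as in the answer's example
padded <- start_pos[1:6]   # ask for six elements from a length-3 vector
print(padded)
# [1]  1  5 15 NA NA NA
```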

Alternative: @Marius shows a clever approach. The key is to make the two vectors the same length.

plyr::ldply(purrr::map2(start,stop[1:length(start)],function(x,y) c(x,y)),unlist) %>% 
   setNames(nm=c("start","stop")) %>% 
 mutate(newCol=paste0("(",start,",",stop,")"))
#  start stop  newCol
#1     1    4   (1,4)
#2     5   13  (5,13)
#3    15   20 (15,20)

Answer 2

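Here is a fairly simple solution. Unless you are sure you need the extra complexity, it is probably best not to overcomplicate things. The starts and stops already seem to be matched up, and you may simply have more of one than the other, so you can find the length of the shorter vector and use only that many items from start and stop: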

start = c(1, 5, 15) 
stop = c(4, 13, 20, 30, 40, 50)

min_length = min(length(start), length(stop))

df = data.frame(
    start = start[1:min_length],
    stop = stop[1:min_length]
)
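One caveat with `1:min_length`: if either vector is empty, `1:0` counts down as `c(1, 0)` instead of selecting nothing. A possible variant, sketched here with `head()` and the question's variable names `start_pos` and `stop_pos` (the names `n`, `df_trim`, and `df_empty` are illustrative only):

```r
start_pos <- c(1, 5, 15)
stop_pos <- c(4, 13, 20, 30, 40, 50)

# Length of the shorter vector
n <- min(length(start_pos), length(stop_pos))

# head(x, n) returns the first n elements and copes with n == 0
df_trim <- data.frame(
  start = head(start_pos, n),
  stop = head(stop_pos, n)
)

# Edge case: no starts at all gives a zero-row data frame instead of NA rows
empty_start <- numeric(0)
n0 <- min(length(empty_start), length(stop_pos))
df_empty <- data.frame(
  start = head(empty_start, n0),
  stop = head(stop_pos, n0)
)
```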

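Edit: after reading some of your comments here, it looks like your problem is actually more complicated than it first seemed (coming up with examples that show the level of complexity you need, without being overly complicated, is always tricky). If you want to match each start with the next stop that is greater than that start, you can do the following: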

# Slightly modified example: multiple starts
#   that can be matched with one stop
start = c(1, 5, 8)
stop = c(4, 13, 20, 30, 40, 50)

df2 = data.frame(
    start = start,
    stop = sapply(start, function(s) { min(stop[stop > s]) })
)
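Two follow-ups may be worth sketching. First, `min()` on an empty vector returns `Inf` with a warning, which happens when a start has no later stop. Second, the question also asks to extract the substrings. The code below is one possible way to handle both, not a definitive implementation: `next_stop` is a hypothetical helper, `dna` is a made-up sequence, the extra start at 60 is added only to show the no-match case, and the stop position is assumed to be inclusive.

```r
# Hypothetical helper: for each start, the smallest stop strictly greater
# than it, or NA when no such stop exists
next_stop <- function(start_pos, stop_pos) {
  vapply(start_pos, function(s) {
    later <- stop_pos[stop_pos > s]
    if (length(later) == 0) NA_real_ else min(later)
  }, numeric(1))
}

start_pos <- c(1, 5, 7, 9, 15, 60)   # 60 has no later stop
stop_pos <- c(4, 13, 20, 30, 40, 50)

pairs <- data.frame(start = start_pos,
                    stop = next_stop(start_pos, stop_pos))

# Drop starts that could not be matched to any stop
pairs <- pairs[!is.na(pairs$stop), ]

# Made-up 60-character sequence standing in for the real DNA string
dna <- paste(rep("ACGT", 15), collapse = "")

# substring() is vectorised over its start and stop arguments;
# both ends are treated as inclusive here (an assumption)
pairs$fragment <- substring(dna, pairs$start, pairs$stop)
```

With these inputs the matched pairs are (1, 4), (5, 13), (7, 13), (9, 13), and (15, 20), which is the result asked for in the question.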

Answer 1: 2019-04-02 04:51:02
Answer 2: 2019-04-02 05:03:03
