Pythonic equivalent of reduce() in R GRanges - how to collapse ranged data?

Question

In R (albeit verbose):

Here is a test data.frame:

df <- data.frame(
  "CHR" = c(1,1,1,2,2),
  "START" = c(100, 200, 300, 100, 400),
  "STOP" = c(150,350,400,500,450)
  )

First I make a GRanges object:

gr <- GenomicRanges::GRanges(
  seqnames = df$CHR,
  ranges = IRanges(start = df$START, end = df$STOP)
  )

Then I reduce the intervals, collapsing them into a new GRanges object:

reduced <- reduce(gr)

Now append a new column to the original data frame to identify which rows belong to the same contiguous "chunk":

df$locus <- subjectHits(findOverlaps(gr, reduced))

Output:

> df
  CHR START STOP locus
1   1   100  150     1
2   1   200  350     2
3   1   300  400     2
4   2   100  500     3
5   2   400  450     3

How do I do this in Python? I know about pybedtools, but as far as I can tell it would require me to save my data.frame to disk. Any help is appreciated.

Answer 1

https://github.com/biocore-ntnu/pyranges

import pyranges as pr
chromosomes = [1] * 3 + [2] * 2
starts = [100, 200, 300, 100, 400]
ends = [150, 350, 400, 500, 450]
gr = pr.PyRanges(chromosomes=chromosomes, starts=starts, ends=ends)
gr.cluster()
# +--------------+-----------+-----------+-----------+
# |   Chromosome |     Start |       End |   Cluster |
# |       (int8) |   (int32) |   (int32) |   (int64) |
# |--------------+-----------+-----------+-----------|
# |            1 |       100 |       150 |         1 |
# |            1 |       200 |       350 |         2 |
# |            1 |       300 |       400 |         2 |
# |            2 |       100 |       500 |         3 |
# |            2 |       400 |       450 |         3 |
# +--------------+-----------+-----------+-----------+

It will be available in 0.0.21. Thanks for the idea!
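For readers who want to see what the clustering step computes without installing anything, here is a minimal dependency-free sketch. It is not how pyranges or GenomicRanges implement it; the names `Interval`, `assign_clusters`, and `reduce_intervals` are made up for illustration, and it assumes closed coordinates where two ranges on the same chromosome belong together only if they actually overlap (directly adjacent ranges are kept apart).

```python
from collections import namedtuple

# Hypothetical record type for illustration; not part of any library.
Interval = namedtuple("Interval", ["chrom", "start", "end"])


def assign_clusters(intervals):
    """Return one cluster id per interval, in the original input order.

    Assumption: closed coordinates; an interval joins the current cluster
    when its start is <= the largest end seen so far on that chromosome.
    """
    # Visit intervals sorted by (chrom, start) but remember original positions.
    order = sorted(range(len(intervals)),
                   key=lambda i: (intervals[i].chrom, intervals[i].start))
    ids = [0] * len(intervals)
    cluster = 0
    current_chrom = None
    current_end = None
    for i in order:
        iv = intervals[i]
        if iv.chrom != current_chrom or iv.start > current_end:
            # New chromosome or a gap: start a new cluster.
            cluster += 1
            current_chrom = iv.chrom
            current_end = iv.end
        else:
            # Overlap: extend the running end of the current cluster.
            current_end = max(current_end, iv.end)
        ids[i] = cluster
    return ids


def reduce_intervals(intervals):
    """Collapse each cluster into a single merged interval."""
    merged = {}
    for iv, cid in zip(intervals, assign_clusters(intervals)):
        if cid in merged:
            prev = merged[cid]
            merged[cid] = Interval(prev.chrom,
                                   min(prev.start, iv.start),
                                   max(prev.end, iv.end))
        else:
            merged[cid] = iv
    return [merged[cid] for cid in sorted(merged)]


rows = [Interval(1, 100, 150), Interval(1, 200, 350), Interval(1, 300, 400),
        Interval(2, 100, 500), Interval(2, 400, 450)]
locus = assign_clusters(rows)      # [1, 2, 2, 3, 3]
reduced = reduce_intervals(rows)   # three merged ranges
print(locus)
print(reduced)
```

On the question's test data, `locus` matches the `locus` column from the R output, and `reduced` holds the three collapsed ranges.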

Answer 2

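It looks like you are trying to get the intersection of these. Pybedtools will accept a stream as input. Read your data into a string in BED format.

"chr, start, stop"

I started with a Python dictionary and looped over it.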

import pybedtools

bed_string = ""
# Inside the loop over the dictionary: append one BED line per record.
bed_string += "{0} {1} {2} {3} {0}|{1}|{2}|{3}\n".format(chrom, coord_start, coord_stop, aberration)
# Now create the BedTool objects.
breakpoint_bedtool = pybedtools.BedTool(bed_string, from_string=True)
target_bedtool = pybedtools.BedTool(self.args.Target_Bed_File, from_string=False)
# Find the target intersections for printing.
breakpoint_target_intersect = breakpoint_bedtool.intersect(target_bedtool, wb=True, stream=True)
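The loop over the dictionary is not shown in the answer, so here is a rough sketch of how the BED string might be assembled. The dictionary layout, the key names, and the helper `to_bed_string` are assumptions made for illustration; only the five-column line format is taken from the snippet above.

```python
# Hypothetical records; the answer does not show its actual dictionary layout.
breakpoints = {
    "bp1": {"chrom": "chr1", "start": 100, "stop": 150, "aberration": "DEL"},
    "bp2": {"chrom": "chr2", "start": 400, "stop": 450, "aberration": "DUP"},
}


def to_bed_string(records):
    """Build a whitespace-delimited BED-style string, one line per record.

    The fifth column joins the first four with "|", mirroring the
    format string used in the answer.
    """
    lines = []
    for rec in records.values():
        fields = (rec["chrom"], rec["start"], rec["stop"], rec["aberration"])
        lines.append("{0} {1} {2} {3} {0}|{1}|{2}|{3}\n".format(*fields))
    return "".join(lines)


bed_string = to_bed_string(breakpoints)
print(bed_string, end="")
```

The resulting `bed_string` is the kind of value the answer passes to `pybedtools.BedTool(bed_string, from_string=True)`.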

Answer 1
2 2019-04-14 14:14:52

Answer 2
1 2018-03-16 16:32:56