Python bitarray reverse complement

Question

I am using Python's bitarray module to convert a DNA sequence, which is stored in a binary file, to its reverse complement. Each nucleotide is represented by two bits in the following format:

A - 00, C - 01, G - 10, T - 11.
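With this encoding, each base's complement is simply its bitwise NOT (A 00 <-> T 11, C 01 <-> G 10), which is why both answers below can complement the whole sequence with a single inversion. A minimal pure-Python sketch (no bitarray needed; the helper names `encode` and `complement` are hypothetical) that packs a string into an int, two bits per base, and complements it by flipping all bits:

```python
# 2-bit codes from the question: A=00, C=01, G=10, T=11
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def encode(seq: str) -> int:
    """Pack a DNA string into an int, two bits per base, first base in the high bits."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value

def complement(value: int, length: int) -> int:
    """Complement every base by flipping all 2 * length bits (bitwise NOT)."""
    mask = (1 << (2 * length)) - 1
    return value ^ mask

# A <-> T and C <-> G correspond to 00 <-> 11 and 01 <-> 10
assert complement(encode("ACGT"), 4) == encode("TGCA")
```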

For example, the reverse complement of AGCTACGG (00 10 01 11 00 01 10 10) would be CCGTAGCT (01 01 10 11 00 10 01 11).
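When checking a faster bit-level version, it can help to compare its output against a plain string reverse complement. A minimal sketch using only `str.maketrans` and `str.translate` (the function name `reverse_complement_str` is hypothetical):

```python
# Map each base to its complement: A <-> T, C <-> G
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement_str(seq: str) -> str:
    """Complement each base, then reverse the order of the bases."""
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement_str("AGCTACGG"))  # CCGTAGCT
```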

This sequence takes up exactly 16 bits (2 bytes), but a sequence of length 9 would take 18 bits and is padded to 24 bits (3 bytes).
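In general a sequence of n bases needs ceil(2n / 8) bytes, so 8 bases fit in 2 bytes and 9 bases need 3. A sketch of packing a sequence into bytes, assuming (the question does not say where the padding goes) that the padding is zero bits placed after the last base; `pack` is a hypothetical helper:

```python
# 2-bit codes from the question: A=00, C=01, G=10, T=11
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(seq: str) -> bytes:
    """Pack bases two bits each, first base in the most significant bits,
    with zero padding after the last base to fill the final byte."""
    n_bits = 2 * len(seq)
    n_bytes = (n_bits + 7) // 8          # ceil(2n / 8)
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    value <<= 8 * n_bytes - n_bits       # move the zero padding to the end
    return value.to_bytes(n_bytes, "big")

print(len(pack("AGCTACGG")))   # 2
print(len(pack("AGCTACGGA")))  # 3
```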

At the moment I use a for loop for the conversion, but this solution is dreadfully slow.

from bitarray import bitarray

def reverse_complement(my_bitarray, seq_length):
    # swap 00 (A) <-> 11 (T); pairs 01 and 10 are left unchanged here
    for i in range(0, 2 * seq_length - 1, 2):
        if my_bitarray[i] == my_bitarray[i + 1]:
            if my_bitarray[i] == 0:
                my_bitarray[i], my_bitarray[i + 1] = 1, 1
            else:
                my_bitarray[i], my_bitarray[i + 1] = 0, 0

    # padding if the bitarray is not a multiple of 8 bits in length
    if seq_length % 4 != 0:
        my_bitarray.reverse()
        my_bitarray.fill()
        my_bitarray.reverse()

    return my_bitarray

# seq, seq_start and seq_length come from the rest of my program
a = bitarray()
a.frombytes(seq[::-1])
b = a[int(seq_start)::]  # seq without padding
b.reverse()

reverse_complement(b, seq_length)

Any tips on how to make this process faster?

Answer 1

If you don't mind installing the boltons package from PyPI, you can do the following:

from itertools import chain

from bitarray import bitarray
from boltons.iterutils import pairwise

original = bitarray('0010011100011010')
complement = ~original
reverse_complement = bitarray(chain.from_iterable(reversed(pairwise(complement))))
assert reverse_complement == bitarray('0101101100100111')

Update:

As of boltons v16.2.0, pairwise returns overlapping pairs (a sliding window of size 2) instead of consecutive non-overlapping pairs, so the answer should be changed to use chunked:

from boltons.iterutils import chunked
reverse_complement = bitarray(chain.from_iterable(reversed(chunked(complement, 2))))
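For readers who would rather not depend on boltons, the same non-overlapping pairing can be done with the standard library by zipping one iterator with itself. This sketch uses a plain list of 0/1 ints instead of a bitarray so it runs without extra packages (the function name is hypothetical); with a bitarray, the resulting list could be passed to `bitarray(...)` as in the answer.

```python
from itertools import chain

def reverse_complement_bits(bits):
    """Reverse a 2-bits-per-base sequence base by base, then complement it."""
    it = iter(bits)
    pairs = list(zip(it, it))              # non-overlapping pairs, like chunked(bits, 2)
    reversed_bits = chain.from_iterable(reversed(pairs))
    return [1 - b for b in reversed_bits]  # flip every bit: A <-> T, C <-> G

original = [0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]  # AGCTACGG
expected = [int(c) for c in "0101101100100111"]             # CCGTAGCT
print(reverse_complement_bits(original) == expected)  # True
```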

Answer 2

The code you provided doesn't give the answer you indicated.

Here is code that gives the correct answer. Perhaps it will also be fast enough:

from bitarray import bitarray

def reverse_complement(my_bitarray):
    # First reverse by twos (swap the order of the 2-bit bases)
    my_bitarray = zip(my_bitarray[0::2], my_bitarray[1::2])
    my_bitarray = reversed(list(my_bitarray))
    my_bitarray = (i for t in my_bitarray for i in t)
    my_bitarray = bitarray(my_bitarray)

    # Then complement (flip every bit)
    my_bitarray.invert()
    return my_bitarray

Note that you don't have to worry about the padding: bitarray.bitarray() manages all of that for you.
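Both answers still loop over individual bits in Python. One further option, not taken from either answer, is to work a byte (four bases) at a time with a 256-entry lookup table and `bytes.translate`, then fix up the padding with integer shifts. A sketch, assuming the packed layout puts the first base in the most significant bits with zero padding after the last base; the function names are hypothetical and it has not been benchmarked against the answers. With a bitarray, `tobytes()` and `frombytes()` could convert to and from this form, trimming the result to `2 * seq_length` bits afterwards.

```python
def _rc_byte(b: int) -> int:
    """Reverse the four 2-bit bases in one byte and complement each one."""
    out = 0
    for _ in range(4):
        out = (out << 2) | (~b & 0b11)  # take the lowest base, complemented
        b >>= 2
    return out

# Precomputed once: translation table for all 256 byte values
RC_TABLE = bytes(_rc_byte(b) for b in range(256))

def reverse_complement_packed(data: bytes, seq_length: int) -> bytes:
    """Reverse complement of seq_length bases packed 2 bits each,
    first base in the most significant bits, zero padding at the end."""
    n_bits = 2 * seq_length
    n_bytes = len(data)
    pad = 8 * n_bytes - n_bits
    # Reverse the byte order, then reverse and complement inside each byte
    flipped = data[::-1].translate(RC_TABLE)
    value = int.from_bytes(flipped, "big")
    # The complemented padding now sits in the high bits: drop it ...
    value &= (1 << n_bits) - 1
    # ... and restore zero padding after the last base
    value <<= pad
    return value.to_bytes(n_bytes, "big")

# AGCTACGG -> CCGTAGCT
print(reverse_complement_packed(bytes([0b00100111, 0b00011010]), 8)
      == bytes([0b01011011, 0b00100111]))  # True
```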

Answer 1: score 1, posted 2015-08-18 13:14:47
Answer 2 (accepted): score 1, posted 2015-08-18 19:45:40
