Python, Bioinformatics query
I am new to Python and I would like to know if what I am attempting is possible. I have a section of a DNA alignment here, and I was wondering whether, for each gap "-" in the bottom line, I could identify the nucleotide at the same position in the top line. Here I would be looking to return "G".
My efforts so far have not been successful. The alignment is:
ATTCAGGCCTAGCA
:::::  :: ::::
ATTCAA-CCAAGCA
I appreciate any assistance!
I'm not sure how your data is stored. Let's say it's two equal-length strings in a tuple:
dna_pair = ('ATTCAGGCCTAGCA','ATTCAA-CCAAGCA')
Then you could try:
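def find_align(dna_pair):
    # dna_pair[0] is the top sequence, dna_pair[1] the gapped bottom sequence
    for i in range(len(dna_pair[0])):
        if dna_pair[1][i] == '-':
            # returns the nucleotide above the first gap only
            return dna_pair[0][i]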
above = 'ATTCAGGCCTAGCA'
below = 'ATTCAA-CCAAGCA'
gap_letters = [above[i] for i, j in enumerate(below) if j == '-']
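If the positions of the gaps are also of interest, the same idea can be sketched with zip, assuming the two rows are plain strings of equal length. The function name gaps_with_positions is hypothetical, chosen only for illustration.

```python
def gaps_with_positions(above, below, gap="-"):
    """Return (index, nucleotide) pairs from `above` wherever `below` has a gap."""
    if len(above) != len(below):
        raise ValueError("aligned rows must have the same length")
    # Walk both rows in step; keep the top character when the bottom one is a gap.
    return [
        (i, top)
        for i, (top, bottom) in enumerate(zip(above, below))
        if bottom == gap
    ]


above = 'ATTCAGGCCTAGCA'
below = 'ATTCAA-CCAAGCA'
print(gaps_with_positions(above, below))  # [(6, 'G')]
```

The index is 0-based, so 6 refers to the seventh column of the alignment.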
You'd be better off using the Biopython library. It has many data types designed for manipulating DNA, RNA and protein sequences (alignments, trees, etc.). In this case, AlignIO from the Biopython package will definitely help you.
from Bio import AlignIO

# reading your sequences:
alignment = AlignIO.read("my_seq.fa", "fasta")

# every row of an alignment has the same length, so this is the number of columns
cols = len(alignment[0])

# rows and columns are accessed much like in a NumPy array
for col in range(cols):
    column = alignment[:, col]  # the column as a string, one character per row
    if column[1] == "-":
        # gap in the second row: show the nucleotide from the first row
        print("gap!", column[0])
As I don't have any information about the data format, I will describe the general process. Create two lists from the first and last lines respectively (which I assume are aligned and have the same length) and iterate over them. At each step, check whether the character at the current position in the last list is a '-' and, if so, print the character at the same position in the other list.
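The process described above could look roughly like the following sketch. It assumes the alignment is available as a block of text whose first line is the top sequence and whose last line is the gapped sequence, as in the question. The name nucleotides_at_gaps is made up for this example.

```python
def nucleotides_at_gaps(alignment_text):
    """Collect characters of the first line at positions where the last line has '-'."""
    # Drop empty lines; the first and last remaining lines are the two sequences.
    lines = [line.strip() for line in alignment_text.splitlines() if line.strip()]
    top, bottom = list(lines[0]), list(lines[-1])
    if len(top) != len(bottom):
        raise ValueError("the two sequences are not the same length")

    found = []
    for i in range(len(bottom)):
        if bottom[i] == "-":
            found.append(top[i])
    return found


text = """ATTCAGGCCTAGCA
:::::  :: ::::
ATTCAA-CCAAGCA"""

for nucleotide in nucleotides_at_gaps(text):
    print(nucleotide)  # prints G
```

The middle line of match markers is ignored, since only the first and last lines are read.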