
Python, Bioinformatics query

I am new to Python and would like to know whether what I am attempting is possible. I have a section of a DNA alignment, and for each position of a gap "-" in the bottom line I would like to identify the nucleotide on the top line. Here I would expect to get "G".

My efforts so far have not been successful. The alignment is:

ATTCAGGCCTAGCA
:::::  :: ::::
ATTCAA-CCAAGCA

I appreciate any assistance!

I'm not sure how your data is stored. Let's say it's two equal-length strings in a tuple:

dna_pair = ('ATTCAGGCCTAGCA','ATTCAA-CCAAGCA')

Then you could try:

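def find_align(dna_pair):
    # Collect the top-line nucleotide at every position where the bottom line has a gap
    top, bottom = dna_pair
    return [t for t, b in zip(top, bottom) if b == '-']

Alternatively, with a list comprehension over plain strings:

above = 'ATTCAGGCCTAGCA'
below = 'ATTCAA-CCAAGCA'
# Keep the letters of `above` that sit over a '-' in `below`
gap_letters = [above[i] for i, j in enumerate(below) if j == '-']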
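If you also need to know where each gap is, a small variation can return the column index alongside the nucleotide. This is a minimal sketch assuming the two rows are plain Python strings of equal length; the function name `gap_positions` is hypothetical, and it raises a `ValueError` when the rows differ in length, since they then cannot be the two rows of one pairwise alignment.

```python
def gap_positions(top, bottom, gap="-"):
    """Return (index, nucleotide) pairs for every column where `bottom` has a gap.

    Hypothetical helper: assumes `top` and `bottom` are the two rows of a
    pairwise alignment stored as equal-length strings.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    # enumerate(zip(...)) walks both rows column by column, keeping the index
    return [(i, t) for i, (t, b) in enumerate(zip(top, bottom)) if b == gap]


above = 'ATTCAGGCCTAGCA'
below = 'ATTCAA-CCAAGCA'
print(gap_positions(above, below))  # [(6, 'G')]
```

Returning the index as well makes it easy to look up other rows or annotations at the same alignment column later.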

You'd be better off using the Biopython library. It has many data types designed for manipulating DNA, RNA and protein sequences (alignments, trees, etc.). In this case, the AlignIO module from Biopython will help you.

from Bio import AlignIO

# Read the aligned sequences from a FASTA file
alignment = AlignIO.read("my_seq.fa", "fasta")

# Every row of an alignment has the same length, which is the number of columns
cols = alignment.get_alignment_length()

# Rows and columns are indexed like a NumPy array: alignment[row, col]
for col in range(cols):
    if alignment[1, col] == "-":
        # Gap in the second sequence: print the nucleotide from the first
        print(alignment[0, col])

As I don't have any information about the data format, I will describe the general process. Create two lists from the first and last lines respectively (which I assume are aligned and have the same length) and iterate over them. At each step, check whether the character at the current position in the last list is a '-' and, if so, print the character at the same position in the other list.
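The process described above can be sketched as follows. This assumes the alignment is held as the three-line text block from the question (top sequence, match line, bottom sequence), so the first and last lines are the two sequences; the names `letters_over_gaps` and `alignment_text` are hypothetical, chosen for illustration.

```python
def letters_over_gaps(alignment_text):
    """Print and return the top-line character above each '-' in the last line.

    Assumes the first line holds the top sequence and the last line holds the
    bottom sequence, with any match line (':' characters) in between.
    """
    lines = alignment_text.strip().splitlines()
    # Turn the first and last lines into lists of characters
    first = list(lines[0].strip())
    last = list(lines[-1].strip())
    found = []
    for top_char, bottom_char in zip(first, last):
        # Check the current position of the last line for a gap
        if bottom_char == '-':
            print(top_char)
            found.append(top_char)
    return found


alignment_text = """\
ATTCAGGCCTAGCA
:::::  :: ::::
ATTCAA-CCAAGCA
"""
letters_over_gaps(alignment_text)  # prints G
```

Note that `zip` silently stops at the shorter line, so this relies on the equal-length assumption stated above.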

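Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.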

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM