
Print 50 sequences from each line using Clustal

I have a multiple sequence alignment (Clustal) file, and I want to read it and print the sequences in a clearer, more orderly layout.

I am doing this from Biopython using an AlignIO object:

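from Bio import AlignIO

alignment = AlignIO.read("opuntia.aln", "clustal")

# Number of sequences (rows) in the alignment
print("Number of rows: %i" % len(alignment))

# Each record is printed on a single, very long line
for record in alignment:
    print("%s - %s" % (record.id, record.seq))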

My output looks messy and needs a lot of scrolling. What I want to do is print only 50 sequences in each line and continue until the end of the alignment file.

I would like output like that of the ClustalW2 web tool (http://www.ebi.ac.uk/Tools/clustalw2/).

Do you require something more complex than simply breaking record.seq into chunks of 50 characters, or am I missing something?

You can use Python sequence slicing to achieve that very easily. seq[N:N+50] accesses the 50 sequence elements starting at index N:

In [23]: import random

In [24]: seq = ''.join(str(random.randint(1, 4)) for i in range(200))

In [25]: seq
Out[25]: '13313211211434211213343311221443122234343421132111223234141322124442112343143112411321431412322123214232414331224144142222323421121312441313314342434231131212124312344112144434314122312143242221323123'

In [26]: for n in range(0, len(seq), 50):
   ....:     print(seq[n:n+50])
   ....:     
13313211211434211213343311221443122234343421132111
22323414132212444211234314311241132143141232212321
42324143312241441422223234211213124413133143424342
31131212124312344112144434314122312143242221323123
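Slicing past the end of a string is safe in Python, so when the length is not a multiple of 50 the last chunk is simply shorter. A minimal sketch that wraps the idea in a helper; the name `chunks` and the toy sequence are hypothetical, used only for illustration, and are not part of Biopython:

```python
def chunks(seq, size=50):
    """Yield consecutive slices of seq, each at most `size` elements long."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


# Made-up 120-character sequence, for illustration only.
seq = "ACGT" * 30
for piece in chunks(seq):
    # The final piece holds the 20 leftover characters.
    print(len(piece), piece)
```

Joining the pieces back together gives the original sequence, so nothing is lost or duplicated at the chunk boundaries.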


I don't have Biopython on this computer, so this isn't tested, but it should work:

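chunk_size = 50
length = alignment.get_alignment_length()

for i in range(0, length, chunk_size):
    print("")  # blank line between blocks
    for record in alignment:
        # Trailing number: end column of this chunk, capped at the alignment length
        end = min(i + chunk_size, length)
        print("%s\t%s %i" % (record.name, record.seq[i:i + chunk_size], end))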

This uses the same trick as Eli's answer: range sets up the index to slice from, and for each slice the inner loop iterates over the records in the alignment.
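The block layout can also be tried out without Biopython. The following is only a sketch under stated assumptions: `format_blocks` is a hypothetical helper, it takes plain (name, sequence) string pairs of equal length, and the toy alignment is made up for illustration. It pads the names to a common width so the sequence columns line up, which a tab alone does not guarantee.

```python
def format_blocks(records, chunk_size=50):
    """Return Clustal-like text: one block per slice, one row per sequence.

    `records` is a list of (name, sequence) pairs of equal-length strings.
    """
    if not records:
        return ""
    width = max(len(name) for name, _ in records)  # pad names to align columns
    length = len(records[0][1])
    blocks = []
    for start in range(0, length, chunk_size):
        end = min(start + chunk_size, length)  # last block may be shorter
        rows = [
            "%s  %s %d" % (name.ljust(width), seq[start:end], end)
            for name, seq in records
        ]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks)


# Made-up toy alignment, for illustration only.
toy = [("seq1", "ACGT-ACGTA"), ("seq_two", "ACGTTACG-A")]
print(format_blocks(toy, chunk_size=4))
```

With a Biopython alignment, the pairs could be built from the attributes used above, for example `[(record.id, str(record.seq)) for record in alignment]`.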

