Printing the sum of the counts of a string over multiple lines in Python
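This is a simple problem, but I can't solve it. I want to count the number of A's in each sequence record, where a record spans several lines. See the example below.
Here is my input: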
>sca200
ACACGTGYNNNN
ACGTCCCGWCNN
NNNNNNNNNA
>scaf500
AAAAAAAAAAAA
TTTTTTTTTTTT
WCWCWNNNN
>scaf201
AACACACACACC
GTGTGTGTGTGT
WWRRRYNNNNNN
NNNNNN
Code:
#!/usr/bin/python
from __future__ import division
import sys

fasta = open(sys.argv[1], "r")
for line in fasta:
    line = line.rstrip("\n")
    if line.startswith(">"):
        total_A = 0
        print line[1:]
    else:
        A = line.count('A')
        total_A = total_A + A
        print total_A
The output is:
sca200
2
3
4
scaf500
12
12
12
scaf201
6
6
6
6
How can I get it to report only the final number for each record? That is:
sca200
4
scaf500
12
scaf201
6
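Try splitting the input on newlines, and print the running total only when a line starts with > (which means the previous sequence has ended). Remember to print the first header and the last total outside the loop.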
#!/usr/bin/python
from __future__ import print_function
import sys

# Read the whole file and split it into lines, dropping empty ones
with open(sys.argv[1], "r") as fasta_file:
    lines = [l for l in fasta_file.read().split('\n') if l]

total = 0
# Print the first header outside the loop
print(lines[0][1:])
for f in lines[1:]:
    if f[0] != '>':
        total += f.count('A')
    else:
        # A new header means the previous record has ended
        print(total)
        print(f[1:])
        total = 0
# Print the total of the last record
print(total)
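One detail to keep in mind when splitting file contents on '\n': a file that ends with a newline produces a trailing empty string, and indexing f[0] on an empty string raises IndexError. A minimal sketch of the difference between split('\n') and splitlines(), using a made-up two-line sample:

```python
# A made-up sample that ends with a newline, as most text files do
text = ">sca200\nACAC\n"

parts = text.split("\n")    # keeps an empty string after the final newline
lines = text.splitlines()   # does not produce that trailing empty string

print(parts)
print(lines)
```

Either filtering out empty strings or using splitlines() avoids the problem.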
This should solve your problem:
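#!/usr/bin/python
from __future__ import print_function
import sys

total_A = None  # stays None until the first header has been seen
with open(sys.argv[1], "r") as fasta:
    for line in fasta:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # New header: report the previous record's total, if there is one
            if total_A is not None:
                print(total_A)
            total_A = 0
            print(line[1:])
        elif total_A is not None:
            total_A += line.count('A')
# The last record has no header after it, so report it here
if total_A is not None:
    print(total_A)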
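You only want to print the total number of A's when a new FASTA header starts, and once more after the loop for the last record.
Note: edited to address the comment raised by @Lafexlos.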
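The same "report the total when the next header arrives" idea can also be wrapped in a generator, which keeps the counting separate from the printing. This is only a sketch: count_per_record and its base parameter are hypothetical names, and the sample is the input from the question.

```python
def count_per_record(lines, base="A"):
    """Yield (name, count) for each FASTA record in an iterable of lines.

    count_per_record is a hypothetical helper, not part of the answers above.
    """
    name = None
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                yield name, total
            name = line[1:]
            total = 0
        elif name is not None:
            total += line.count(base)
    # The last record has no following header, so emit it after the loop
    if name is not None:
        yield name, total


sample = """>sca200
ACACGTGYNNNN
ACGTCCCGWCNN
NNNNNNNNNA
>scaf500
AAAAAAAAAAAA
TTTTTTTTTTTT
WCWCWNNNN
>scaf201
AACACACACACC
GTGTGTGTGTGT
WWRRRYNNNNNN
NNNNNN
""".splitlines()

for name, total in count_per_record(sample):
    print(name)
    print(total)
```

With a real file, the open file object can be passed in place of sample, since the function only needs an iterable of lines.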
Here is a one-liner solution:
from __future__ import print_function
import sys
import re

with open(sys.argv[1], 'r') as f:
    data = f.read()

"""
1. find all blocks of text and split each into two groups: (block_name, corresponding_TEXT)
2. loop through the blocks
3. print 'block_name' and the length of the list containing all 'A's from the corresponding_TEXT
"""
[ print('{0}\n{1}'.format(name, len(re.findall(r'A', txt, re.M))))
    for name, txt in re.findall(r'>(sca[^\n]*)([^>]*)', data, re.M)
]
Output:
sca200
4
scaf500
12
scaf201
6
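Note that the regular expression above only matches headers that begin with sca. If the header names are not known in advance, a similar result can be had without a regex by splitting the text on > and treating the first line of each block as the name. A minimal sketch, where count_a_by_split is a hypothetical helper name and the data is the input from the question:

```python
def count_a_by_split(data):
    """Return a list of (name, number of 'A's) pairs, one per FASTA record.

    count_a_by_split is a hypothetical helper, not part of the answer above.
    """
    results = []
    # Anything before the first '>' belongs to no record, so skip it
    for block in data.split(">")[1:]:
        # The first line of each block is the header; the rest is sequence
        name, _, body = block.partition("\n")
        results.append((name, body.count("A")))
    return results


data = (
    ">sca200\nACACGTGYNNNN\nACGTCCCGWCNN\nNNNNNNNNNA\n"
    ">scaf500\nAAAAAAAAAAAA\nTTTTTTTTTTTT\nWCWCWNNNN\n"
    ">scaf201\nAACACACACACC\nGTGTGTGTGTGT\nWWRRRYNNNNNN\nNNNNNN\n"
)

for name, count in count_a_by_split(data):
    print(name)
    print(count)
```

Because only the sequence part of each block is counted, an A in a header name does not affect the result.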