Printing the sum of a string count over multiple lines in Python

Question

This is a simple question, but I just can't solve it. I want to count the number of A's in each sequence, where a sequence spans several lines. Please see the example below:

This is my input:

>sca200
ACACGTGYNNNN
ACGTCCCGWCNN
NNNNNNNNNA
>scaf500
AAAAAAAAAAAA
TTTTTTTTTTTT
WCWCWNNNN
>scaf201
AACACACACACC
GTGTGTGTGTGT
WWRRRYNNNNNN
NNNNNN

Code:

#!/usr/bin/python
from __future__ import division
import sys

fasta = open(sys.argv[1], "r")

for line in fasta:
    line = line.rstrip("\n")
    if line.startswith(">"):
        total_A = 0
        print line[1:]
    else:
        A = line.count('A')
        total_A = total_A + A
        print total_A

The output is:

sca200
2
3
4
scaf500
12
12
12
scaf201
6
6
6
6

How can I get it to report only the final number for each sequence? That is:

sca200
4
scaf500
12
scaf201
6

Answer 1

Try splitting the input on newlines and printing the total only when a line starts with > (which means the previous sequence is over). Remember to print the first header before the loop and the last total after it.

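#!/usr/bin/python
from __future__ import print_function
import sys

# Read the whole file into a string
with open(sys.argv[1], "r") as fasta_file:
    fasta = fasta_file.read()

# Drop blank lines so an empty trailing line cannot raise IndexError on f[0]
lines = [l for l in fasta.split('\n') if l]

total = 0

# Print the first header before the loop
print(lines[0][1:])

for f in lines[1:]:
    if f[0] != '>':
        total += f.count('A')
    else:
        # A new header ends the previous record: print its total, then the new name
        print(total)
        print(f[1:])
        total = 0

# Print the total of the last record after the loop
print(total)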

Answer 2

This should solve your problem:

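#!/usr/bin/python
from __future__ import print_function
import sys

total_A = None  # None until the first header has been seen
with open(sys.argv[1], "r") as fasta:
    for line in fasta:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header ends the previous record, so print its total
            if total_A is not None:
                print(total_A)
            total_A = 0
            print(line[1:])
        elif total_A is not None:
            total_A += line.count('A')

# Print the total of the last record
if total_A is not None:
    print(total_A)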

The total count of A should be printed only when a new FASTA header starts, and once more after the loop for the last record.

Note: Edited to address a comment raised by @Lafexlos.
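
The same idea, emitting a record's total only once the next header or the end of the input is reached, can also be packaged as a generator so that counting is separate from printing. This is only a sketch: the function name `count_per_record`, its `base` parameter, and the inline `sample` data (copied from the question's input) are illustrative and not part of the original answer.

```python
def count_per_record(lines, base="A"):
    """Yield (record_name, count_of_base) for each FASTA record in `lines`."""
    name = None   # hypothetical helper state: name of the record being read
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                yield name, total
            name = line[1:]
            total = 0
        elif name is not None:
            total += line.count(base)
    # The last record has no following header, so emit it after the loop
    if name is not None:
        yield name, total


sample = """>sca200
ACACGTGYNNNN
ACGTCCCGWCNN
NNNNNNNNNA
>scaf500
AAAAAAAAAAAA
TTTTTTTTTTTT
WCWCWNNNN
>scaf201
AACACACACACC
GTGTGTGTGTGT
WWRRRYNNNNNN
NNNNNN
""".splitlines()

for name, total in count_per_record(sample):
    print(name)
    print(total)
```

To use it on a file, an open file object could be passed instead of `sample`, since the function only needs an iterable of lines.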

Answer 3

Here is a one-liner solution:

from __future__ import print_function

import sys
import re

with open(sys.argv[1], 'r') as f:
    data = f.read()

"""
1.  find all blocks of text and split it into two groups: (block_name, corresponding_TEXT)
2.  loop through blocks
3.  print 'block_name' and the length of list containing all 'A's from the corresponding_TEXT
"""

[   print('{0}\n{1}'.format(name, len(re.findall(r'A', txt, re.M)))) 
    for name, txt in re.findall(r'>(sca[^\n]*)([^>]*)', data, re.M)
]

Output:

sca200
4
scaf500
12
scaf201
6
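
Note that the pattern `>(sca[^\n]*)` in this answer only matches headers that begin with `sca`. A possible regex-free variant, sketched here under the assumption that `>` appears only at the start of header lines, splits the text on `>` and counts within each block. The function name `count_a_by_split` and the `sample_text` string are made up for illustration.

```python
def count_a_by_split(data):
    """Return a list of (header, number_of_A) pairs for FASTA-like text."""
    results = []
    # Everything before the first '>' is ignored; each later block is one record
    for block in data.split(">")[1:]:
        # The first line of a block is the header, the rest is sequence text
        name, _, body = block.partition("\n")
        results.append((name.strip(), body.count("A")))
    return results


sample_text = (
    ">sca200\nACACGTGYNNNN\nACGTCCCGWCNN\nNNNNNNNNNA\n"
    ">scaf500\nAAAAAAAAAAAA\nTTTTTTTTTTTT\nWCWCWNNNN\n"
    ">scaf201\nAACACACACACC\nGTGTGTGTGTGT\nWWRRRYNNNNNN\nNNNNNN\n"
)

for name, total in count_a_by_split(sample_text):
    print(name)
    print(total)
```

Because `str.count` runs over the whole block, line breaks inside a sequence need no special handling.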


Answer 1
0 2016-02-25 15:13:59

Answer 2
0 2016-02-25 15:17:29

Answer 3
0 2016-02-25 15:55:47
