Counting the number of gaps in Python

How can I calculate the number of gaps in sequences?

For example:

s1='G _ A A T T C A G T T A'
s2='G G _ A _ T C _ G _ _ A'
s3='G A A T T C A G T _ T _'

Here the number of '_' is 8.

I tried the following:

def count():
    gap=0
    for i in range(0, len(s1), 3):
        for x,y,z in zip(s1,s2,s3):
            if (x=='_') or (y=='_')or (z=='_') :
                gap=gap+1
        return gap

It gives 6, not 8.

Strings have a count() method:

s1.count('_') + s2.count('_') + s3.count('_')
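As a quick check, here is a minimal runnable sketch of the str.count approach applied to the three sequences from the question. The variable name total_gaps is only illustrative.

```python
# The three aligned sequences from the question.
s1 = 'G _ A A T T C A G T T A'
s2 = 'G G _ A _ T C _ G _ _ A'
s3 = 'G A A T T C A G T _ T _'

# str.count returns the number of non-overlapping occurrences of the substring.
total_gaps = s1.count('_') + s2.count('_') + s3.count('_')
print(total_gaps)  # 8
```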

Your code returns 7, which is the total count of all the underscores minus the extra underscore in the third-to-last position. You can fix that by removing the or-test (which short-circuits the tests when a match is found, so a column is counted at most once).

Also note there is no need to triple-zip the code or to loop with a stride of three.

Here is a cleaned-up version of your original code:

def count():
    gap=0
    for x,y,z in zip(s1,s2,s3):
        if (x == '_'):               # these if-stmts don't short-circuit
            gap += 1
        if (y == '_'):
            gap += 1
        if (z == '_'):
            gap += 1
    return gap

There are other ways to do this faster (e.g. the str.count method), but I wanted to show you how to repair and clean up your original logic. That ought to put you on the right track when you do other analytics.
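If you want to keep the column-by-column view but avoid the repeated if-statements, one possible sketch is to count the underscores in each zipped column and sum them. The function name count_gaps and its variadic signature are assumptions made for illustration, not part of the original code.

```python
def count_gaps(*sequences):
    """Count every '_' across aligned sequences, one column at a time."""
    # zip yields one tuple per column; tuple.count tallies the '_' entries in it,
    # so a column holding two gaps contributes 2 rather than 1.
    return sum(column.count('_') for column in zip(*sequences))


s1 = 'G _ A A T T C A G T T A'
s2 = 'G G _ A _ T C _ G _ _ A'
s3 = 'G A A T T C A G T _ T _'

print(count_gaps(s1, s2, s3))  # 8
```

Because zip stops at the shortest input, this sketch assumes the sequences are aligned to the same length, as they are in the question.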

The two _'s in the 10th position only get counted once. You should get 7, rather than 6.

The simple solution is sum([item.count('_') for item in [s1, s2, s3]])
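To see where the 7 and the 8 come from, the following sketch splits each sequence into its symbols and compares two counts: the columns that contain at least one gap (what the or-test measures) and the total number of gaps. The names columns, gap_columns, total_gaps, and shared are illustrative only.

```python
s1 = 'G _ A A T T C A G T T A'
s2 = 'G G _ A _ T C _ G _ _ A'
s3 = 'G A A T T C A G T _ T _'

# Split on whitespace so each column holds one symbol from each sequence.
columns = list(zip(s1.split(), s2.split(), s3.split()))

# Columns with at least one gap: this is what the or-test counts.
gap_columns = sum(1 for col in columns if '_' in col)

# Every gap, including several gaps in the same column.
total_gaps = sum(col.count('_') for col in columns)

# 1-based positions where more than one sequence has a gap.
shared = [pos for pos, col in enumerate(columns, start=1) if col.count('_') > 1]

print(gap_columns)  # 7
print(total_gaps)   # 8
print(shared)       # [10]
```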
