Finding patterns in strings in Python

My function is supposed to receive a large string, go through it, and find the maximum number of times the pattern "AGATC" repeats consecutively. Regardless of what I feed this function, it always returns 1.

def agatc(s):
    maxrep = 0
    temp = 0
    for i in range(len(s) - 4):
        if s[i] == "A" and s[i + 1] == "G" and s[i + 2] == "A" and s[i + 3] == "T" and s[i + 4] == "C":
            temp += 1
            print(i)
            i += 3
        else:
            if temp > maxrep:
                maxrep = temp
            temp = 0
    return maxrep

I also tried initializing the for loop with range(0, len(s) - 4, 1) and got the same result.

I thought the problem might be adding 3 to the i variable (apparently it wasn't), so I added print(i) to see what was happening. I got the following:

45
1938
2049
2195
2952
2957
2962
2967
2972
2977
2982
2987
2992
2997
3002
3007
3012
3017
3022
3689
4754
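The printed indices hint at what goes wrong: 2952, 2957, ..., 3022 are exactly 5 apart, so that stretch is one long consecutive run, yet the function reports 1. A plausible explanation (my reading, not stated in the answers below): inside a for loop, i += 3 only rebinds the local name, and on the next iteration range hands out the following index anyway. One position after a match the window starts with "G", so the else branch resets temp to 0 and the run can never exceed 1. The sketch below demonstrates the rebinding and shows a while-loop variant (agatc_fixed is a hypothetical name) that jumps a full pattern length after each hit and also keeps a run that ends at the very end of the string.

```python
def show_rebinding():
    """Reassigning the loop variable does not skip iterations of a for loop."""
    seen = []
    for i in range(5):
        seen.append(i)
        i += 3  # only rebinds the local name; range still yields 0, 1, 2, 3, 4
    return seen


def agatc_fixed(s, pattern="AGATC"):
    """Return the largest number of back-to-back repeats of pattern in s."""
    step = len(pattern)
    maxrep = 0
    temp = 0
    i = 0
    while i <= len(s) - step:
        if s[i:i + step] == pattern:
            temp += 1
            i += step  # jump over the whole match; a while loop honours this
        else:
            maxrep = max(maxrep, temp)
            temp = 0
            i += 1
    return max(maxrep, temp)  # a run may end at the very end of the string


print(show_rebinding())                          # [0, 1, 2, 3, 4]
print(agatc_fixed("xxAGATCAGATCAGATCxxAGATCx"))  # 3
```

Jumping by len(pattern) rather than 3 matters too: the next consecutive copy starts exactly 5 characters after the previous one.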

This way you can find the number of overlapping matches (note that it counts every occurrence in the string, not only consecutive ones):

def agatc(s):
    temp = 0
    for i in range(len(s) - len("AGATC") + 1):
        if s[i:i+len("AGATC")] == "AGATC":
            temp += 1
    return temp
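The same overlapping count can be written more compactly with str.startswith, which accepts an optional start position. This is only a sketch, and count_overlapping is a hypothetical name; like the function above, it counts every occurrence, not only consecutive ones.

```python
def count_overlapping(s, pattern="AGATC"):
    """Count every occurrence of pattern in s, allowing matches to overlap."""
    # s.startswith(pattern, i) tests whether a match begins at index i
    return sum(1 for i in range(len(s) - len(pattern) + 1) if s.startswith(pattern, i))


print(count_overlapping("AGATCxxAGATCAGATC"))  # 3
print(count_overlapping("AAAA", "AA"))         # 3 (matches at 0, 1 and 2)
```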

If you want to count non-overlapping matches instead:

def agatc(s):
    temp = 0
    i = 0
    while i < len(s) - len("AGATC") + 1:
        if s[i:i+len("AGATC")] == "AGATC":
            temp += 1
            i += len("AGATC")
        else:
            i += 1
    return temp
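For non-overlapping counts, the built-in str.count already does this. The two counts only differ for patterns that can overlap themselves; "AGATC" cannot (no proper prefix of it is also a suffix), so for this pattern both functions above return the same number. A short comparison, using "ABAB" purely as an illustration:

```python
text = "ABABAB"

# str.count scans left to right and skips past each match: non-overlapping
non_overlapping = text.count("ABAB")

# checking every start index counts overlapping matches
overlapping = sum(1 for i in range(len(text)) if text.startswith("ABAB", i))

print(non_overlapping, overlapping)  # 1 2

# AGATC cannot overlap itself, so both approaches agree for it
dna = "AGATCAGATCxAGATC"
print(dna.count("AGATC"), sum(1 for i in range(len(dna)) if dna.startswith("AGATC", i)))  # 3 3
```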

A simple solution with the re module, which finds the (start, end) span of the longest run of consecutive "AGATC"s:

import re

s = 'FGHAGATCATCFJSFAGATCAGATCFHGH'
match = re.finditer('(?P<name>AGATC)+', s)
max_len = 0
result = tuple()
for m in match:
    l = m.end() - m.start()
    if l > max_len:
        max_len = l
        result = (m.start(), m.end())

print(result)
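The tuple printed above holds the start and end index of the longest run (for this sample string, (15, 25)), not the number of repeats; dividing the run length by len("AGATC") gives the count. A compact variant, sketched with a hypothetical longest_run helper, uses a non-capturing group, re.escape so the unit is matched literally, and max(..., default=0) so a string without any match returns 0:

```python
import re


def longest_run(s, unit="AGATC"):
    """Return the largest number of consecutive repeats of unit in s."""
    # (?:...)+ matches one or more back-to-back copies of the unit
    pattern = "(?:" + re.escape(unit) + ")+"
    return max((len(m.group()) // len(unit) for m in re.finditer(pattern, s)), default=0)


print(longest_run("FGHAGATCATCFJSFAGATCAGATCFHGH"))  # 2
print(longest_run("no match here"))                  # 0
```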

Personally, I would use regular expressions, but if you don't want to, you could use the str.find() method. Here is my solution:

def agatc(s):
    cnt = 0
    findstr = 'AGATC'                         # pattern you are looking for (case-sensitive)
    while True:
        index = s.find(findstr)
        if index == -1:                       # no more matches: stop searching
            break
        cnt += 1
        s = s[index+1:]                       # overlapping matches
        # s = s[index+len(findstr):]          # non-overlapping matches only
        print(index, s)                       # just to see what happens
    return cnt
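str.find also takes an optional start index, so instead of slicing a new copy of the string after every hit you can keep searching the original one. A sketch of that approach (count_with_find and its overlapping flag are hypothetical names chosen for illustration):

```python
def count_with_find(s, pattern="AGATC", overlapping=True):
    """Count occurrences of pattern in s using str.find with a start index."""
    if not pattern:
        raise ValueError("pattern must not be empty")  # would otherwise loop forever
    cnt = 0
    # overlapping: resume one character after the match start; otherwise skip the whole match
    step = 1 if overlapping else len(pattern)
    index = s.find(pattern)
    while index != -1:
        cnt += 1
        index = s.find(pattern, index + step)
    return cnt


print(count_with_find("AGATCAGATCxAGATC"))                   # 3
print(count_with_find("ABABAB", "ABAB"))                     # 2
print(count_with_find("ABABAB", "ABAB", overlapping=False))  # 1
```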

This function finds the greatest number of consecutive 'AGATC's in a string and returns that number:

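import re

def agatc(s):
    w = "AGATC"
    maxrep = [m.start() for m in re.finditer(w, s)]  # the starting index of each AGATC
    if not maxrep:                                   # no AGATC at all
        return 0
    c = ''
    for i, v in enumerate(maxrep):
        if i < len(maxrep) - 1:
            if v + len(w) == maxrep[i + 1]:          # next match starts right after this one
                c += 'y'
            else:
                c += 'n'

    return len(max(c.split('n'), key=len)) + 1       # longest run of 'y' plus the first match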

print(agatc("oooooooooAGATCooooAGATCAGATCAGATCAGATCooooooAGATCAGATC"))

Output:

4
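The 'y'/'n' string works, but the same idea (compare each match's start index with the previous one) can track the current run length directly, without building and splitting a helper string. A sketch, with longest_consecutive as a hypothetical name:

```python
import re


def longest_consecutive(s, w="AGATC"):
    """Longest number of back-to-back copies of w, based on the start index of each match."""
    best = 0
    run = 0
    prev = None
    for m in re.finditer(re.escape(w), s):
        # extend the run if this match starts exactly where the previous one ended
        if prev is not None and m.start() == prev + len(w):
            run += 1
        else:
            run = 1
        best = max(best, run)
        prev = m.start()
    return best


print(longest_consecutive("oooooooooAGATCooooAGATCAGATCAGATCAGATCooooooAGATCAGATC"))  # 4
```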
