

Find max consecutive occurrences in a string in Python

I need to find the maximum number of consecutive occurrences of each word in a string. Occurrences count as more than one only if they are consecutive; if the same word appears again in the sequence but is not adjacent to the others, that match doesn't count. Here is an example:

sequence = "ababcbabc"
words = ["ab", "babc", "bca"]

Output:

[2, 2, 0]

This is because 'ab' actually appears 3 times in the sequence string, but the third occurrence doesn't count because it is not consecutive with the other two. The same rule applies to 'babc', and the result is 0 when the word doesn't occur at all, as with 'bca'. I have tried sequence.find, but it only gives me the start of the first occurrence, which isn't convenient for checking whether the occurrences are next to each other; the same goes for sequence.rfind. sequence.count gives me all the occurrences but without any condition: with .count I get output = [3, 2, 0]. I also tried re.findall and re.finditer.
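For what it's worth, str.find does accept a start offset (sequence.find(word, start)), so calling it in a loop collects every occurrence, not just the first. Two occurrences are then consecutive exactly when the second starts len(word) characters after the first. A minimal sketch under that reading of "consecutive" (the longest single run of back-to-back copies); occurrence_starts and max_consecutive are hypothetical helper names, not part of the question:

```python
def occurrence_starts(sequence, word):
    """Every index where `word` begins in `sequence`, overlapping starts included."""
    starts = []
    pos = sequence.find(word)
    while pos != -1:
        starts.append(pos)
        # str.find takes a start offset, so resume the search one character later.
        pos = sequence.find(word, pos + 1)
    return starts


def max_consecutive(sequence, word):
    """Longest run of back-to-back copies of `word`, built from the start indices."""
    if not word:
        return 0  # an empty word "matches" everywhere; treat it as no occurrences
    starts = set(occurrence_starts(sequence, word))
    best = 0
    for head in starts:
        # Skip indices that continue an earlier copy; only count from the head of a chain.
        if head - len(word) in starts:
            continue
        run, pos = 0, head
        # The next copy is consecutive only if it starts exactly len(word) later.
        while pos in starts:
            run += 1
            pos += len(word)
        best = max(best, run)
    return best


sequence = "ababcbabc"
print(occurrence_starts(sequence, "ab"))                              # [0, 2, 6]
print([max_consecutive(sequence, w) for w in ["ab", "babc", "bca"]])  # [2, 2, 0]
```

Because every start index is kept (including overlapping ones), a run that begins in the middle of an earlier match is still found.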

If the sequence is 'abrtfhg', there is only one match ('ab'), which counts as 1, so the output in this case should be [1, 0, 0].

def maxKOccurrences(sequence, words):
    result = []
    for i in words:
        if i in sequence:
            index_word = sequence.count(i)
            result.append(index_word)
        else:
            result.append(0)
    print(result)

x = "ababcbabc"
y = ["ab", "babc", "bca"]
maxKOccurrences(x, y)
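A shorter route, assuming "max consecutive occurrences" means the longest single run: keep raising k while word * k is still a substring of sequence. The sketch below keeps the question's maxKOccurrences name but returns the list instead of printing it. Under this reading it reports 3 for 'a' in 'aaabaa', whereas the answer below sums the runs and reports 5.

```python
def maxKOccurrences(sequence, words):
    """For each word, the largest k such that word * k is a substring of sequence."""
    result = []
    for word in words:
        k = 0
        # Grow the repeated block one copy at a time until it no longer appears.
        # The `word and` guard stops an empty word from looping forever (it reports 0).
        while word and word * (k + 1) in sequence:
            k += 1
        result.append(k)
    return result


print(maxKOccurrences("ababcbabc", ["ab", "babc", "bca"]))  # [2, 2, 0]
print(maxKOccurrences("abrtfhg", ["ab", "babc", "bca"]))    # [1, 0, 0]
```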

You can try:

import re

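def max_occur(s, words):
    # For each word, find every maximal run of back-to-back copies (re.escape makes the
    # word match literally) and turn each run's length into a count of copies.
    repeats = [list(map(lambda x: len(x) // len(word), re.findall(rf'(?:{re.escape(word)})+', s))) for word in words]
    # 1 if every run is a single copy; otherwise the sum of the runs longer than one
    # copy (which is 0 when the word never occurs).
    return [1 if max(rep, default=0) == 1 else sum(r for r in rep if r > 1) for rep in repeats]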

print(max_occur('ababcbabc', ["ab", "babc", "bca"])) # [2, 2, 0]
print(max_occur('aaabaa', ['a', 'b', 'c'])) # [5, 1, 0]
print(max_occur('aaabaabb', ['a', 'b', 'c'])) # [5, 2, 0]

The regex here detects runs of repeated word, then divides the length of each run by the length of word (len(x) // len(word)), which yields the number of repeats in that run. The rest of the code processes this result with somewhat complicated logic: if the longest run is 1 (i.e., there are only singletons), it just returns 1; otherwise it sums the runs longer than one copy. Because runs are summed rather than compared, 'a' in 'aaabaa' gives 5 (3 + 2), as the second print shows.
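To see what the second line of max_occur actually receives, the regex step can be pulled out on its own. This is a sketch: run_lengths is a hypothetical helper, and wrapping word in re.escape is an addition so that a word like 'a.b' is matched literally instead of letting '.' match any character. The third call shows a limitation of the regex approach: re.findall takes the leftmost match and never revisits characters it has already consumed, so for a word that overlaps itself, such as 'aba' in 'ababaaba', it reports two runs of 1 even though two back-to-back copies ('abaaba') start at index 2.

```python
import re


def run_lengths(s, word):
    """Copies of `word` in each run that re.findall returns, left to right."""
    if not word:
        return []  # avoid dividing by len('') == 0
    # re.escape stops characters such as '.' or '+' inside `word` from acting as regex syntax.
    pattern = f"(?:{re.escape(word)})+"
    return [len(run) // len(word) for run in re.findall(pattern, s)]


print(run_lengths("ababcbabc", "ab"))  # [2, 1]  ("abab", then "ab")
print(run_lengths("aaabaabb", "b"))    # [1, 2]
print(run_lengths("ababaaba", "aba"))  # [1, 1], although "abaaba" starts at index 2
print(run_lengths("axb", "a.b"))       # [], the '.' is literal after re.escape
```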

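Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.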

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM