Python: using re to find the longest sequence

Question

I have a string that is randomly generated:

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

I'd like to find the longest sequence of "diNCO diol" and the longest sequence of "diNCO diamine". So in the case above, the longest "diNCO diol" sequence is 1 and the longest "diNCO diamine" sequence is 3.

How would I go about doing this using Python's re module?

Thanks in advance.

EDIT:
I mean the largest number of consecutive repeats of a given string. So the longest run of "diNCO diamine" is 3 (the three repeats right after the first "diol"):
diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine

Answer 1

Expanding on Ealdwulf's answer:

Documentation on re.findall can be found in the re module section of the Python standard library documentation.

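import re

def getLongestSequenceSize(search_str, polymer_str):
    # One or more back-to-back occurrences of search_str, each followed by optional whitespace
    matches = re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str)
    # Compare runs by length (plain max() compares strings alphabetically); '' if nothing matched
    longest_match = max(matches, key=len, default='')
    return longest_match.count(search_str)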

This could be written as one line, but it becomes less readable in that form.
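For comparison, a one-line form might look like the following. This is a sketch assuming Python 3.4+ (for the `default` argument of `max`); the name `longest_run` is hypothetical and not part of the answer.

```python
import re

def longest_run(search_str, polymer_str):
    # Hypothetical one-line equivalent of the findall version: longest run by length, then count repeats
    return max(re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str),
               key=len, default='').count(search_str)

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
print(longest_run("diNCO diol", polymer_str))     # 1
print(longest_run("diNCO diamine", polymer_str))  # 3
```

Splitting the call over two lines keeps it technically a single expression while staying somewhat readable.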

Alternative:

If polymer_str is huge, it will be more memory-efficient to use re.finditer. Here's how you might go about it:

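import re

def getLongestSequenceSize(search_str, polymer_str):
    longest_match = ''
    # finditer yields one Match at a time instead of building the full list of matches
    for match in re.finditer(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str):
        # Keep the longest run seen so far
        if len(match.group(0)) > len(longest_match):
            longest_match = match.group(0)
    return longest_match.count(search_str)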

The biggest difference between findall and finditer is that findall returns a list of all matched strings at once, while finditer returns an iterator that yields Match objects one at a time. Also, the finditer approach can be somewhat slower.
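The difference can be seen side by side in a small sketch (Python 3.4+ for `max(..., default=...)`); the variable names `pattern`, `runs` and `longest` are illustrative only. The generator passed to `max` means only one Match object needs to be alive at a time.

```python
import re

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
pattern = r'(?:\bdiNCO diamine\b\s?)+'

# findall builds the complete list of matched strings up front
runs = re.findall(pattern, polymer_str)
print(runs)  # ['diNCO diamine diNCO diamine diNCO diamine ', 'diNCO diamine']

# finditer yields Match objects lazily; max() consumes the generator one item at a time
longest = max((m.group(0) for m in re.finditer(pattern, polymer_str)), key=len, default='')
print(longest.count("diNCO diamine"))  # 3
```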

Answer 2

I think the OP wants the longest contiguous sequence. You can get all contiguous sequences like this (the optional space lets space-separated repeats join into a single match):

seqs = re.findall(r"(?:diNCO diamine ?)+", polymer_str)

and then find the longest.
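One way to finish the "find the longest" step, as a sketch: compare the runs by length and count the repeats inside the winner. The optional space in the pattern is an adjustment so that space-separated repeats merge into one match; without it, each "diNCO diamine" would be returned separately.

```python
import re

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

# The optional space lets consecutive, space-separated repeats merge into one match
seqs = re.findall(r"(?:diNCO diamine ?)+", polymer_str)
# Pick the longest run by length, then count how many repeats it contains
longest = max(seqs, key=len)
print(longest.count("diNCO diamine"))  # 3
```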

Answer 3

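import re
# Collapse each "diNCO diamine" into one "|" and drop the spaces: "diol|||diNCOdiol|"
p = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine".replace("diNCO diamine", "|").replace(" ", "")
# Split on runs of anything other than "|"; the longest leftover run of "|" is the answer
pat = re.compile("[^|]+")
print(max(map(len, pat.split(p))))  # prints 3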
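The same split trick can be wrapped into a reusable function. This is a sketch: the name `longest_run_by_split` and the `marker` parameter are hypothetical, and it assumes `marker` is a single character that never occurs in the input.

```python
import re

def longest_run_by_split(search_str, polymer_str, marker="|"):
    # Hypothetical helper; assumes `marker` is one character that never appears in polymer_str
    collapsed = polymer_str.replace(search_str, marker).replace(" ", "")
    # Split on runs of any other character, leaving only runs of markers (plus empty strings)
    runs = re.split("[^%s]+" % re.escape(marker), collapsed)
    # The length of each marker run is the number of consecutive repeats
    return max(map(len, runs))

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
print(longest_run_by_split("diNCO diamine", polymer_str))  # 3
print(longest_run_by_split("diNCO diol", polymer_str))     # 1
```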

Answer 4

One way is to use findall:

import re
polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"
len(re.findall("diNCO diamine", polymer_str))  # returns 4: every occurrence, not the longest consecutive run
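The total of 4 and the longest run of 3 are related: grouping the occurrences into consecutive runs first gives both numbers. A minimal sketch, with illustrative variable names:

```python
import re

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

# Group consecutive repeats into runs, then count the repeats inside each run
runs = re.findall(r"(?:\bdiNCO diamine\b\s?)+", polymer_str)
run_lengths = [run.count("diNCO diamine") for run in runs]
print(run_lengths)       # [3, 1]
print(sum(run_lengths))  # 4, the same total that findall alone reports
print(max(run_lengths))  # 3, the longest consecutive run
```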

Answer 5

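Using re (note that re.search returns only the first run of repeats):

import re
# Here the first run also happens to be the longest one
m = re.search(r"(\bdiNCO diamine\b\s?)+", polymer_str)
m.group(0).count("diNCO diamine")  # 3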
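Because re.search stops at the first match, it can miss a longer run later in the string. The sketch below uses a made-up input (an assumption, not from the question) where the first run is shorter than a later one, and compares it with taking the longest of all runs via findall.

```python
import re

# Hypothetical input where the first run is NOT the longest one
polymer_str = "diNCO diamine diol diNCO diamine diNCO diamine"
pattern = r"(?:\bdiNCO diamine\b\s?)+"

first = re.search(pattern, polymer_str).group(0)
print(first.count("diNCO diamine"))    # 1: re.search stops at the first run

longest = max(re.findall(pattern, polymer_str), key=len)
print(longest.count("diNCO diamine"))  # 2: the longest run overall
```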

5 answers

Answer 1: score 9, accepted, 2009-07-20 20:31:51

Answer 2: score 3, 2009-07-20 19:37:33

Answer 3: score 3, 2009-07-21 00:25:54

Answer 4: score 0, 2009-07-20 19:25:40

Answer 5: score 0
