Find all possible substrings beginning with characters from a capturing group

Question

For example, I have the string BANANA and want to find all possible substrings beginning with a vowel. The result I need looks like this:

"A", "A", "A", "AN", "AN", "ANA", "ANA", "ANAN", "ANANA"

I tried this: re.findall(r"([AIEOU]+\w*)", "BANANA"), but it only finds "ANANA", which seems to be the longest match. How can I find all the other possible substrings?

Answer 1

s="BANANA"
vowels = 'AIEOU'
sorted(s[i:j] for i, x in enumerate(s) for j in range(i + 1, len(s) + 1) if x in vowels)

Answer 2

This is a simple way of doing it. I'm sure there's an easier way, though.

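def subs(txt, startswith):
    # Yield every substring of txt whose first character appears in
    # startswith (compared case-insensitively).
    for i in range(len(txt)):
        for j in range(1, len(txt) - i + 1):
            if txt[i].lower() in startswith.lower():
                yield txt[i:i + j]

s = 'BANANA'
vowels = 'AEIOU'
print(sorted(subs(s, vowels)))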

Answer 3

A more Pythonic way:

>>> def grouper(s):
...     return [s[i:i+j] for j in range(1,len(s)+1) for i in range(len(s)-j+1)]
...
>>> s = 'BANANA'
>>> vowels = {'A', 'I', 'O', 'U', 'E', 'a', 'i', 'o', 'u', 'e'}
>>> [t for t in grouper(s) if t[0] in vowels]
['A', 'A', 'A', 'AN', 'AN', 'ANA', 'ANA', 'ANAN', 'ANANA']

Benchmark against the accepted answer:

from timeit import timeit

s1 = """
sorted(s[i:j] for i, x in enumerate(s) for j in range(i + 1, len(s) + 1) if x in vowels)
"""

s2 = """
def grouper(s):
     return [s[i:i+j] for j in range(1,len(s)+1) for i in range(len(s)-j+1)]
[t for t in grouper(s) if t[0] in vowels]
   """

print('1st: ', timeit(stmt=s1,
                      number=1000000,
                      setup="vowels = 'AIEOU'; s = 'BANANA'"))
print('2nd : ', timeit(stmt=s2,
                       number=1000000,
                       setup="vowels = {'A', 'I', 'O', 'U', 'E', 'a', 'i', 'o', 'u', 'e'}; s = 'BANANA'"))

Result:

1st:  6.08756995201
2nd :  5.25555992126
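
Note that the second timed statement above also re-defines grouper on every iteration, and the two snippets use different vowel containers. A rough sketch of a more like-for-like comparison follows, with both approaches wrapped in functions and sharing one vowel set. The names by_start, by_length, VOWELS and compare are made up for this illustration, by_start checks the vowel before the inner loop (a small variation on the accepted answer), and the timings will differ from machine to machine.

```python
from timeit import timeit

# Shared vowel set so both functions do the same membership test.
VOWELS = frozenset("AEIOUaeiou")

def by_start(s):
    # Accepted answer's idea: fix a vowel start index, then extend the end index.
    return sorted(s[i:j]
                  for i, x in enumerate(s) if x in VOWELS
                  for j in range(i + 1, len(s) + 1))

def by_length(s):
    # Answer 3's idea: build all substrings grouped by length, then filter.
    every = [s[i:i + j] for j in range(1, len(s) + 1) for i in range(len(s) - j + 1)]
    return [t for t in every if t[0] in VOWELS]

def compare(s="BANANA", number=100000):
    # Time each function on the same input; returns {function name: seconds}.
    return {fn.__name__: timeit(lambda: fn(s), number=number)
            for fn in (by_start, by_length)}

print(compare())
```

For BANANA both functions return the same list, so the comparison measures only how the substrings are generated.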

Answer 4

As already mentioned in the comments, regex is not the right tool for this.

Try this:

def get_substr(string):
    holder = []
    for ix, elem in enumerate(string):
        if elem.lower() in "aeiou":
            for r in range(len(string[ix:])):
                holder.append(string[ix:ix+r+1])
    return holder

print(get_substr("BANANA"))
## ['A', 'AN', 'ANA', 'ANAN', 'ANANA', 'A', 'AN', 'ANA', 'A']
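
To illustrate why a regex alone falls short here: re.findall returns non-overlapping matches, and \w* is greedy, so one match swallows the rest of the word. A lookahead makes the matches overlap, but each start position still gives only its longest match, so the shorter substrings have to be produced by slicing anyway. This is only a sketch; the pattern drops the + from the question's character class, and the names tails and subs are invented for the example.

```python
import re

text = "BANANA"

# findall never reuses consumed characters, so the first greedy match
# takes everything from the first vowel to the end of the word.
print(re.findall(r"[AEIOU]\w*", text))

# A zero-width lookahead consumes nothing, so every position is tried,
# but each vowel position still yields only its longest match.
tails = re.findall(r"(?=([AEIOU]\w*))", text)
print(tails)

# Every non-empty prefix of each tail is a substring starting at that vowel.
subs = sorted(t[:n] for t in tails for n in range(1, len(t) + 1))
print(subs)
```

Because \w* stops at the first non-word character, this variant only covers substrings inside a single word, which is enough for BANANA.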

4 answers

Answer 1: score 13, accepted, 2016-02-17 13:08:19

Answer 2: score 6, 2016-02-17 13:09:55

Answer 3: score 4, 2016-02-17 13:29:32

Answer 4: score 2, 2016-02-17 13:27:09
