Get all possible matches for a regular expression (in Python)?

Question

I have a regex that can match a string in multiple overlapping ways. However, it seems to capture only one possible match in the string. How can I get all possible matches? I've tried finditer with no success, but maybe I'm using it wrong.

The string I'm trying to parse is:

foo-foobar-foobaz

The regex I'm using is:

(.*)-(.*)

>>> import re
>>> s = "foo-foobar-foobaz"
>>> matches = re.finditer(r'(.*)-(.*)', s)
>>> [match.group(1) for match in matches]
['foo-foobar']

I want the match (foo and foobar-foobaz), but it seems to only get (foo-foobar and foobaz).

Answer 1

No problem:

>>> import re
>>> regex = "([^-]*-)(?=([^-]*))"
>>> for result in re.finditer(regex, "foo-foobar-foobaz"):
...     print("".join(result.groups()))
...
foo-foobar
foobar-foobaz

By putting the second capturing parenthesis in a lookahead assertion, you can capture its contents without consuming it in the overall match.

I've also used [^-]* instead of .* because the dot also matches the separator -, which you probably don't want.
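The lookahead trick generalizes to other overlapping searches. The following is a minimal sketch, not part of the original answer: find_overlapping is a hypothetical helper that wraps a pattern in a capturing lookahead so each match consumes nothing and the scan can restart one character later.

```python
import re

def find_overlapping(pattern, text):
    """Return every match of `pattern`, including overlapping ones.

    Hypothetical helper: the pattern is wrapped in (?=(...)), so the
    overall match is empty and finditer moves on by one character.
    """
    wrapped = "(?=(" + pattern + "))"
    # Group 1 is the wrapper group; groups inside `pattern` shift by one.
    return [m.group(1) for m in re.finditer(wrapped, text)]

print(find_overlapping(r"\d\d", "1234"))  # ['12', '23', '34']
```

This yields at most one match per starting position, so it still does not enumerate every way a greedy pattern such as (.*)-(.*) could split the string.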

Answer 2

It's not something regex engines tend to be able to do. I don't know if Python can. Perl can, using the following:

local our @matches;
"foo-foobar-foobaz" =~ /
    ^(.*)-(.*)\z
    (?{ push @matches, [ $1, $2 ] })
    (*FAIL)
/xs;

This specific problem can probably be solved with the regex engine in many languages, using the following technique:

my @matches;
while ("foo-foobar-foobaz" =~ /(?=-(.*)\z)/gsp) {
   push @matches, [ ${^PREMATCH}, $1 ];
}

(${^PREMATCH} refers to what comes before where the regex matched, and $1 refers to what the first () matched.)
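A possible Python translation of the lookahead technique above, offered as a sketch rather than a definitive port. Python has no ${^PREMATCH}, so the text before the match is taken by slicing up to match.start(); the function name prefix_suffix_pairs and its sep parameter are made up for illustration.

```python
import re

def prefix_suffix_pairs(text, sep="-"):
    """Return (before, after) for every occurrence of `sep` in `text`."""
    # (?=sep(.*)\Z) matches an empty string just before each separator and
    # captures everything after it, up to the end of the string.
    pattern = re.compile(r"(?=" + re.escape(sep) + r"(.*)\Z)", re.DOTALL)
    # text[:m.start()] plays the role of Perl's ${^PREMATCH}.
    return [(text[:m.start()], m.group(1)) for m in pattern.finditer(text)]

print(prefix_suffix_pairs("foo-foobar-foobaz"))
# [('foo', 'foobar-foobaz'), ('foo-foobar', 'foobaz')]
```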

But you can easily solve this specific problem outside the regex engine:

my @parts = split(/-/, "foo-foobar-foobaz");
my @matches;
for (1..$#parts) {
   push @matches, [
      join('-', @parts[0..$_-1]),
      join('-', @parts[$_..$#parts]),
   ];
}

Sorry for using Perl syntax, but you should be able to get the idea. Translations to Python welcome.
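One way the split-and-join approach might look in Python. This is a sketch under the same assumptions as the Perl version (a single-character separator and a plain string); split_pairs is a hypothetical name.

```python
def split_pairs(text, sep="-"):
    """Return every (left, right) pair obtained by cutting `text` at one `sep`."""
    parts = text.split(sep)
    # Cut after the first i parts, for every i that leaves both sides non-empty.
    return [
        (sep.join(parts[:i]), sep.join(parts[i:]))
        for i in range(1, len(parts))
    ]

print(split_pairs("foo-foobar-foobaz"))
# [('foo', 'foobar-foobaz'), ('foo-foobar', 'foobaz')]
```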

Answer 3

If you want to detect overlapping matches, you'll have to implement it yourself. Essentially, for a string foo:

1. Find the first match that starts at string index i.
2. Run the matching function again against foo[i+1:].
3. Repeat steps 1 and 2 on the incrementally shorter remaining portion of the string.

It gets trickier if you're using arbitrary-length capture groups (e.g. (.*)) because you probably don't want both foo-foobar and oo-foobar as matches, so you'd have to do some extra analysis to move i even farther than just +1 for each match; you'd need to move it the entire length of the first captured group's value, plus one.
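The basic loop from the numbered steps can be sketched as follows. This is an illustration only: scan_overlapping is a hypothetical helper, and it passes a start position to search() instead of slicing foo[i+1:], which is a small variation on the description above (anchors and lookbehinds still see the whole string). It always advances by one character, so it does not include the extra skipping logic described for arbitrary-length groups.

```python
import re

def scan_overlapping(pattern, text):
    """Collect matches of `pattern`, restarting one past each match start."""
    regex = re.compile(pattern)
    matches = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)   # step 1: first match at or after pos
        if m is None:
            break
        matches.append(m.group(0))
        pos = m.start() + 1           # step 2: search again from i + 1
    return matches

print(scan_overlapping(r"aba", "ababa"))  # ['aba', 'aba']
```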

Answer 1
5, accepted, 2011-09-12 06:13:31

Answer 2
2, 2011-09-12 06:04:00

Answer 3
1, 2011-09-12 06:05:42