Regex to match multiple positive lookahead groups

Question

Here is the regex I have so far:

^(?=.*(option1|option2))(?=.*(option3|option4))(?=.*(option5|option6))(?=.*(option7|option8))(?=.*(option9|option10)).*$

I'm not fluent in regex terminology, so I'll use my own definitions: each parenthesized alternation, such as (option1|option2), is a "category", and each alternative inside it is an "option".

I would like to capture values where at least one option from three or more of the categories is found, like this:

some text option3 some more text option8 some more text option1

OR

some text option3 some more text option8 some more text option1 some more text option6

I don't want to capture values like this:

some text option3 some more text option8 - only 2 categories are represented

OR

some text option3 some more text option4 some more text option1 (options 3 and 4 are from the same category)

The options can appear in any order in the text, which is why I was using positive lookaheads, but I don't know how to put a quantifier on multiple positive lookaheads.

As for the regex engine, I have to use a front-end UI that is powered by Python in the background. I can only use regex; I can't call any other Python functions. Thanks!

Answer 1

I don't think this can be implemented with regex, or if it can (perhaps in several steps), it's not the right way to go.

Instead, you can store your options in a set of tuples, one tuple per category:

options = {('option1', 'option2'), ('option3', 'option4'), ('option5', 'option6'), ('option7', 'option8'), ('option9', 'option10')}

Then check membership as follows:

# Count the categories that have at least one option present in my_text.
if sum(i in my_text or j in my_text for i, j in options) >= 3:
    pass  # do something

Here is a demo:

>>> s1 = "some text option8 some more text option3 some more text option1"
>>> s2 = "some text option3 some more text option4 some more text option1"
>>> s3 = "some text option3 some more text option8"
>>> 
>>> options = {('option1', 'option2'), ('option3', 'option4'), ('option5', 'option6'), ('option7', 'option8'), ('option9', 'option10')}
>>> 
>>> sum(i in s1 or j in s1 for i, j in options)
3
>>> sum(i in s2 or j in s2 for i, j in options)
2
>>> sum(i in s3 or j in s3 for i, j in options)
2
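The same count can be wrapped in a small helper that also handles categories with more than two options. This is only a sketch: the names count_categories and has_enough_categories and the minimum parameter are hypothetical, not part of the answer. Keep in mind that `in` is a plain substring test, so 'option1' is also found inside 'option10'.

```python
def count_categories(text, categories):
    """Return how many categories have at least one option occurring in text."""
    # `in` is a substring test, so 'option1' also matches inside 'option10'.
    return sum(any(option in text for option in category) for category in categories)


def has_enough_categories(text, categories, minimum=3):
    """Return True when at least `minimum` categories are represented in text."""
    return count_categories(text, categories) >= minimum


categories = [
    ("option1", "option2"),
    ("option3", "option4"),
    ("option5", "option6"),
    ("option7", "option8"),
    ("option9", "option10"),
]

s1 = "some text option8 some more text option3 some more text option1"
s3 = "some text option3 some more text option8"

print(has_enough_categories(s1, categories))
print(has_enough_categories(s3, categories))
```

If the substring behaviour is a problem for the real option names, the membership test would need to be replaced with something stricter, such as a word-boundary check.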

Answer 2

Here's a regex that does what you want (in VERBOSE mode):

^
(?= .* (?: option1 | option2 )  () )?
(?= .* (?: option3 | option4 )  () )?
(?= .* (?: option5 | option6 )  () )?
(?= .* (?: option7 | option8 )  () )?
(?= .* (?: option9 | option10 ) () )?
.*$
(?: \1\2\3 | \1\2\4 | \1\2\5 | \1\3\4 | \1\3\5 |
    \1\4\5 | \2\3\4 | \2\3\5 | \2\4\5 | \3\4\5 )

The empty groups serve as check boxes: if the enclosing lookahead doesn't succeed, a backreference to that group won't succeed either. The non-capturing group at the end contains all possible combinations of three out of the five backreferences.
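To try the pattern outside the UI, it can be compiled with Python's re module and the re.VERBOSE flag. The sketch below assumes the pattern is used exactly as written above; the sample strings come from the question, and the "wanted" flags record what the question asks for rather than verified output.

```python
import re

# The pattern from the answer; re.VERBOSE makes the engine ignore the layout whitespace.
PATTERN = re.compile(r"""
    ^
    (?= .* (?: option1 | option2 )  () )?
    (?= .* (?: option3 | option4 )  () )?
    (?= .* (?: option5 | option6 )  () )?
    (?= .* (?: option7 | option8 )  () )?
    (?= .* (?: option9 | option10 ) () )?
    .*$
    (?: \1\2\3 | \1\2\4 | \1\2\5 | \1\3\4 | \1\3\5 |
        \1\4\5 | \2\3\4 | \2\3\5 | \2\4\5 | \3\4\5 )
""", re.VERBOSE)

# (text, wanted) pairs taken from the question's examples.
samples = [
    ("some text option3 some more text option8 some more text option1", True),
    ("some text option3 some more text option8 some more text option1 some more text option6", True),
    ("some text option3 some more text option8", False),
    ("some text option3 some more text option4 some more text option1", False),
]

for text, wanted in samples:
    matched = PATTERN.match(text) is not None
    print(f"matched={matched} wanted={wanted} :: {text}")
```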

The limitations of this approach are obvious: you need only add one more set of options for it to get completely out of hand. I think you'd be better off with a non-regex solution.
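To get a feel for how quickly the list of backreference combinations grows, the pattern can be generated rather than typed by hand. This is an illustrative sketch: build_pattern is a hypothetical helper, not something from the answer, and it emits a compact (non-VERBOSE) form of the same construction. math.comb requires Python 3.8 or later.

```python
import re
from itertools import combinations
from math import comb


def build_pattern(categories, minimum):
    """Build a regex that requires options from at least `minimum` categories."""
    # One optional lookahead per category, each ending in an empty "check box" group.
    lookaheads = "".join(
        "(?=.*(?:%s)())?" % "|".join(re.escape(option) for option in category)
        for category in categories
    )
    # Every way of choosing `minimum` check boxes, written as backreferences.
    group_numbers = range(1, len(categories) + 1)
    checks = "|".join(
        "".join("\\%d" % number for number in combo)
        for combo in combinations(group_numbers, minimum)
    )
    return "^" + lookaheads + ".*$(?:" + checks + ")"


categories = [
    ("option1", "option2"),
    ("option3", "option4"),
    ("option5", "option6"),
    ("option7", "option8"),
    ("option9", "option10"),
]

pattern = build_pattern(categories, 3)
print(pattern)

# Number of backreference combinations needed for "3 of n" as n grows.
for n in range(5, 9):
    print(n, "categories ->", comb(n, 3), "combinations")
```

With five categories there are 10 combinations of three; a sixth category doubles that to 20, which is the growth the answer warns about.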


Answer 1: 2016-08-03 00:57:39
Answer 2 (accepted): 2016-08-03 04:52:46
