Matching all characters in any order with a regular expression

Question

I'm a regex newbie, but I understand how to match any one of several characters (e.g. [abc] will match any of a, b or c) and, I believe, how to match a literal sequence ("abc" will match abc exactly).

However, how do I construct a regex that matches only if a string contains all of the characters a, b and c, in any order? For example, I want it to match "cab" or "bracket". I'm using Python as my scripting language (not sure whether this matters).

Answer 1

In Python, I wouldn't use a regular expression for this purpose, but rather a set:

>>> chars = set("abc")
>>> chars.issubset("bracket")
True
>>> chars.issubset("fish")
False
>>> chars.issubset("bad")
False

Regular expressions are useful, but there are situations where different tools are more appropriate.
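As a minimal sketch, the set check above can be wrapped in a small helper. The name contains_all and its default argument are hypothetical, chosen only for illustration; the one library call used is set.issubset, which accepts any iterable, including a string.

```python
def contains_all(text, required="abc"):
    """Return True if every character in `required` occurs somewhere in `text`."""
    # set() collapses duplicates, so required="aab" behaves like "ab"
    return set(required).issubset(text)


for word in ("bracket", "cab", "fish", "bad"):
    print(word, contains_all(word))
```

Note that this only checks presence, not counts: a string with a single "a" satisfies a requirement of "aa".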

Answer 2

This can be done with lookahead assertions:

^(?=.*a)(?=.*b)(?=.*c)

matches if your string contains at least one occurrence of each of a, b and c.
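For an arbitrary set of required characters, the same pattern can be generated rather than typed by hand. The following is only a sketch: build_all_chars_pattern is a hypothetical helper name, and the use of re.DOTALL is an added assumption so that "." also crosses newlines in multi-line input.

```python
import re


def build_all_chars_pattern(required):
    """Compile '^' followed by one lookahead per distinct required character."""
    # re.escape protects characters such as '.' or '*' that are special in a regex
    lookaheads = "".join(
        "(?=.*{})".format(re.escape(ch)) for ch in sorted(set(required))
    )
    # Assumption: DOTALL is wanted so that '.' matches newlines as well
    return re.compile("^" + lookaheads, re.DOTALL)


pat = build_all_chars_pattern("abc")
for word in ("bracket", "cab", "bad"):
    print(word, pat.search(word) is not None)
```

Each lookahead scans forward from the start of the string without consuming anything, which is why the order of the characters in the input does not matter.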

But as you can see, that's not really what regexes are good at.

I would have done:

mystr = "bracket"  # example input
if all(char in mystr for char in "abc"):
    print("all characters found")  # do something

Checking for speed:

>>> import timeit
>>> timeit.timeit(stmt='chars.issubset("bracket");chars.issubset("notinhere")',
... setup='chars=set("abc")')
1.3560583674019995
>>> timeit.timeit(stmt='all(char in "bracket" for char in s);all(char in "notinhere" for char in s)', 
... setup='s="abc"')
1.4581878714681409
>>> timeit.timeit(stmt='r.match("bracket"); r.match("notinhere")', 
... setup='import re; r=re.compile("(?=.*a)(?=.*b)(?=.*c)")')
1.0582279123082117

Hey, look, the regex wins! This even holds true for a longer set of search characters:

>>> timeit.timeit(stmt='chars.issubset("bracketed");chars.issubset("notinhere")', 
... setup='chars=set("abcde")')
1.4316702294817105
>>> timeit.timeit(stmt='all(char in "bracketed" for char in s);all(char in "notinhere" for char in s)', 
... setup='s="abcde"')
1.6696223364866682
>>> timeit.timeit(stmt='r.match("bracketed"); r.match("notinhere")',
... setup='import re; r=re.compile("(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)")')
1.1809254199004044

Answer 3

Here is a timeit comparison of issubset versus the regex solutions (the code is saved as test.py, which the commands below import).

import re

def using_lookahead(text):
    pat=re.compile(r'^(?=.*a)(?=.*b)(?=.*c)')
    return pat.search(text)

def using_set(text):
    chars=set('abc')
    return chars.issubset(text)

For small strings, issubset may be slightly faster:

% python -mtimeit -s'import test' "test.using_set('bracket')"
100000 loops, best of 3: 2.63 usec per loop
% python -mtimeit -s'import test' "test.using_lookahead('bracket')"
100000 loops, best of 3: 2.87 usec per loop

For long strings, regex is clearly faster:

When the match comes late:

% python -mtimeit -s'import test' "test.using_set('o'*1000+'bracket')"
10000 loops, best of 3: 49.7 usec per loop
% python -mtimeit -s'import test' "test.using_lookahead('o'*1000+'bracket')"
100000 loops, best of 3: 6.66 usec per loop

When the match comes early:

% python -mtimeit -s'import test' "test.using_set('bracket'+'o'*1000)"
10000 loops, best of 3: 50 usec per loop
% python -mtimeit -s'import test' "test.using_lookahead('bracket'+'o'*1000)"
100000 loops, best of 3: 13.9 usec per loop
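To repeat this kind of measurement from a single script instead of the command line, something like the following sketch could be used. It is not the setup that produced the numbers above: here the pattern is compiled once at module level (an assumption that differs from using_lookahead as written in this answer), the loop count of 10000 is arbitrary, and absolute timings will vary by machine and Python version.

```python
import re
import timeit

PAT = re.compile(r'^(?=.*a)(?=.*b)(?=.*c)')  # compiled once, outside the timed call
CHARS = set('abc')


def using_lookahead(text):
    return PAT.search(text)


def using_set(text):
    return CHARS.issubset(text)


late = 'o' * 1000 + 'bracket'   # required characters near the end
early = 'bracket' + 'o' * 1000  # required characters near the start

for label, text in (('late', late), ('early', early)):
    for func in (using_set, using_lookahead):
        # timeit accepts a zero-argument callable as the statement to time
        seconds = timeit.timeit(lambda: func(text), number=10000)
        print('{:<5} {:<16} {:.4f} s'.format(label, func.__name__, seconds))
```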

(To answer a question in the comments:) r'^(?=.*a)(?=.*b)(?=.*c)' can be used to signal a match:

In [40]: pat=re.compile(r'^(?=.*a)(?=.*b)(?=.*c)')

In [41]: pat.search('bracket')
Out[41]: <_sre.SRE_Match object at 0x9f9a6b0>
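Since search returns a match object on success and None on failure, the result can be used directly as a yes/no signal. A minimal sketch follows; has_all is a hypothetical helper name, not part of the re module.

```python
import re

pat = re.compile(r'^(?=.*a)(?=.*b)(?=.*c)')


def has_all(text):
    # search() returns a match object (always truthy) or None
    return pat.search(text) is not None


for word in ('bracket', 'cab', 'fish'):
    print(word, has_all(word))
```

Because lookaheads consume no characters, the match itself is the empty string at position 0; only its existence carries information.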

Answer 1
10 Accepted 2011-11-14 14:39:19

Answer 2
9 2011-11-14 14:40:53

Answer 3
2 2011-11-14 14:59:45
