I'm a regex newbie, but I understand how to match characters in a regex query (e.g. [abc] will match any one of a, b or c. Also, I believe "abc" will match abc exactly).
However, how do I construct a regex query that will match all the characters abc in any order? So for example, I want it to match "cab" or "bracket". I'm using Python as my scripting language (not sure if this matters or not).
In Python, I wouldn't use a regular expression for this purpose, but rather a set:
>>> chars = set("abc")
>>> chars.issubset("bracket")
True
>>> chars.issubset("fish")
False
>>> chars.issubset("bad")
False
Regular expressions are useful, but there are situations where different tools are more appropriate.
This can be done with lookahead assertions:
^(?=.*a)(?=.*b)(?=.*c)
matches if your string contains at least one occurrence of each of a, b and c.
But as you can see, that's not really what regexes are good at.
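As a minimal sketch of the lookahead approach in Python: the helper name build_all_chars_pattern is hypothetical (not from the answer above). It builds one lookahead per required character, escapes each character so that something like "." is treated literally, and passes re.DOTALL so that ".*" can also cross newlines, which the bare pattern above would not do.

```python
import re

def build_all_chars_pattern(chars):
    """Hypothetical helper: compile a pattern requiring every char in `chars`."""
    # One zero-width lookahead per required character; re.escape guards
    # regex metacharacters such as '.' or '*'.
    lookaheads = "".join("(?=.*{})".format(re.escape(c)) for c in chars)
    # re.DOTALL lets '.' match newlines, so multi-line text is handled too.
    return re.compile("^" + lookaheads, re.DOTALL)

pat = build_all_chars_pattern("abc")
for word in ("cab", "bracket", "fish", "bad"):
    # Prints True for "cab" and "bracket", False for "fish" and "bad".
    print(word, bool(pat.search(word)))
```

Because every lookahead is anchored at the start of the string, the order in which the characters appear in the text does not matter.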
I would have done:
if all(char in mystr for char in "abc"):
    pass  # do something
Checking for speed:
>>> import timeit
>>> timeit.timeit(stmt='chars.issubset("bracket");chars.issubset("notinhere")',
... setup='chars=set("abc")')
1.3560583674019995
>>> timeit.timeit(stmt='all(char in "bracket" for char in s);all(char in "notinhere" for char in s)',
... setup='s="abc"')
1.4581878714681409
>>> timeit.timeit(stmt='r.match("bracket"); r.match("notinhere")',
... setup='import re; r=re.compile("(?=.*a)(?=.*b)(?=.*c)")')
1.0582279123082117
Hey, look, the regex wins! This even holds true for longer search strings:
>>> timeit.timeit(stmt='chars.issubset("bracketed");chars.issubset("notinhere")',
... setup='chars=set("abcde")')
1.4316702294817105
>>> timeit.timeit(stmt='all(char in "bracketed" for char in s);all(char in "notinhere" for char in s)',
... setup='s="abcde"')
1.6696223364866682
>>> timeit.timeit(stmt='r.match("bracketed"); r.match("notinhere")',
... setup='import re; r=re.compile("(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)")')
1.1809254199004044
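For anyone who wants to rerun this comparison, here is a rough sketch as a standalone script. The function names (with_set, with_all, with_regex, time_all) are made up for illustration, and the numbers will depend on the machine and Python version, so no particular result should be expected.

```python
import re
import timeit

REQUIRED = "abc"
CHARS = set(REQUIRED)
PATTERN = re.compile("(?=.*a)(?=.*b)(?=.*c)")

def with_set(text):
    # True if every required character occurs somewhere in text.
    return CHARS.issubset(text)

def with_all(text):
    # Same check with a generator expression and substring tests.
    return all(c in text for c in REQUIRED)

def with_regex(text):
    # match() anchors at the start, so no explicit '^' is needed here.
    return PATTERN.match(text) is not None

def time_all(text, number=10000):
    """Return {approach name: seconds for `number` calls} for one input."""
    candidates = {"set": with_set, "all": with_all, "regex": with_regex}
    return {
        name: timeit.timeit(lambda f=f: f(text), number=number)
        for name, f in candidates.items()
    }

if __name__ == "__main__":
    for text in ("bracket", "notinhere", "o" * 1000 + "bracket"):
        print(repr(text[:12]), time_all(text))
```

Timing each input separately, rather than a matching and a non-matching string in one statement, makes it easier to see which case drives the difference.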
Here is a timeit comparison of issubset versus the regex solutions.
import re
def using_lookahead(text):
    pat=re.compile(r'^(?=.*a)(?=.*b)(?=.*c)')
    return pat.search(text)
def using_set(text):
    chars=set('abc')
    return chars.issubset(text)
For small strings, issubset may be slightly faster:
% python -mtimeit -s'import test' "test.using_set('bracket')"
100000 loops, best of 3: 2.63 usec per loop
% python -mtimeit -s'import test' "test.using_lookahead('bracket')"
100000 loops, best of 3: 2.87 usec per loop
For long strings, regex is clearly faster:
when the match comes late:
% python -mtimeit -s'import test' "test.using_set('o'*1000+'bracket')"
10000 loops, best of 3: 49.7 usec per loop
% python -mtimeit -s'import test' "test.using_lookahead('o'*1000+'bracket')"
100000 loops, best of 3: 6.66 usec per loop
when the match comes early:
% python -mtimeit -s'import test' "test.using_set('bracket'+'o'*1000)"
10000 loops, best of 3: 50 usec per loop
% python -mtimeit -s'import test' "test.using_lookahead('bracket'+'o'*1000)"
100000 loops, best of 3: 13.9 usec per loop
(To answer a question in the comments:) r'^(?=.*a)(?=.*b)(?=.*c)' can be used to signal a match:
In [40]: pat=re.compile(r'^(?=.*a)(?=.*b)(?=.*c)')
In [41]: pat.search('bracket')
Out[41]: <_sre.SRE_Match object at 0x9f9a6b0>
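To make the "signal" explicit, here is a small sketch (the wrapper name has_all_abc is hypothetical). search() returns a match object when the pattern matches and None otherwise, so the result can be tested directly. Since the pattern consists only of zero-width lookaheads, the matched text itself is empty; only the presence of the match object carries information.

```python
import re

pat = re.compile(r'^(?=.*a)(?=.*b)(?=.*c)')

def has_all_abc(text):
    # search() gives a match object on success and None on failure.
    return pat.search(text) is not None

m = pat.search('bracket')
# Lookaheads consume no characters, so the match is the empty string at 0.
print(repr(m.group(0)), m.span())
print(has_all_abc('bracket'), has_all_abc('fish'))
```

In an if statement, `if pat.search(text):` works as well, because match objects are always truthy and None is falsy.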