
How to match exact “multiple” strings in Python

I've got a list of exact patterns that I want to search for in a given string. Currently I've got a really bad solution for this problem:

import re

pat1 = re.compile('foo.trailingString')
mat1 = pat1.match(mystring)

pat2 = re.compile('bar.trailingString')
mat2 = pat2.match(mystring)

if mat1 or mat2:
    # Do whatever

pat = re.compile('[foo|bar].trailingString')
match = pat.match(mystring)  # Doesn't work

The only condition is that I've got a list of strings which are to be matched exactly. What's the best possible solution in Python?

EDIT: The search patterns share a common trailing pattern.

You could do a trivial regex that combines those two:

pat = re.compile('foo|bar')
if pat.match(mystring):
    # Do whatever

You could then expand the regex to do whatever you need, using the | separator (which means "or" in regex syntax).

Edit: Based upon your recent edit, this should do it for you:

pat = re.compile(r'(foo|bar)\.trailingString')
if pat.match(mystring):
    # Do Whatever

[] is a character class, so your [foo|bar] matches a single character from the set f, o, |, b, a, r (there's no *, + or ? after the class to repeat it). Inside the class, | is a literal character, not "or". () encloses a sub-pattern, which is what alternation needs.
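To make the difference concrete, here is a minimal sketch. The sample strings are made up for illustration; only the two patterns come from the question and the answer above.

```python
import re

# A character class matches exactly ONE character out of the listed set.
# '[foo|bar]' is the set {f, o, |, b, a, r}; the '|' is just a literal here.
char_class = re.compile(r'[foo|bar]\.trailingString')

# A group with '|' matches one of the whole alternatives.
group = re.compile(r'(foo|bar)\.trailingString')

class_hits = {s: bool(char_class.match(s)) for s in
              ['foo.trailingString', 'f.trailingString', '|.trailingString']}
group_hits = {s: bool(group.match(s)) for s in
              ['foo.trailingString', 'bar.trailingString', 'f.trailingString']}

print(class_hits)
print(group_hits)
```

The character class accepts 'f.trailingString' and even '|.trailingString' but rejects 'foo.trailingString', which is the opposite of what the question wants.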

You're right to use |, but you're using a character class [] instead of a subpattern (). Try this regex:

r = re.compile(r'(?:foo|bar)\.trailingString')

if r.match(mystring):
    # Do stuff
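One thing worth checking if "exactly" means the whole string: match() only anchors at the start, so trailing text is still accepted. A small sketch of the difference, assuming Python 3.4+ for fullmatch(); the helper names are hypothetical:

```python
import re

pattern = re.compile(r'(?:foo|bar)\.trailingString')

def starts_with_pattern(s):
    # match() anchors only at the start; extra text after the match is allowed.
    return pattern.match(s) is not None

def is_exact(s):
    # fullmatch() requires the entire string to match the pattern.
    return pattern.fullmatch(s) is not None

print(starts_with_pattern('foo.trailingStringXYZ'))
print(is_exact('foo.trailingStringXYZ'))
print(is_exact('bar.trailingString'))
```

Whether you want match() or fullmatch() depends on whether anything may follow the trailing part.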

Old answer

If you want to do exact substring matches you shouldn't use regex.

Try using in instead:

words = ['foo', 'bar']

# mystring contains at least one of the words
if any(i in mystring for i in words):
    # Do stuff
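Note that `in` on a string is a substring test, while `in` on a list compares whole strings. A short sketch of both, with hypothetical helper names:

```python
words = ['foo', 'bar']

def contains_any(s):
    # Substring test: True if any word occurs anywhere inside s.
    return any(w in s for w in words)

def equals_any(s):
    # Whole-string test: True only if s is exactly one of the words.
    return s in words

print(contains_any('xxfooxx'))
print(equals_any('xxfooxx'))
print(equals_any('bar'))
```

Pick whichever matches what "exact" means for your input.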

Use '|' in your regex; it stands for 'OR'. There is a better way, too, if you re.escape your strings so that characters like '.' are matched literally:

pat = re.compile('|'.join(map(re.escape, ['foo.trailingString', 'bar.trailingString', 'something.else'])))
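Here is a rough sketch of why the escaping matters. The list of literals is the one from the line above; `matches_any` and the unescaped pattern are only there for comparison and are not part of the original answer.

```python
import re

literals = ['foo.trailingString', 'bar.trailingString', 'something.else']

# Escaped: every '.' is matched as a literal dot.
escaped = re.compile('|'.join(map(re.escape, literals)))

# Unescaped (for comparison only): '.' matches any character.
unescaped = re.compile('|'.join(literals))

def matches_any(s):
    # fullmatch() so that the whole string must equal one of the literals.
    return escaped.fullmatch(s) is not None

print(matches_any('foo.trailingString'))
print(matches_any('fooXtrailingString'))
print(unescaped.fullmatch('fooXtrailingString') is not None)
```

Without re.escape, 'fooXtrailingString' slips through because the dot acts as a wildcard.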

Do you want to search for patterns or strings? The best solution for each is very different:

# strings
patterns = ['foo', 'bar', 'baz']
matches = set(patterns)

if mystring in matches:     # O(1) - very fast
    # do whatever


# patterns
import re
patterns = ['foo', 'bar']
matches = [re.compile(pat) for pat in patterns]

if any(m.match(mystring) for m in matches):    # O(n)
    # do whatever

Edit: OK, you want to search for variable-length exact strings at the beginning of a search string; try

from collections import defaultdict
matches = defaultdict(set)

patterns = ['foo', 'barr', 'bazzz']
for p in patterns:
    matches[len(p)].add(p)

for strlen, pats in matches.items():  # use iteritems() on Python 2
    if mystring[:strlen] in pats:
        # do whatever
        break
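If the prefixes are plain strings, a simpler alternative to the length-bucketed lookup is str.startswith, which accepts a tuple of candidates. This is a swapped-in technique, not what the answer above does, and `has_known_prefix` is a hypothetical name:

```python
prefixes = ('foo', 'barr', 'bazzz')

def has_known_prefix(s):
    # str.startswith accepts a tuple and returns True if any element is a prefix of s.
    return s.startswith(prefixes)

print(has_known_prefix('barr-rest'))
print(has_known_prefix('bar-rest'))
```

For a handful of prefixes this is usually the most readable option; the bucketed set lookup may pay off when the list is very large.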

Perhaps:

any([re.match(r, mystring) for r in ['bar', 'foo']])

I'm assuming your match patterns will be more complex than foo or bar; if they aren't, just use

if mystring in ['bar', 'foo']:
    # do whatever

