简体   繁体   中英

What is the difference between re.search and re.match?

What is the difference between the search() and match() functions in the Python re module ?

I've read the documentation ( current documentation ), but I never seem to remember it. I keep having to look it up and re-learn it. I'm hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I'll have a better place to return with my question and it will take less time to re-learn it.

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note: If you want to locate a match anywhere in string, use search() instead.

re.search searches the entire string, as the documentation says :

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

So if you need to match at the beginning of the string, or to match the entire string use match . It is faster. Otherwise use search .

The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Note that match may differ from search even when using a regular expression beginning with '^' : '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The “ match ” operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

Now, enough talk. Time to see some example code:

# example code:
string_with_newlines = """something
someotherthing"""

import re

print re.match('some', string_with_newlines) # matches
print re.match('someother', 
               string_with_newlines) # won't match
print re.match('^someother', string_with_newlines, 
               re.MULTILINE) # also won't match
print re.search('someother', 
                string_with_newlines) # finds something
print re.search('^someother', string_with_newlines, 
                re.MULTILINE) # also finds something

m = re.compile('thing$', re.MULTILINE)

print m.match(string_with_newlines) # no match
print m.match(string_with_newlines, pos=4) # matches
print m.search(string_with_newlines, 
               re.MULTILINE) # also matches

search ⇒ find something anywhere in the string and return a match object.

match ⇒ find something at the beginning of the string and return a match object.

match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.

This comment from @ivan_bilan under the accepted answer above got me thinking if such hack is actually speeding anything up, so let's find out how many tons of performance you will really gain.

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, ..., 10M words) which gave me the following plot:

匹配与搜索正则表达式 speedtest 线图

As you can see, searching for the pattern 'python' is faster than matching the pattern '(.*?)python(.*?)' .

Python is smart. Avoid trying to be smarter.

re.search search es for the pattern throughout the string , whereas re.match does not search the pattern; if it does not, it has no other choice than to match it at start of the string.

You can refer the below example to understand the working of re.match and re.search

a = "123abc"
t = re.match("[a-z]+",a)
t = re.search("[a-z]+",a)

re.match will return none , but re.search will return abc .

The difference is, re.match() misleads anyone accustomed to Perl , grep , or sed regular expression matching, and re.search() does not. :-)

More soberly, As John D. Cook remarks , re.match() "behaves as if every pattern has ^ prepended." In other words, re.match('pattern') equals re.search('^pattern') . So it anchors a pattern's left side. But it also doesn't anchor a pattern's right side: that still requires a terminating $ .

Frankly given the above, I think re.match() should be deprecated. I would be interested to know reasons it should be retained.

Much shorter:

  • search scans through the whole string.

  • match scans only the beginning of the string.

Following Ex says it:

>>> a = "123abc"
>>> re.match("[a-z]+",a)
None
>>> re.search("[a-z]+",a)
abc

re.match attempts to match a pattern at the beginning of the string . re.search attempts to match the pattern throughout the string until it finds a match.

Quick answer

re.search('test', ' test')      # returns a Truthy match object (because the search starts from any index) 

re.match('test', ' test')       # returns None (because the search start from 0 index)
re.match('test', 'test')        # returns a Truthy match object (match at 0 index)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM