Looking for difference between re.match(pattern, …) and re.search(r'\A' + pattern, …)

(All the code below assumes a context where import re has already been evaluated.)

The documentation on the differences between re.match and re.search specifically compares running re.match(pattern, ...) with running re.search('^' + pattern, ...). This seems to me a bit of a strawman, because the real test would be to compare re.match(pattern, ...) with re.search(r'\A' + pattern, ...) [1].

To be more specific, I for one can't readily come up with a combination of pattern and string for which the outcome of

m = re.match(pattern, string)

will differ from the outcome of

m = re.search(r'\A' + pattern, string)

(Note that if the original pattern in pattern happens to be of type unicode, so is the revised pattern in r'\A' + pattern, conveniently enough.)

Let me emphasize that I'm not interested here in possible differences in performance, convenience, etc. At the moment I'm interested only in differences in the final outcomes (i.e. differences in the final values of m).

To phrase the question somewhat more generally, I'm looking for a combination of pattern, flags, string, and kwargs such that the final value of m in

r0 = re.compile(pattern, flags=flags)
m = r0.match(string, **kwargs)

differs from the final value of m in

r1 = re.compile(r'\A' + pattern, flags=flags)
m = r1.search(string, **kwargs)
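To make experiments easier, the two snippets above can be wrapped in a small helper. This is just a sketch: the name compare_outcomes is made up, and it reports each result as a (start, end) span or None so that the two outcomes can be compared with ==.

```python
import re

def compare_outcomes(pattern, string, flags=0, **kwargs):
    """Return (match_span, search_span) for the two approaches.

    Each element is a (start, end) tuple, or None if there was no match.
    kwargs is forwarded unchanged to both calls (e.g. pos, endpos).
    """
    r0 = re.compile(pattern, flags=flags)
    r1 = re.compile(r'\A' + pattern, flags=flags)
    m0 = r0.match(string, **kwargs)
    m1 = r1.search(string, **kwargs)
    span0 = m0.span() if m0 else None
    span1 = m1.span() if m1 else None
    return span0, span1

# The example from the documentation, with \A in place of ^.
same_start = compare_outcomes('A', 'A\nB\nX', re.MULTILINE)
later_line = compare_outcomes('X', 'A\nB\nX', re.MULTILINE)
print(same_start)   # ((0, 1), (0, 1))
print(later_line)   # (None, None)
```

Comparing spans only is a simplification; a stricter check would also compare m.groups() for both results.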

It may be that no such combination of the inputs pattern, flags, string, and kwargs exists, but making this assertion with any confidence would require in-depth knowledge of the internals of Python's regex engine. In other words, a "positive" answer needs only one combination of inputs as described, whereas a "negative" answer amounts to a rather authoritative statement, so to be convincing it has to make its case at a much deeper level.

To sum up: I'm looking for answers of one of two possible kinds:

  1. A combination of pattern, flags, string, and kwargs that will produce different values of m in the last two cases given above;
  2. An authoritative "negative" answer (i.e. no such combination of inputs exists), based on knowledge of the internals of Python regular expressions.

[1] \A anchors the matching to the beginning of the string, irrespective of whether the matching is multiline or not. BTW, the counterpart of \A for end-of-string matching is \Z. Annoyingly enough, Python's \Z corresponds to Perl's \z, and not to Perl's \Z. This tripped me up when I wrote an earlier version of this post. (BTW, at least in Python 2 regexes, \z has no special meaning; it just matches z.) Thanks to John Y for spotting my error.
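A quick way to see the \Z behaviour described in the footnote is to try it on a string that ends in a newline. This is a minimal sketch with a made-up sample string; as I understand it, Perl's \Z would behave like $ here, while Python's \Z does not.

```python
import re

s = 'abc\n'

# $ matches at the end of the string and also just before a trailing newline.
dollar = re.search(r'abc$', s)

# Python's \Z matches only at the very end of the string.
big_z = re.search(r'abc\Z', s)

print(dollar.span())   # (0, 3)
print(big_z)           # None
```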

There might be something I am not seeing here, but I think the difference is clear.

  1. re.match() returns a successful match only if the pattern matches at the start of the string. Judging from the examples in the documentation, re.match() behaves as if the pattern were anchored with \A, i.e. to the start of the string and not to the start of each line in multi-line mode.

  2. re.search() returns a successful match wherever the pattern occurs in the target string, as long as you don't intentionally anchor the pattern.
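The two points above can be seen in a couple of lines. The sample string is made up for illustration.

```python
import re

text = 'say X'

anchored = re.match('X', text)    # the pattern must match at index 0
anywhere = re.search('X', text)   # the pattern may match at any index

print(anchored)          # None
print(anywhere.span())   # (4, 5)
```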

Now, to answer your main question: what is the difference between re.match(pattern, …) and re.search(r'\A' + pattern, …)?

Well, for an ordinary pattern I don't see any difference: re.match() is just a convenience so that you don't have to type r'\A' + pattern every time you want an anchored match, which I suppose happens a lot.
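That said, two edge cases seem worth checking before relying on this equivalence. This is only a sketch, and the sample patterns and strings are made up for illustration. First, string concatenation does not group the pattern, so r'\A' + 'a|b' anchors only the a branch. Second, with a compiled pattern and a pos argument, match() starts at pos, whereas \A, as far as I can tell, still refers to the real start of the string.

```python
import re

# Case 1: alternation. r'\A' + 'a|b' is r'\Aa|b', so only 'a' is anchored.
alt_match = re.match('a|b', 'xb')
alt_search = re.search(r'\A' + 'a|b', 'xb')
# Wrapping the pattern in a non-capturing group anchors both branches.
alt_grouped = re.search(r'\A(?:' + 'a|b' + ')', 'xb')

# Case 2: the pos argument of compiled patterns.
r0 = re.compile('o')
r1 = re.compile(r'\Ao')
pos_match = r0.match('dog', 1)     # matching starts at index 1
pos_search = r1.search('dog', 1)   # \A still means index 0 of 'dog'

print(alt_match, alt_search.span(), alt_grouped)   # None (1, 2) None
print(pos_match.span(), pos_search)                # (1, 2) None
```

So if exact equivalence matters, wrapping the pattern as r'\A(?:' + pattern + ')' and leaving pos at 0 looks like the safer comparison.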

You can convince yourself that re.match() anchors at the start of the string, as \A does, rather than at the start of a line, by looking at the last example in the comparison link you posted:

>>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<_sre.SRE_Match object at ...>
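The documentation example only covers ^, so here is the same string tried with \A as well. It is a small sketch, not taken from the documentation.

```python
import re

s = 'A\nB\nX'

plain = re.match('X', s, re.MULTILINE)        # anchored at index 0
caret = re.search('^X', s, re.MULTILINE)      # ^ also matches after a newline
slash_a = re.search(r'\AX', s, re.MULTILINE)  # \A ignores re.MULTILINE

print(plain)          # None
print(caret.span())   # (4, 5)
print(slash_a)        # None
```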
