
Looking for difference between re.match(pattern, …) and re.search(r'\A' + pattern, …)

(All the code below assumes a context where import re has already been evaluated.)

The documentation on the differences between re.match and re.search specifically compares running re.match(pattern, ...) with running re.search('^' + pattern, ...). This seems to me a bit of a strawman, because the real test would be to compare re.match(pattern, ...) with re.search(r'\A' + pattern, ...) [1].

To be more specific, I for one can't readily come up with a combination of pattern and string for which the outcome of

m = re.match(pattern, string)

will differ from the outcome of

m = re.search(r'\A' + pattern, string)

(Note that if the original pattern in pattern happens to be of type unicode, so is the revised pattern in r'\A' + pattern, conveniently enough.)

Let me emphasize that here I'm not interested in possible differences in performance, convenience, etc. At the moment I'm interested only in differences in the final outcomes (i.e. differences in the final values of m).

To phrase the question somewhat more generally, I'm looking for a combination of pattern, flags, string, and kwargs such that the final value of m in

r0 = re.compile(pattern, flags=flags)
m = r0.match(string, **kwargs)

differs from the final value of m in

r1 = re.compile(r'\A' + pattern, flags=flags)
m = r1.search(string, **kwargs)
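The two recipes can be run side by side with a small helper. This is only a sketch: compare is a hypothetical name introduced here for illustration, and it reports the matched spans rather than the match objects themselves, since two distinct match objects never compare equal.

```python
import re

def compare(pattern, string, flags=0, **kwargs):
    """Return (match_span, search_span) for the two recipes above.

    Each element is the matched span, or None when there was no match.
    `compare` is a hypothetical helper, not part of the re module.
    """
    r0 = re.compile(pattern, flags=flags)
    r1 = re.compile(r'\A' + pattern, flags=flags)
    m0 = r0.match(string, **kwargs)
    m1 = r1.search(string, **kwargs)
    # Match objects are always truthy, so `m and m.span()` yields
    # None for a failed match and the span otherwise.
    return (m0 and m0.span(), m1 and m1.span())

# Two inputs on which both recipes agree.
print(compare('ab', 'abc'))                    # both match at the start
print(compare('X', 'A\nB\nX', re.MULTILINE))   # neither matches
```

A combination of inputs answers the question positively whenever the two elements of the returned tuple differ.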

It may be that no such combination of the inputs pattern, flags, string, and kwargs exists, but to be able to make this assertion with any confidence would require an in-depth knowledge of the internals of Python's regex engine. In other words, in contrast to a "positive answer" (i.e. one consisting of just one combination of inputs as described), a "negative answer" to this question amounts to a rather authoritative statement, so for it to be convincing the case has to be made at a much deeper level than for a "positive" answer.

To sum up: I'm looking for answers of one of two possible kinds:

  1. A combination of pattern, flags, string, and kwargs that will produce different values of m in the last two cases given above;
  2. An authoritative "negative" answer (i.e. no such combination of inputs exists), based on knowledge of the internals of Python regular expressions.

[1] \A anchors the matching to the beginning of the string, irrespective of whether the matching is multiline or not. BTW, the counterpart of \A for end-of-string matching is \Z. Annoyingly enough, Python's \Z corresponds to Perl's \z, and not to Perl's \Z. This tripped me up when I wrote an earlier version of this post. (BTW, in Python regexes \z has no special meaning; it just matches z.) Thanks to John Y for spotting my error.

There might be something I am not seeing here, but I think the difference is clear.

  1. re.match() returns a successful match only if the pattern you are looking for is at the start of the string. From the look of the examples in the documentation, it seems that re.match() uses \A to anchor the match to the start of the string, and not to the start of a line in multi-line mode.

  2. re.search() returns a successful match no matter where the pattern occurs inside the target string, as long as you don't anchor the pattern intentionally.

Now to your main question: what is the difference between re.match(pattern, …) and re.search(r'\A' + pattern, …)?

As far as I can tell there is no difference in the result. re.match() is just a convenience method, so that you don't have to type r'\A' + pattern each time you want to anchor a match, which I suppose happens a lot.

You can be more sure that re.match() uses \A internally just by looking at the last example in the comparison link you posted:

>>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<_sre.SRE_Match object at ...>
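The same input can also be run through the \A form from the question, which the documentation example does not show. The sketch below does that and then tries two further inputs that are not taken from the documentation: a pattern with a top-level alternation, and a pos argument on a compiled pattern. Both are assumptions about where plain string concatenation or a start offset might matter, so treat them as probes to rerun on your own Python version rather than as a statement about the engine's internals.

```python
import re

text = 'A\nB\nX'

# The documentation's example, plus the \A form from the question.
assert re.match('X', text, re.MULTILINE) is None
assert re.search('^X', text, re.MULTILINE) is not None  # ^ matches after a newline
assert re.search(r'\AX', text, re.MULTILINE) is None    # \A does not

# Probe 1 (not from the documentation): a top-level alternation.
# Plain concatenation yields r'\Aa|b', where \A binds only to the
# first alternative, so the second one can match anywhere.
pattern = 'a|b'
assert re.match(pattern, 'xb') is None
assert re.search(r'\A' + pattern, 'xb').span() == (1, 2)
# Wrapping the pattern in a non-capturing group restores agreement here.
assert re.search(r'\A(?:' + pattern + ')', 'xb') is None

# Probe 2 (not from the documentation): a pos argument.
# match() anchors at pos, while \A refers to the real start of the string.
assert re.compile('b').match('ab', 1).span() == (1, 2)
assert re.compile(r'\Ab').search('ab', 1) is None
```

On the multi-line example the \A form agrees with re.match(), as described above. The two probes suggest the equivalence is not unconditional: it appears to hold when the pattern is grouped before \A is prepended and no pos argument is passed.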
