perl non-greedy regex case matching too much
I have a file with something like:
<post href="http://example.com/" description="Example website" tag="more text"/>
What I want to get is Example website. Doing:
cat file | perl -pe 's/.*description=".*?"//'
works as expected, and I get tag="more text"/>, but when trying:
cat file | perl -pe 's/.*description="(.*)?"/\1/'
I get Example website" tag="more text/>, while I was expecting to get Example website.
So it seems there's something with the capturing and the backreference that is not working as intended, and although I think I might understand why, I'm not sure how to solve it.
I could always do:
cat file | perl -pe 's/.*description="//;s/".*//'
but I really want to understand how to solve it with the regular expression, instead of doing two substitutions.
You aren't using a non-greedy match. You have a greedy match inside an optional capture group, because the question mark comes right after the group's closing parenthesis.
Change:
description="(.*)?"
to:
description="(.*?)"
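and the capture will stop at the first closing quote, giving Example website as the captured text. Note that the substitution only replaces what the pattern matched, so append .* after the closing quote if the rest of the line should be dropped as well.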
The ? metacharacter has two meanings in regular expressions.
When it follows a quantifier like * or +, which allows an expression to be matched a variable number of times, it is the "non-greedy" modifier:
.*?
a+?
(foo){3,}? # {n,}? is also a valid non-greedy quantifier in Perl
In other contexts, it means "match 0 or 1 times":
abc?d # matches "abcd" or "abd"
By putting the ? outside the capture group, you have changed it to the second meaning. Put it inside the capture group, like @smerny said.
(.*?)