
perl non-greedy regex case matching too much

I have a file containing something like:

<post href="http://example.com/" description="Example website" tag="more text"/>

What I want to get is Example website. Doing:

cat file | perl -pe 's/.*description=".*?"//'

works as expected, and I get tag="more text"/>, but when trying:

cat file | perl -pe 's/.*description="(.*)?"/\1/'

I get Example website" tag="more text/>, while I was expecting to get Example website. So it seems that something about the capturing and the backreference is not working as intended, and although I think I might understand why, I'm not sure how to solve it.

I could always do:

cat file | perl -pe 's/.*description="//;s/".*//'

but I really want to understand how to solve it with a single regular expression, instead of doing two substitutions.

You aren't using a non-greedy quantifier. You have a greedy quantifier inside an optional capture group, because the question mark comes right after the group's closing parenthesis.

Change:

description="(.*)?"

to:

description="(.*?)"

and the group will capture only Example website.

The ? metacharacter has two meanings in regular expressions.

When it follows a quantifier like * or +, which allows an expression to be matched a variable number of times, it is the "non-greedy" modifier:

.*?
a+?
(foo){3,}?               # also non-greedy: {3,} is a quantifier too

In other contexts, it means "match 0 or 1 times":

abc?d                    # matches "abcd" or "abd"

By putting the ? outside the capture group, you have changed it to the second meaning. Put it inside the capture group, like @smerny said.

(.*?)
