
Positive lookbehind vs match reset (\K) regex feature

I just learned about the apparently undocumented \K behavior in Ruby regex (thanks to this answer by anubhava). This feature (possibly named Keep?) also exists in PHP, Perl, and Python regex flavors. It is described elsewhere as "drops what was matched so far from the match to be returned."

"abc".match(/ab\Kc/)     # matches "c"

Is this behavior identical to the positive lookbehind assertion used below?

"abc".match(/(?<=ab)c/)  # matches "c"

If not, what are the differences between the two?

It's easier to see the difference between \K and (?<=...) with the String#scan method.
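With a single match, as in the question, the two patterns cannot be told apart from the resulting MatchData: both report "c" starting at index 2, with "ab" as the text before the match. A small sketch to check this (the variable names are just for illustration):

```ruby
# Compare the question's two patterns on a single match.
keep   = "abc".match(/ab\Kc/)
behind = "abc".match(/(?<=ab)c/)

[keep, behind].each do |m|
  # Matched text, its start offset, and the text before the match
  p [m[0], m.begin(0), m.pre_match]  # prints ["c", 2, "ab"]
end
```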

A lookbehind is a zero-width assertion: it doesn't consume characters, and it is tested (backwards) from the current position:

> "abcdefg".scan(/(?<=.)./)
=> ["b", "c", "d", "e", "f", "g"]
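Because the lookbehind only looks backwards from the current position, it can even inspect characters that lie before the point where the search starts. A sketch using the optional start position of String#match and collecting the start offset of each scan match (the offsets are illustrative):

```ruby
s = "abcdefg"

# Start searching at index 3 ("d"). The lookbehind still sees the "c"
# at index 2, because it checks backwards without consuming anything.
m = s.match(/(?<=c)d/, 3)
p m[0]        # prints "d"
p m.begin(0)  # prints 3

# Each scan match consumes only the single character it returns,
# so the next attempt starts right after it.
starts = s.to_enum(:scan, /(?<=.)./).map { Regexp.last_match.begin(0) }
p starts      # prints [1, 2, 3, 4, 5, 6]
```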

The "keep" feature \K (which isn't an anchor) marks a position in the pattern: everything matched so far by the part of the pattern to its left is removed from the match result. But all characters matched before the \K are consumed; they just don't appear in the result:

> "abcdefg".scan(/.\K./)
=> ["b", "d", "f"]

The behaviour is the same as without \K:

> "abcdefg".scan(/../)
=> ["ab", "cd", "ef"]

except that the characters before the \K are removed from the result.
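Both points can be checked with a short sketch. The pieces returned by /.\K./ are exactly the second characters of the /../ matches, and the consumed prefix changes where the next match may start. On a run of identical letters (an illustrative string), the lookbehind lets every "a" after the first match, while \K cannot reuse the "a" it consumed:

```ruby
s = "abcdefg"
# Same matches as /../, minus the part before \K
pairs = s.scan(/../)     # ["ab", "cd", "ef"]
kept  = s.scan(/.\K./)   # ["b", "d", "f"]
p pairs.map { |pair| pair[1] } == kept  # prints true

t = "aaaa"
# Lookbehind: nothing before the returned "a" is consumed,
# so each "a" after the first can match.
p t.scan(/(?<=a)a/)       # prints ["a", "a", "a"]
p t.gsub(/(?<=a)a/, "X")  # prints "aXXX"

# \K: the "a" before \K is consumed, so matches cannot overlap it.
p t.scan(/a\Ka/)          # prints ["a", "a"]
p t.gsub(/a\Ka/, "X")     # prints "aXaX"
```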

One interesting use of \K is to emulate a variable-length lookbehind, which Ruby does not allow (nor do PHP and Perl), or to avoid creating a capture group just to isolate part of the match. For example, (?<=a.*)f. can be written with \K as:

> "abcdefg".match(/a.*\Kf./)
=> #<MatchData "fg">
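The same trick works for replacement and extraction when the text to skip has a variable length, such as optional whitespace. A sketch on made-up input (the strings and their width/key=value format are only illustrative):

```ruby
css = "width:   42px"
# Skip "width:" plus any amount of whitespace, then replace only the number.
p css.sub(/width:\s*\K\d+/, "100")  # prints "width:   100px"

settings = "a=1, bb  =  22"
# Keep only the digits that follow a key and an "=" with optional spaces.
p settings.scan(/\w+\s*=\s*\K\d+/)  # prints ["1", "22"]
```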

An alternative to /a.*\Kf./ would be to write /a.*(f.)/, but \K avoids the need to create a capture group.
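Side by side, as a sketch: with the capture group, the overall match is the whole "abcdefg" and the wanted text must be read from group 1, whereas with \K the overall match is already "fg". The difference also shows up in scan, which returns group contents instead of the whole match when the pattern contains groups:

```ruby
s = "abcdefg"

with_group = s.match(/a.*(f.)/)
p with_group[0]  # prints "abcdefg" (the whole match)
p with_group[1]  # prints "fg" (capture group 1)

with_keep = s.match(/a.*\Kf./)
p with_keep[0]   # prints "fg" (the match itself)

# scan returns arrays of group contents when the pattern has groups
p s.scan(/a.*(f.)/)  # prints [["fg"]]
p s.scan(/a.*\Kf./)  # prints ["fg"]
```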

Note that \K also exists in Python's third-party regex module (not the built-in re module), even though that module allows variable-length lookbehinds.
