Java regex - quantifier in negative lookbehind

This regex question is kind of an extension of this question.

Input

String input = "first number <start number>123.45<end number> "
             + "and second number 678.90.";

Desired output

String output = "first number <start number>123.45<end number> "
              + "and second number <start number>678.90<end number>.";

What I tried

I have a negative lookbehind for <number start> and a negative lookahead for <number end>:

String regex="(?<!(<number start>))\\d+(\\.\\d+)?(?!(<number end>))";
//            ^^^^^^^^^^^^^^^^^^^^^              ^^^^^^^^^^^^^^^^^^
//            negative lookbehind                negative lookahead
//                                 ^^^^^^^^^^^^^^
//                                 float match

But the problem is that for a String such as <number start>12.34<number end> it will match on 2.3: the lookarounds only reject a match that directly touches a tag, so the engine backtracks to a shorter match inside the number.
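The behaviour can be reproduced with a few lines of Java. This is only a minimal sketch: the class name LookaroundBacktrackDemo and the helper findAll are made up for illustration, and the regex is the first attempt shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundBacktrackDemo {
    // First attempt from the question: the lookarounds only check for the tags.
    static final String REGEX =
        "(?<!(<number start>))\\d+(\\.\\d+)?(?!(<number end>))";

    // Hypothetical helper: collects every match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The match cannot start at '1' (tag directly before it) and cannot
        // end at '4' (tag directly after it), so the engine settles on "2.3".
        System.out.println(findAll(REGEX, "<number start>12.34<number end>"));
        // prints [2.3]
    }
}
```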

When I include quantifiers in the lookbehind, I get an error:

String regex="(?<!(<number start>\\d+))\\d+(\\.\\d+)?(?!(\\d+<number end>))";
//            ^^^^^^^^^^^^^^^^^^^^^^^^^              ^^^^^^^^^^^^^^^^^^^^^^
//            negative lookbehind                    negative lookahead
//                                     ^^^^^^^^^^^^^^
//                                     float match

Thanks for the help!

It's a limitation of the lookbehind feature (which is also incredibly slow). Inside a lookbehind you cannot use an expression that matches text of arbitrary length, and an unbounded quantifier such as \\d+ does exactly that. This is what the error message is telling us.

You could try something like this:

(<start number>[-+]?\d*\.?\d+<end number>)|([-+]?\d*\.?\d+)
  • $1: matches a number including its tags.
  • $2: matches a number without tags.

Then replace the text accordingly: leave $1 matches unchanged and wrap $2 matches in the tags.
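One way to do that replacement in Java is sketched below. It is not the only option: the class name TagUntaggedNumbers and the method tag are hypothetical, and the tag strings are the ones from the sample input. The loop checks which group matched and only wraps the untagged numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagUntaggedNumbers {
    // Group 1: a number that is already tagged. Group 2: a bare number.
    private static final Pattern NUMBER = Pattern.compile(
        "(<start number>[-+]?\\d*\\.?\\d+<end number>)|([-+]?\\d*\\.?\\d+)");

    static String tag(String input) {
        Matcher m = NUMBER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keep tagged numbers as they are, wrap bare numbers in the tags.
            String replacement = (m.group(1) != null)
                ? m.group(1)
                : "<start number>" + m.group(2) + "<end number>";
            // quoteReplacement keeps '$' and '\' in the text from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "first number <start number>123.45<end number> "
                     + "and second number 678.90.";
        System.out.println(tag(input));
    }
}
```

Because the tagged alternative comes first, an already tagged number is consumed as a whole and its digits are never offered to the second alternative.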

Instead of appending \\d+ to the existing lookbehind, you can add a separate alternative for a single digit:

(?<!<number start>|\d)\d+(?:\.\d+)?(?!\d|<number end>)

The pipe character ( | ) in the lookbehind / lookahead is a boolean "or". This solution is similar to what you tried, but it does not cause an exception because every alternative in the lookbehind has a fixed length.

To explain it in a little more detail: since the regex is supposed to match a whole decimal number, there must be no digit directly before or after the match, because such a digit should have been part of the match. Therefore digits are forbidden there as well (using the negative lookbehind / lookahead).
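A minimal Java sketch of this approach follows. The class name WrapWithLookarounds and the method wrap are made up for illustration, and the sample text uses the <number start> / <number end> spelling from the regex. In the replacement string, $0 stands for the whole match.

```java
import java.util.regex.Pattern;

class WrapWithLookarounds {
    // No tag or digit directly before the match, no digit or tag directly after it.
    private static final Pattern UNTAGGED = Pattern.compile(
        "(?<!<number start>|\\d)\\d+(?:\\.\\d+)?(?!\\d|<number end>)");

    static String wrap(String input) {
        // $0 inserts the matched number between the two tags.
        return UNTAGGED.matcher(input)
                       .replaceAll("<number start>$0<number end>");
    }

    public static void main(String[] args) {
        String input = "first number <number start>123.45<number end> "
                     + "and second number 678.90.";
        System.out.println(wrap(input));
    }
}
```

With this input only 678.90 is wrapped. Inside 123.45 every candidate start position is preceded by the tag or by a digit, or the match would have to end right before a digit or the closing tag.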

Live demo: https://regex101.com/r/MdS7rF/1
