

Excluding matches at the beginning or end of a line in a regex with JavaScript

I'm trying to define a regular expression in JavaScript that matches all occurrences except the ones at the beginning or at the end of a line.

I can exclude the ones at the beginning but not the ones at the end. For example:

const MULTILINE = `
Lorem ipsum dolor sit amet ANNA_END
ANNA_BEGIN lorem ipsum dolor sit amet
Lorem ipsum dolor ANNA_MIDDLE sit amet
`

MULTILINE.match(/ANNA\w+/gm)
// output: ["ANNA_END", "ANNA_BEGIN", "ANNA_MIDDLE"] ok

MULTILINE.match(/(?!^)ANNA\w+/gm)
// output: ["ANNA_END", "ANNA_MIDDLE"] ok

MULTILINE.match(/ANNA\w+(?!$)/gm)
// output: ["ANNA_EN", "ANNA_BEGIN", "ANNA_MIDDLE"] fail
// expected: ["ANNA_BEGIN", "ANNA_MIDDLE"]

As seen, it correctly identifies my last string, but cuts off its last character (as if $ were being replaced by another \d expression).

I've read lots of documentation and tried several variations such as MULTILINE.match(/ANNA\w+(?!ANNA\w+$)/gm), but without success.

Any help here? :)

ANNA_END yields the match ANNA_EN because the (?!$) lookahead, when it fails, makes the engine backtrack. Since the pattern right before (?!$) is \w+, a + quantified pattern, backtracking lets the match complete one character before the end of the line, where (?!$) succeeds. See this demo and note the red arrow that shows the backtracking at step 9.
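The backtracking can also be observed without a debugger. The following is a minimal sketch; the single-line input and the variable names are illustrative and not taken from the original post.

```javascript
// Illustrative input: the word sits at the very end of the string.
const line = "amet ANNA_END";

// Without the lookahead, greedy \w+ takes the whole word.
const greedy = line.match(/ANNA\w+/)[0];

// With (?!$), the lookahead fails at the end of the input, so \w+ gives back
// one character and the lookahead is retried just before the final "D".
const re = /ANNA\w+(?!$)/g;
const m = re.exec(line);

console.log(greedy);       // "ANNA_END"
console.log(m[0]);         // "ANNA_EN"
console.log(re.lastIndex); // 12, one short of line.length (13)
```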

To disallow this partial word matching, you may add a word boundary, \b, or another lookahead, (?!\w).
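As a quick check, both variants can be applied to the end-of-line regex from the question. This is a sketch that assumes the question's sample text; the variable names are made up for illustration.

```javascript
// Sample text as in the question.
const sample = `
Lorem ipsum dolor sit amet ANNA_END
ANNA_BEGIN lorem ipsum dolor sit amet
Lorem ipsum dolor ANNA_MIDDLE sit amet
`;

// Variant 1: a word boundary after \w+ rejects the shortened "ANNA_EN".
const withWordBoundary = sample.match(/ANNA\w+\b(?!$)/gm);

// Variant 2: a second lookahead requires that no word char follows.
const withLookahead = sample.match(/ANNA\w+(?!\w)(?!$)/gm);

console.log(withWordBoundary); // ["ANNA_BEGIN", "ANNA_MIDDLE"]
console.log(withLookahead);    // ["ANNA_BEGIN", "ANNA_MIDDLE"]
```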

The complete solution to match ANNA\w+ when it is not at the start or end of a line will look like

/(?!^)\bANNA\w+\b(?!$)/gm

See the regex demo.

Details

  • (?!^) - a negative lookahead that fails the match if the regex index is at the start of the string (or, with the m flag, at the start of a line)
  • \b - a word boundary
  • ANNA - a substring
  • \w+ - one or more word chars
  • \b - a word boundary
  • (?!$) - a negative lookahead that fails the match if the regex index is at the end of the string (or, with the m flag, at the end of a line).
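The leading \b in the list above matters when ANNA can appear inside a longer word. A small sketch, using a made-up input string that is not from the original post:

```javascript
// Illustrative input: "JOANNA_X" contains ANNA in the middle of a word.
const text = "ANNA_BEGIN then JOANNA_X and ANNA_MIDDLE here";

// Without the leading \b, the pattern also matches inside JOANNA_X.
const withoutLeadingBoundary = text.match(/(?!^)ANNA\w+\b(?!$)/gm);

// With the leading \b, ANNA must start a word.
const withLeadingBoundary = text.match(/(?!^)\bANNA\w+\b(?!$)/gm);

console.log(withoutLeadingBoundary); // ["ANNA_X", "ANNA_MIDDLE"]
console.log(withLeadingBoundary);    // ["ANNA_MIDDLE"]
```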

JS demo:

const MULTILINE = `
Lorem ipsum dolor sit amet ANNA_END
ANNA_BEGIN lorem ipsum dolor sit amet
Lorem ipsum dolor ANNA_MIDDLE sit amet
`;
// Only the occurrence in the middle of a line is matched.
console.log(MULTILINE.match(/(?!^)\bANNA\w+\b(?!$)/gm)); // ["ANNA_MIDDLE"]
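If the prefix is not fixed, the same pattern can be built at runtime. The helpers below, escapeRegExp and matchInsideLines, are hypothetical names introduced for this sketch and are not part of the original answer; the sketch also assumes the prefix starts with a word character, so that the leading \b can match.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal string.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return words starting with `prefix` that are neither
// at the start nor at the end of a line. Returns [] when nothing matches.
function matchInsideLines(text, prefix) {
  const pattern = new RegExp(
    `(?!^)\\b${escapeRegExp(prefix)}\\w+\\b(?!$)`,
    "gm"
  );
  return text.match(pattern) || [];
}

const demoText = `
Lorem ipsum dolor sit amet ANNA_END
ANNA_BEGIN lorem ipsum dolor sit amet
Lorem ipsum dolor ANNA_MIDDLE sit amet
`;

console.log(matchInsideLines(demoText, "ANNA")); // ["ANNA_MIDDLE"]
console.log(matchInsideLines(demoText, "BOB"));  // []
```

String.prototype.match with the g flag returns null when nothing matches, which is why the helper falls back to an empty array.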

