Javascript Regex: Unable to remove leading spaces in lookahead group in a multi line string

I am trying the regex /^(?<=[\s]*namespace[\s]*---+\s+)(.|\s)+(?=\(\s*\d+\s*rows\))/gm to extract the row items from a string in single-column tabular list format. But the leading spaces are included in the match. The \s+ quantifiers in the lookahead and lookbehind groups do not help. See below:

x = `namespace
-------------------
               itm1
     itm2
  itm3
               itm4
               
(4 rows)
`
console.log(x.match(/^(?<=[\s]*namespace[\s]*---+\s+)(.|\s)+(?=\(\s*\d+\s*rows\))/gm)[0].split(/\s+/))

The output has the leading and trailing whitespace as separate, empty list elements:

[ '', 'itm1', 'itm2', 'itm3', 'itm4', '' ]

But with console.log(x.match(/^(?<=[\s]*namespace[\s]*---+\s+)(.|\s)+(?=\(\s*\d+\s*rows\))/gm)[0].trim().split(/\s+/)) (notice the trim() before the split(..)), the output is:

[ 'itm1', 'itm2', 'itm3', 'itm4' ]

Why does the \s+ at the end of the lookbehind group (?<=[\s]*namespace[\s]*---+\s+) not remove all the whitespace before the desired match captured by (.|\s)+ ?

Root cause

The regex engine parses the string from left to right.

The regex first tries to match at the start of the string. The lookbehind pattern is not found there, so the attempt fails right away, and the next position is tested, between n and a in namespace. And so on until the newline after the ------------------- .

At the location right after that \n (the newline char), the lookbehind pattern matches: the \s+ at the end of your lookbehind finds the one whitespace char it requires. Then the rest of the pattern finds a match, too. The spaces that follow this position are therefore consumed by (.|\s)+ , hence there are 15 leading spaces in your result.
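To make the explanation above concrete, here is a minimal sketch that checks where the match starts. The shortened two-row input and the names re, m and afterRule are illustrative assumptions, not part of the original question; the g flag is dropped so that exec() reports the match index.

```javascript
// Illustrative input: a shortened version of the table from the question.
const x = "namespace\n-------------------\n" + " ".repeat(15) + "itm1\n     itm2\n(2 rows)\n";

// Same pattern as in the question, without the g flag.
const re = /^(?<=[\s]*namespace[\s]*---+\s+)(.|\s)+(?=\(\s*\d+\s*rows\))/m;
const m = re.exec(x);

// Index of the first char after the newline that ends the dashed rule.
const afterRule = x.indexOf("-\n") + 2;

console.log(m.index === afterRule);              // the match starts right after that newline
console.log(m[0].startsWith(" ".repeat(15)));    // so the 15 spaces belong to the match
```

The lookbehind is satisfied as soon as a single whitespace char (the newline) precedes the current position, so nothing forces the engine to skip the indentation.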

Solution

Use a consuming pattern, that is, a capturing group. Or make sure your consuming part starts with a non-whitespace char.

Thus,

const x = "namespace\n-------------------\n itm1\n itm2\n itm3\n itm4\n \n(4 rows)\n";
console.log(
  x.match(/(?<=^\s*namespace\s*---+\s+)\S.*?(?=\s*\(\s*\d+\s*rows\))/gms)[0].split(/\s+/)
);

Or, with a capturing group:

const x = "namespace\n-------------------\n itm1\n itm2\n itm3\n itm4\n \n(4 rows)\n";
console.log(
  x.match(/^\s*namespace\s*---+\s+(\S.*?)(?=\s*\(\s*\d+\s*rows\))/ms)[1].split(/\s+/)
);
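Both snippets index straight into the result of match(), which throws if the input does not contain the table. A possible guard, sketched with the capturing-group pattern, could look like this. The helper name extractRows and the sample inputs are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: returns the row items, or an empty array when nothing matches.
function extractRows(text) {
  const m = text.match(/^\s*namespace\s*---+\s+(\S.*?)(?=\s*\(\s*\d+\s*rows\))/ms);
  // match() returns null when the pattern does not match, so check before indexing.
  return m ? m[1].split(/\s+/) : [];
}

const rows = extractRows("namespace\n-----\n   itm1\n itm2\n\n(2 rows)\n");
const none = extractRows("no table here");

console.log(rows); // [ 'itm1', 'itm2' ]
console.log(none); // []
```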

Notes on the regexps:

  • I replaced (.|\s)+ with a mere . pattern, but added the s flag so that . could match line break chars. Please never use (.|\s)* , (.|\n)* , or (.|[\r\n])* , these are very inefficient regex patterns.
  • I added \s* at the start of the positive lookahead so that the trailing whitespace is stripped from the match.
  • I also use a lazy dot, .*? , in both patterns to match the least amount of chars between the two strings.
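The effect of the \s* at the start of the lookahead can be seen by running the lookbehind pattern with and without it. This is only a sketch: the shortened input and the variable names are assumptions made for the comparison.

```javascript
// Illustrative input with a whitespace-only line before the "(2 rows)" footer.
const x = "namespace\n-------------------\n itm1\n itm2\n \n(2 rows)\n";

// With \s* in the lookahead: the lazy dot stops before the trailing whitespace.
const stripped = x.match(/(?<=^\s*namespace\s*---+\s+)\S.*?(?=\s*\(\s*\d+\s*rows\))/ms)[0];

// Without it: the lazy dot has to expand up to the "(" itself.
const unstripped = x.match(/(?<=^\s*namespace\s*---+\s+)\S.*?(?=\(\s*\d+\s*rows\))/ms)[0];

console.log(JSON.stringify(stripped));   // "itm1\n itm2"
console.log(JSON.stringify(unstripped)); // "itm1\n itm2\n \n"
```

With the second variant, split(/\s+/) would again produce an empty trailing element, which is what the \s* avoids.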