Javascript regex lookbehind: Invalid regexp group

I have the following little example with the regex /-+|(?<=: ?).*/gm. This leads to an infinite loop in Node/Chrome and an "Invalid regexp group" error in Firefox.

When I change this to /-+|(?<=: ).*/gm (leaving out the ? quantifier in the lookbehind), it runs, but of course I don't get the lines which contain no value after the colon.

If I change the regex to /-+|(?<=:).*/gm (leaving the space out of the lookbehind), I again run into an infinite loop/error.

Can anyone explain this behaviour to me, and tell me what regex I would have to use to also match the lines which end with a colon? I'd love to understand...

const text = `
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
`;

const pattern = /-+|(?<=: ?).*/gm;

let res;
while((res = pattern.exec(text)) !== null)
{
    console.log(`"${res[0]}"`);
} 

EDIT:

The expected output is:

"-------------------------------------"
"5048603"
""
"asjhgg | a3857"
"Something..."
"-------------------------------------"
"5048603"
""
"asjhgg | a3857"
"Something..."
"-------------------------------------"

The (?<=...) lookaround is a positive lookbehind. It was not supported in Firefox when this was written (see supported environments here; support arrived later, in Firefox 78), so Firefox versions without it always throw an exception.
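
If the code has to run in engines that may lack lookbehind, one option is to probe for it at runtime. This is a minimal sketch, assuming a hypothetical helper name `supportsLookbehind` (not from the original answer). The pattern is built with the `RegExp` constructor so that an unsupported engine throws a catchable `SyntaxError` at runtime rather than failing while the script is parsed:

```javascript
// Hypothetical helper: returns true when the engine can compile a
// lookbehind assertion, false otherwise.
function supportsLookbehind() {
    try {
        // Built from a string so that a syntax error, if any, is thrown
        // at runtime and can be caught here.
        new RegExp('(?<=a)b');
        return true;
    } catch (e) {
        return false;
    }
}

console.log(supportsLookbehind());
```

A caller could then choose between a lookbehind pattern and a capture-group pattern depending on the result.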

The /-+|(?<=: ?).*/ pattern can match an empty string, which is a very typical "pathological" kind of pattern. The g flag makes the JS regex engine match all occurrences of the pattern. To do that, exec sets lastIndex to the end of each match. When the match has zero length, the end equals the start, so lastIndex does not move, the engine tries the same regex at the same location over and over, and you end up in an infinite loop. See here how to move lastIndex properly to avoid infinite loops in these cases.
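
The stalling behaviour can be reproduced without any lookbehind. The sketch below uses an illustrative pattern `/x*/g` and input `'axb'` (both chosen for the demo, not taken from the question) and a hypothetical helper `allMatches` that bumps `lastIndex` by hand after each empty match:

```javascript
// Hypothetical helper: collects every match of a global pattern,
// including empty ones, without looping forever.
function allMatches(pattern, input) {
    const found = [];
    let m;
    while ((m = pattern.exec(input)) !== null) {
        // An empty match leaves lastIndex equal to m.index, so the next
        // exec call would return the same empty match again. Step past it.
        if (m.index === pattern.lastIndex) {
            pattern.lastIndex++;
        }
        found.push(m[0]);
    }
    return found;
}

console.log(allMatches(/x*/g, 'axb'));
```

Removing the `if` block makes the loop spin on the empty match at index 0.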

From what I see, you want to remove the start of each line up to and including the first : and any whitespace after it. You may use

text.replace(/^[^:\r\n]+:[^\S\r\n]*/gm, '')
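
As a quick check, here is that replacement applied to a shortened version of the sample text (trimmed for brevity; the variable names are illustrative). The dash lines have no colon, so they are left untouched, and the other lines keep only what follows the first colon:

```javascript
const sample = [
    '-----',
    'Prop Name: 5048603',
    'Prop2 Name:',
    'Bla bla bla: asjhgg | a3857',
    '-----',
].join('\n');

// ^[^:\r\n]+  everything on the line before the first colon
// :           the colon itself
// [^\S\r\n]*  horizontal whitespace only, so the line break is preserved
const stripped = sample.replace(/^[^:\r\n]+:[^\S\r\n]*/gm, '');

console.log(stripped.split('\n'));
```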

Or, if you want to actually extract the lines that consist only of - characters, or everything after the :, you may use

const text = `
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
`;

const pattern = /^-+$|:[^\S\r\n]*(.*)/gm;

let res;
while ((res = pattern.exec(text)) !== null) {
    if (res[1] !== undefined) {
        // The colon alternative matched: print the captured value.
        console.log(res[1]);
    } else {
        // The dash alternative matched: print the whole line.
        console.log(res[0]);
    }
}
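
An alternative sketch of the same extraction uses `String.prototype.matchAll`, which iterates over the matches without a hand-written `exec` loop. The helper name `extractValues` and the shortened input are assumptions for illustration:

```javascript
// Hypothetical helper: returns the dash lines and the values after the
// first colon on every other line.
function extractValues(input) {
    const pattern = /^-+$|:[^\S\r\n]*(.*)/gm;
    // matchAll requires the g flag and tracks the position internally.
    return Array.from(input.matchAll(pattern), (m) =>
        // Group 1 is undefined when the dash alternative matched.
        m[1] !== undefined ? m[1] : m[0]
    );
}

console.log(extractValues(
    '-----\nProp Name: 5048603\nProp2 Name:\nLocation: Something...\n-----'
));
```

Every match here consumes at least the colon or one dash, so the pattern never produces a zero-length match and no `lastIndex` guard is needed.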

Try this pattern: /(.*):(.*)/mg

const regex = /(.*):(.*)/mg;
const str = `-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

Up front: Wiktor's answer is the one to use to make this work cross-browser.

For anyone interested in how to get this to work in Chrome with the "original" pattern (thanks to Wiktor's answer for pointing out that lastIndex is not advanced on a zero-length match):

const pattern = /-+|(?<=: ?).*/gm;

let res;
while((res = pattern.exec(text)) !== null)
{
    if(res.index === pattern.lastIndex)
        pattern.lastIndex++;
    console.log(`"${res[0]}"`);
}
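
One detail worth checking with this loop: `(?<=: ?)` is already satisfied directly after the colon, so each non-empty value starts with the space that follows it. The sketch below wraps the same loop in a hypothetical helper `valuesAfterColon`, runs it on a shortened input, and trims the results:

```javascript
// Hypothetical helper wrapping the loop above so its output can be inspected.
function valuesAfterColon(input) {
    const pattern = /-+|(?<=: ?).*/gm;
    const out = [];
    let res;
    while ((res = pattern.exec(input)) !== null) {
        if (res.index === pattern.lastIndex) {
            pattern.lastIndex++; // step past the empty match
        }
        out.push(res[0]);
    }
    return out;
}

const raw = valuesAfterColon('---\nProp Name: 5048603\nProp2 Name:\n---');
console.log(raw);                       // values as matched
console.log(raw.map((s) => s.trim()));  // values with the leading space removed
```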

A regex lookahead is written (?=pattern), not (pattern?).

https://www.regular-expressions.info/lookaround.html
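
For reference, a short sketch contrasting the two assertions; the sample string and variable names are made up for illustration. A lookahead `(?=...)` checks what follows the current position, whereas the `(?<=...)` used in the question is a lookbehind, which checks what precedes it:

```javascript
const line = 'Prop Name: 5048603';

// Lookahead: match the text that is followed by a colon.
const beforeColon = line.match(/^.*(?=:)/)[0];

// Lookbehind: match the text that is preceded by a colon and a space.
const afterColon = line.match(/(?<=: ).*/)[0];

console.log(beforeColon);
console.log(afterColon);
```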
