Javascript regex lookbehind: Invalid regexp group

I have the following little example with the regex /-+|(?<=: ?).*/gm. It leads to an infinite loop in Node/Chrome and an "Invalid regexp group" error in Firefox.

When I change this to /-+|(?<=: ).*/gm (leaving out the ? quantifier in the lookbehind), it runs, but of course I don't get the lines that contain no value after the colon.

If I change the regex to /-+|(?<=:).*/gm (leaving the space out of the lookbehind), I again run into the infinite loop/error.

Can anyone explain this behaviour and tell me what regex I would have to use to also match the lines that end with a colon? I'd love to understand...

const text = `
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
`;

const pattern = /-+|(?<=: ?).*/gm;

let res;
while((res = pattern.exec(text)) !== null)
{
    console.log(`"${res[0]}"`);
} 

EDIT:

The expected output is:

"-------------------------------------"
"5048603"
""
"asjhgg | a3857"
"Something..."
"-------------------------------------"
"5048603"
""
"asjhgg | a3857"
"Something..."
"-------------------------------------"

The (?<=...) lookaround is a positive lookbehind. At the time of writing it was not supported in Firefox (support arrived later, in Firefox 78), so in older Firefox versions you will always get an exception.

The /-+|(?<=: ?).*/ pattern can match an empty string, which is a very typical "pathological" kind of pattern. With the g flag, each exec() call starts searching at lastIndex and then sets lastIndex to the end of the match it found. When the match has zero length, its end equals its start, so lastIndex does not move, the next call finds the same empty match at the same position, and you end up in an infinite loop. To avoid this, advance lastIndex manually whenever a match is empty.
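The stall can be reproduced with a much smaller pattern. The following is only a minimal sketch and not part of the original answer; the helper name execTwice and the pattern /x*/g are made up for illustration.

```javascript
// Hypothetical helper: call exec() twice on the same global regex and
// record where each match started and where lastIndex ended up.
function execTwice(re, input) {
  const report = [];
  for (let i = 0; i < 2; i++) {
    const m = re.exec(input);
    report.push({ index: m.index, lastIndex: re.lastIndex });
  }
  return report;
}

// /x*/g matches the empty string at index 0 of "abc", so lastIndex stays 0
// and the second call finds exactly the same empty match again.
const stalled = execTwice(/x*/g, "abc");
console.log(stalled);

// Bumping lastIndex after an empty match is what lets a loop make progress.
const bumped = /x*/g;
const firstMatch = bumped.exec("abc");
if (firstMatch[0] === "") bumped.lastIndex++;
console.log(bumped.lastIndex); // 1
```

Both entries of the report show index 0 and lastIndex 0, which is why a while loop around exec() never terminates for such a pattern.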

From what I see, you want to remove the beginning of each line up to and including the first : together with any whitespace after it. You may use

text.replace(/^[^:\r\n]+:[^\S\r\n]*/gm, '')
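As a quick check of that replacement, here is a small sketch run on a shortened version of the sample text. The function name stripLabels and the shortened input are assumptions made for this illustration only.

```javascript
// Hypothetical wrapper around the replace() call from the answer.
function stripLabels(text) {
  // ^[^:\r\n]+:   the label up to the first colon on a line
  // [^\S\r\n]*    any whitespace after the colon, except line breaks
  return text.replace(/^[^:\r\n]+:[^\S\r\n]*/gm, "");
}

const shortSample = [
  "-----",
  "Prop Name: 5048603",
  "Prop2 Name:",
  "Bla bla bla: asjhgg | a3857",
  "-----",
].join("\n");

console.log(stripLabels(shortSample));
```

The dashed lines contain no colon, so they are left alone, and a line that ends with a colon becomes an empty line.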

Or, if you want to actually extract the lines that consist only of - characters, or everything after the : on the other lines, you may use

const text = `
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
`;

const pattern = /^-+$|:[^\S\r\n]*(.*)/gm;

let res;
while ((res = pattern.exec(text)) !== null) {
    if (res[1] !== undefined) {
        // A "label: value" line: group 1 holds the value (possibly empty)
        console.log(res[1]);
    } else {
        // A line made up of dashes only
        console.log(res[0]);
    }
}

Try this pattern: /(.*):(.*)/mg

const regex = /(.*):(.*)/mg;
const str = `-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------
Prop Name: 5048603
Prop2 Name:
Bla bla bla: asjhgg | a3857
Location: Something...
-------------------------------------`;

let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

Up front: Wiktor's answer is the one to use if it has to work cross-browser.

For anyone interested in how to get this to work in Chrome with the "original" pattern (thanks to Wiktor's answer for pointing out that lastIndex is not advanced on zero-length matches):

const pattern = /-+|(?<=: ?).*/gm;

let res;
while((res = pattern.exec(text)) !== null)
{
    if(res.index === pattern.lastIndex)
        pattern.lastIndex++;
    console.log(`"${res[0]}"`);
}
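A possible alternative to the manual lastIndex bump, assuming an engine that supports both lookbehind and String.prototype.matchAll, is sketched below. It is not from the original answers; the helper name extractValues and the shortened sample are made up for illustration. As far as I can tell, the lookbehind in the original pattern already succeeds directly after the colon (": ?" with zero spaces), so a value can start with the separating space, which is why the sketch trims each match.

```javascript
// Hypothetical helper: extract values with the original lookbehind pattern.
function extractValues(text) {
  const pattern = /-+|(?<=: ?).*/gm;
  // matchAll() steps past zero-length matches on its own, so no manual
  // lastIndex handling is needed. trim() removes the leading space that
  // is captured when the match starts right after the colon.
  return Array.from(text.matchAll(pattern), (m) => m[0].trim());
}

const shortText = "-----\nProp Name: 5048603\nProp2 Name:\nLocation: Something...\n-----\n";
console.log(extractValues(shortText));
```

A line that ends with a colon yields an empty string here, matching the "" entries in the expected output.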

A regex lookahead is written (?=pattern), not (pattern?). See:

https://www.regular-expressions.info/lookaround.html
