
Mixing Lookahead and Lookbehind in 1 Regexp

I'm trying to match the first occurrence of window.location.replace("http://stackoverflow.com") in an HTML string.

Specifically, I want to capture the URL of the first window.location.replace entry in the whole HTML string.

To capture the URL, I formulated these two rules:

  • it should be after this string: window.location.redirect("
  • it should be before this string ")

To achieve this, I think I need a lookbehind (for the first rule) and a lookahead (for the second rule).

I ended up with this regex:

.+(?<=window\\.location\\.redirect\\(\\"?=\\"\\))

It doesn't work. I'm not even sure that it is legal to mix both rules the way I did.

Can you please help me translate my rules into a regex? Other ways of doing this (without lookahead or lookbehind) are also appreciated.

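The pattern you wrote is not the one you need, because it matches something very different from what you expect: given the input text window.location.redirect("=") something , it matches text window.location.redirect("=") . The .+ consumes any characters, and the lookbehind only checks that the match ends right after the literal redirect("=") , so your ?= is treated as an optional quote followed by a literal = rather than as a lookahead. In addition, it only works in PCRE/Python if you remove the ? before \\" (lookbehinds must be fixed-width in those engines). With the ? it works in .NET regex.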

If it is JS, keep in mind that lookbehind was only added in ECMAScript 2018, so engines that predate it do not support lookbehind and will reject the pattern.
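For reference, here is a minimal sketch of how the two rules from the question could be written as a lookbehind plus a lookahead. It assumes a JavaScript runtime that implements ECMAScript 2018 lookbehind (for example a current Node.js); the sample html string and the variable names are made up for illustration.

```javascript
// Assumption: the runtime supports ES2018 lookbehind; older engines throw a SyntaxError here.
const html =
  'x window.location.redirect("http://stackoverflow.com") y ' +
  'window.location.redirect("http://example.com")';

// Rule 1 becomes the lookbehind, rule 2 becomes the lookahead.
// The lazy .*? in between is the URL, and it is the whole match (no group needed).
const lookaround = /(?<=window\.location\.redirect\(").*?(?="\))/;

// Without the /g flag, match() returns only the first occurrence.
const m = html.match(lookaround);
const firstLookaroundUrl = m ? m[0] : null;

console.log(firstLookaroundUrl); // http://stackoverflow.com
```

So mixing the two lookarounds is legal as long as each one is a separate (?<=...) or (?=...) construct. The pattern is not portable to engines without lookbehind support.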

Instead, use a capturing group around the unknown part you want to get:

/window\.location\.redirect\("([^"]*)"\)/

or

/window\.location\.redirect\("(.*?)"\)/

See the regex demo

Omitting the /g modifier makes the regex match only the first occurrence. The value you need is in Group 1.

The ([^"]*) captures zero or more characters other than a double quote (the URLs you need should not contain one). If your URLs can contain a " , use the second approach, as (.*?) matches any zero or more characters other than a newline, as few as possible, up to the first ") .
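A short sketch of how both patterns might be used in JavaScript. The sample inputs and the firstUrl helper are hypothetical, introduced only to show Group 1 being read and how the two variants differ when the value contains a double quote.

```javascript
// Hypothetical HTML with two redirects; only the first URL should be returned.
const html = [
  '<script>',
  'window.location.redirect("http://stackoverflow.com")',
  'window.location.redirect("http://example.com")',
  '</script>',
].join('\n');

// No /g flag, so match() stops at the first occurrence and exposes the groups.
const negated = /window\.location\.redirect\("([^"]*)"\)/;
const lazy = /window\.location\.redirect\("(.*?)"\)/;

// Hypothetical helper: returns the text captured by Group 1, or null if there is no match.
function firstUrl(re, text) {
  const m = text.match(re);
  return m ? m[1] : null;
}

const urlNegated = firstUrl(negated, html);
const urlLazy = firstUrl(lazy, html);
console.log(urlNegated); // http://stackoverflow.com
console.log(urlLazy);    // http://stackoverflow.com

// A value that itself contains double quotes: only the lazy variant matches.
const quoted = 'window.location.redirect("http://a.b/?q="x"")';
const quotedNegated = firstUrl(negated, quoted); // null
const quotedLazy = firstUrl(lazy, quoted);       // http://a.b/?q="x"
```

On ordinary input the two patterns return the same value; the negated character class is usually the stricter choice because it can never run past a closing quote.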


 