Can you explain to me how this works? Here is an example:
<!-- The quick brown fox
jumps over the lazy dog -->
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="/supersheet.css" />
<![endif]-->
<!-- Pack my box with five dozen liquor jugs -->
First, I tried to use the following regular expression to match the content inside conditional comments:
/<!--.*?stylesheet.*?-->/s
It failed, because the match starts at the very first <!-- and therefore swallows the unrelated comment that precedes the conditional one. Then I tried using another pattern with a lookahead assertion:
/<!--(?=.*?stylesheet).*?-->/s
It works and matches exactly what I need. However, the following regular expression works as well:
/<!--(?=.*stylesheet).*?-->/s
The last regular expression does not have a reluctant quantifier in the lookahead assertion. And now I am confused. Can anyone explain to me how it works? Maybe there is a better solution for this example?
Updated:
I tried using the regular expressions with a lookahead assertion on another document, and they failed to match the content between the comments. So this one, /<!--(?=.*?stylesheet).*?-->/s (as well as this one, /<!--(?=.*stylesheet).*?-->/s), is not correct. Do not use them; try the other suggestions instead.
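One way to see the problem is the small Python sketch below. It is an illustration under assumptions: Python's `re` module stands in for the PCRE-style patterns above (`re.DOTALL` plays the role of the `/s` modifier), and `doc` is a hypothetical document in which an ordinary comment precedes the conditional one. The lookahead only asserts that stylesheet occurs somewhere later in the text; it does not keep the match inside the comment that contains it.

```python
import re

# Hypothetical document: an ordinary comment followed by a conditional one.
doc = (
    "<!-- note -->\n"
    "<!--[if IE 7]>\n"
    '<link rel="stylesheet" type="text/css" href="/supersheet.css" />\n'
    "<![endif]-->\n"
)

# re.DOTALL plays the role of the trailing /s modifier.
lazy = re.search(r"<!--(?=.*?stylesheet).*?-->", doc, re.DOTALL)
greedy = re.search(r"<!--(?=.*stylesheet).*?-->", doc, re.DOTALL)

# Both lookaheads succeed at the first "<!--", and the trailing ".*?-->"
# then stops at the first "-->", so the ordinary comment is returned.
print(lazy.group(0))    # <!-- note -->
print(greedy.group(0))  # <!-- note -->
```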
Updated:
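The solution has been found by Jonny 5 (see the answer). He suggested three options:
1.) A negated hyphen, [^-]. If a hyphen occurs before stylesheet, for example in a path such as /style-sheet.css, it will not work.
2.) A greedy prefix that is reset with \K. It works like a charm, but it has some downsides.
3.) A lookahead that is checked at every character.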
I think the following is a good solution for my example:
/(?s)<!--(?:(?!<!).)+?stylesheet.+?-->/
The same, but with the s modifier at the end:
/<!--(?:(?!<!).)+?stylesheet.+?-->/s
As I said, this is a good solution, but I managed to improve the pattern and found another one that in my case works faster.
So, the final solution is the following:
/<!--(?:(?!-->).)+?stylesheet.+?-->/s
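As a rough check of this final pattern, here is a Python sketch. The document is a hypothetical variant of the sample in which the href attribute comes first and contains a hyphen; Python's `re` with `re.DOTALL` is assumed as a stand-in for the `/s` modifier. The negated hyphen pattern is included only for comparison.

```python
import re

# Hypothetical variant of the sample: href comes first and contains a hyphen.
doc = (
    "<!-- The quick brown fox\n"
    "jumps over the lazy dog -->\n"
    "<!--[if IE 7]>\n"
    '<link href="/style-sheet.css" rel="stylesheet" type="text/css" />\n'
    "<![endif]-->\n"
    "<!-- Pack my box with five dozen liquor jugs -->\n"
)

final = re.compile(r"<!--(?:(?!-->).)+?stylesheet.+?-->", re.DOTALL)
negated_hyphen = re.compile(r"<!--[^-]+stylesheet.+?-->", re.DOTALL)

m = final.search(doc)
# The tempered dot cannot step over "-->", so the first comment is skipped
# and only the conditional comment is returned.
print(m.group(0))

# [^-]+ stops at the hyphen in "/style-sheet.css" before reaching stylesheet.
print(negated_hyphen.search(doc))  # None
```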
Thanks to all the participants for the interesting answers.
The string stylesheet is mentioned only once in your test document, so both regular expressions you tried will match the same thing, but in different ways.
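/<!--(?=.*?stylesheet).*?-->/s
This one does the following:
1.) Match <!--.
2.) Look ahead, consuming as few characters as possible, until stylesheet is found. Fail if not found.
3.) Match as few characters as possible up to the next -->.
/<!--(?=.*stylesheet).*?-->/s
This one does the following:
1.) Match <!--.
2.) Look ahead by running to the end of the string, then backtrack until stylesheet is found. Fail if not found.
3.) Match as few characters as possible up to the next -->.
Basically, the second one needs to backtrack significantly while the first one doesn't.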
If your subject instead is...
<!-- The quick brown fox jumps over the lazy dog --> <!--[if IE 7]> <link rel="stylesheet" type="text/css" href="/supersheet.css" /> <![endif]--> <!-- Pack my box with five dozen stylesheets -->
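you get two different results inside the lookahead. The former finds the first stylesheet, while the latter finds the second (and last), since a greedy .* starts searching from the end of the string. The overall match stays the same, though: a lookahead consumes nothing, and the trailing .*?--> stops at the first --> either way.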
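To inspect which occurrence each lookahead reaches, one can wrap the lookahead contents in a capture group. This is a Python sketch; the capture group is added purely for inspection and is not part of the original patterns, and `re.DOTALL` is assumed to behave like the `/s` modifier.

```python
import re

subject = (
    "<!-- The quick brown fox jumps over the lazy dog --> "
    '<!--[if IE 7]> <link rel="stylesheet" type="text/css" '
    'href="/supersheet.css" /> <![endif]--> '
    "<!-- Pack my box with five dozen stylesheets -->"
)

# Group 1 records what the lookahead scanned; it consumes nothing.
lazy = re.search(r"<!--(?=(.*?stylesheet)).*?-->", subject, re.DOTALL)
greedy = re.search(r"<!--(?=(.*stylesheet)).*?-->", subject, re.DOTALL)

# The lazy lookahead stops at the first stylesheet, the greedy one at the last.
print(lazy.group(1).count("stylesheet"))    # 1
print(greedy.group(1).count("stylesheet"))  # 2

# The consumed text is identical: both return the first comment.
print(lazy.group(0) == greedy.group(0))     # True
```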
To match only the part <!-- ... stylesheet ... --> there are many ways:
1.) Use a negated hyphen [^-] to limit the match and stay in between <!-- and stylesheet:
(?s)<!--[^-]+stylesheet.+?-->
[^-] allows only characters that are not a hyphen. See the test at regex101.
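A minimal Python sketch of this first option, run against the sample document from the question (Python's `re` is an assumption here; the inline `(?s)` flag works the same way in it):

```python
import re

# The sample document from the question.
doc = (
    "<!-- The quick brown fox\n"
    "jumps over the lazy dog -->\n"
    "<!--[if IE 7]>\n"
    '<link rel="stylesheet" type="text/css" href="/supersheet.css" />\n'
    "<![endif]-->\n"
    "<!-- Pack my box with five dozen liquor jugs -->\n"
)

pattern = re.compile(r"(?s)<!--[^-]+stylesheet.+?-->")

# [^-]+ cannot run past the "-->" that closes the first comment,
# so only the conditional comment matches.
matches = pattern.findall(doc)
print(len(matches))  # 1
print(matches[0])
```

The class also rejects any hyphen that appears between <!-- and stylesheet, so the pattern is only suitable when that stretch of text is known to be hyphen-free.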
2.) To get the "last", i.e. closest, match without much regex effort, you can also put a greedy .* in front to eat up everything before it. This makes sense when not matching globally, i.e. when there is only one item to match. Use \K to reset the start of the match after the greedy part:
(?s)^.*\K<!--.+?stylesheet.+?-->
See the test at regex101. You can also use a capture group and grab $1: (?s)^.*(<!--.+?stylesheet.+?-->)
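Python's built-in `re` module does not support \K, so the sketch below illustrates the capture group variant instead; group 1 plays the role of $1. The document is the sample from the question.

```python
import re

# The sample document from the question.
doc = (
    "<!-- The quick brown fox\n"
    "jumps over the lazy dog -->\n"
    "<!--[if IE 7]>\n"
    '<link rel="stylesheet" type="text/css" href="/supersheet.css" />\n'
    "<![endif]-->\n"
    "<!-- Pack my box with five dozen liquor jugs -->\n"
)

pattern = re.compile(r"(?s)^.*(<!--.+?stylesheet.+?-->)")

# The greedy ".*" runs to the end and backtracks to the last "<!--"
# that is still followed by stylesheet and a closing "-->".
m = pattern.search(doc)
print(m.group(1))
```

Because the greedy prefix consumes everything before the match, this approach returns at most one result per subject string.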
3.) Using a lookahead to narrow it down is usually more costly:
(?s)<!--(?:(?!<!).)+?stylesheet.+?-->
See the test at regex101. (?!<!). looks ahead at each character in between <!-- and stylesheet to check that it does not start another <!, so the match stays inside one element. This is similar to the negated hyphen solution.
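The effect of the per-character lookahead can be compared with a plain lazy dot in a short Python sketch (again assuming Python's `re` as a stand-in, with the sample document from the question):

```python
import re

# The sample document from the question.
doc = (
    "<!-- The quick brown fox\n"
    "jumps over the lazy dog -->\n"
    "<!--[if IE 7]>\n"
    '<link rel="stylesheet" type="text/css" href="/supersheet.css" />\n'
    "<![endif]-->\n"
    "<!-- Pack my box with five dozen liquor jugs -->\n"
)

plain = re.search(r"(?s)<!--.+?stylesheet.+?-->", doc)
tempered = re.search(r"(?s)<!--(?:(?!<!).)+?stylesheet.+?-->", doc)

# The plain lazy dot starts at the first "<!--" and crosses into the
# conditional comment to reach stylesheet.
print(plain.group(0).startswith("<!-- The quick brown fox"))  # True

# The tempered dot refuses to step onto a new "<!", so the match can
# only begin at the comment that actually contains stylesheet.
print(tempered.group(0).startswith("<!--[if IE 7]>"))         # True
```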
Instead of .* I used .+ (one or more); which one to use depends on what is to be matched. Here + fits better.
Which solution to use depends on the exact requirements. For this case I would use the first.