I have some HTML code with inline JavaScript in a <script> tag that contains a regular expression removing superfluous whitespace between a > and a < character, as in
<script>
[...]
output = output.replace(/>\s*</g, '><');
[...]
</script>
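For reference, here is a minimal standalone sketch of what that replacement does. The collapseTagWhitespace name and the sample markup are hypothetical, and output is assumed to be a plain string of HTML.

```javascript
// Hypothetical helper wrapping the question's replacement:
// removes any run of whitespace (possibly empty) between a '>' and the next '<'.
function collapseTagWhitespace(output) {
  return output.replace(/>\s*</g, '><');
}

// Hypothetical sample markup with indentation and newlines between tags.
const sample = '<ul>\n  <li>a</li>\n  <li>b</li>\n</ul>';
console.log(collapseTagWhitespace(sample));
// prints "<ul><li>a</li><li>b</li></ul>"
```

Text between a > and a non-whitespace character (such as the "a" in <li>a</li>) is left alone, because the pattern requires the next non-whitespace character to be <.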
This is invalid HTML (e.g., according to PHP's DOMDocument->loadHTML()), as the character sequence </ ends processing and is expected to be followed by the rest of the closing tag, script>.
I have tried to escape the < as &lt;, but then the expression doesn't match anymore (tested in jsfiddle).
A workaround is to insert something in the regular expression that doesn't actually do anything but separates the < from the /, such as
output = output.replace(/>\s*[<]/g, '><');
This works and has the expected behavior, but looks like a terrible hack.
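A small check of that claim, assuming arbitrary test markup: the character-class version produces the same result as the original, and only the original's source text contains the two characters </. An HTML parser sees the literal text inside <script>, so the check is done on each line of code as written rather than on the compiled regex.

```javascript
// The two expressions from the question.
const original = />\s*</g;
const withClass = />\s*[<]/g;

// Hypothetical sample markup.
const sample = '<p>\n  <b>x</b>   <i>y</i>\n</p>';
const a = sample.replace(original, '><');
const b = sample.replace(withClass, '><');
console.log(a === b); // true

// The script text exactly as it would appear inside the <script> element.
const originalText = "output = output.replace(/>\\s*</g, '><');";
const hackText = "output = output.replace(/>\\s*[<]/g, '><');";
console.log(originalText.includes('</')); // true  ("</g")
console.log(hackText.includes('</'));     // false ("[<]/g" has "]" in between)
```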
What is the right way to escape < before / in a JavaScript regular expression?
If PHP's DOMDocument->loadHTML() thinks the script element ends there, I'm fairly sure it's a bug in DOMDocument->loadHTML(). Script elements end with </script>, and the content of script elements is not HTML. Their content model is much more... interesting... than that; the spec takes several paragraphs to explain it.
Regarding issues with </, the spec only mentions dealing with <!-- and </script>, not </ in general.
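As a rough illustration of that point, the sketch below flags only a case-insensitive </script followed by whitespace, / or >, which is, in simplified form, what an HTML5 tokenizer treats as the end of script data. hasScriptEndTag is a hypothetical helper, not a full validator: it ignores the extra "escaped" states that <!-- introduces.

```javascript
// Hypothetical helper: does this script body contain something an HTML5
// parser would take as the closing </script> tag?
// Simplified: ignores the <!-- ... --> escaped states of the tokenizer.
function hasScriptEndTag(body) {
  return /<\/script[\t\n\f\r \/>]/i.test(body);
}

console.log(hasScriptEndTag("output = output.replace(/>\\s*</g, '><');")); // false
console.log(hasScriptEndTag("var s = '</SCRIPT>';"));                      // true
console.log(hasScriptEndTag("var s = '</scripts>';"));                     // false
```

By this rule the question's original line is harmless; only a parser that stops at any </ (as DOMDocument->loadHTML() apparently does) would object to it.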
But if you have to have inline script (you wouldn't have this problem if the code were in a .js file), and you have to load it with something that apparently has a bug, your hack with the character class ([<] rather than <) isn't bad at all. (I doubt performance is your concern, but if it were, I think we can say with a fair bit of assurance that the JavaScript engine's regular expression handler will be able to optimize that single-character character class away.)