I'm trying to use a regular expression I found on this website, and it doesn't seem to work. Any ideas?
Input string:
sFetch = "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";
Regex:
sFetch = Regex.Replace(sFetch, "<script.*?>.*?</script>", "", RegexOptions.IgnoreCase);
Add RegexOptions.Singleline to the options:
RegexOptions.IgnoreCase | RegexOptions.Singleline
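As a minimal sketch of that change, the question's call could look roughly like this once both options are combined. The class and method names (`ScriptStripDemo`, `StripScripts`) are made up for illustration; only the pattern and the two `RegexOptions` values come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptStripDemo
{
    // Same pattern as in the question; only the options differ.
    private const string Pattern = "<script.*?>.*?</script>";

    // Hypothetical helper name, used here only for illustration.
    public static string StripScripts(string html)
    {
        // Singleline makes '.' match '\n' as well, so the lazy .*? can
        // span the line breaks inside the script block.
        return Regex.Replace(html, Pattern, "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Calling `ScriptStripDemo.StripScripts(sFetch)` on the input string from the question should leave only the text around the script block.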
And that will never work on input like the following:
<script
>
alert(1)
</script
/**/
>
So, find an HTML parser such as HTML Agility Pack.
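To illustrate the point, here is a small sketch that runs the regex (with Singleline already added) over the awkward snippet above. The class and member names are hypothetical. The snippet never contains the literal text `</script>`, so the pattern has nothing to match and the input comes back unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MalformedTagDemo
{
    // The script block from above: line breaks inside the tags and a
    // comment-like token before the final '>'.
    public const string Tricky = "<script\n>\nalert(1)\n</script\n/**/\n>";

    public static string StripScripts(string html)
    {
        return Regex.Replace(html, "<script.*?>.*?</script>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    // True when the regex left the tricky input untouched.
    public static bool SurvivesRegex()
    {
        return StripScripts(Tricky) == Tricky;
    }
}
```

A real parser tokenizes tags the way a browser would, which is why it is the more robust choice for input you do not control.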
The reason the regex fails is that your input contains newlines, and the metacharacter . does not match them. To solve this you can use the RegexOptions.Singleline option, as S.Mark says, or you can change the regex to:
"<script[\\d\\D]*?>[\\d\\D]*?</script>"
which uses [\\d\\D] instead of . (\\d is any digit and \\D is any non-digit, so [\\d\\D] is a digit or a non-digit, which is effectively any character).
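A quick sketch comparing the two fixes side by side, under the assumption that RegexOptions.IgnoreCase is kept in both cases. The class and method names are invented for the example. The pattern is written as a verbatim string (`@"..."`) so the backslashes do not need doubling; it is the same regex as the escaped form.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnyCharClassDemo
{
    // [\d\D] matches any character, newlines included, without Singleline.
    public static string WithCharClass(string html)
    {
        return Regex.Replace(html, @"<script[\d\D]*?>[\d\D]*?</script>", "",
            RegexOptions.IgnoreCase);
    }

    // The original pattern, relying on Singleline so '.' matches '\n'.
    public static string WithSingleline(string html)
    {
        return Regex.Replace(html, "<script.*?>.*?</script>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Both methods should give the same result on the input from the question.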
If you actually want to sanitize an HTML string (and you're using .NET), take a look at the Microsoft Web Protection Library:
Sanitizer.GetSafeHtmlFragment(untrustedHtml);
There's a description here .
This is a bit shorter, and it needs no RegexOptions.Singleline because a negated character class also matches newlines:
"<script[^<]*</script>"
or
"<[^>]*>[^>]*>"
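One caveat worth checking before relying on the first of these patterns: `[^<]*` stops at the first `<`, so it only matches when the script body contains no `<` at all. The sketch below uses hypothetical names, and adding RegexOptions.IgnoreCase is an assumption carried over from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ShortPatternDemo
{
    public static string Strip(string html)
    {
        // [^<]* matches newlines too, but cannot cross a '<' in the body.
        return Regex.Replace(html, "<script[^<]*</script>", "",
            RegexOptions.IgnoreCase);
    }
}
```

On the question's input this removes the block, but a body such as `if (x < 1) {}` is left in place because of the `<` inside it.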