I have this regex code in Java that removes the <style> tag from a string:
String questionDroz = "TEST0 <style>TESTE1</style> <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>";
System.out.println(questionDroz.replaceAll("(?s)<style>.*?</style>", ""));
Output:
TEST0 <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>
I also want to remove style tags that have attributes (any attribute may appear in the tag), and the match must not be case sensitive.
The result must be only:
TEST0
Also, if possible, add <script> to this regex. Doing it separately in another regex is no problem either.
If you do not consider using an HTML parser as an option, or this is a one-off job involving HTML content that you have control over, you can use either of:
String regex = "(?si)\\s*<style(?:\\s[^>]*)?>.*?</style>";
String regex = "(?i)\\s*<style(?:\\s[^>]*)?>[^<]*(?:<(?!/style>)[^<]*)*</style>";
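See the regex demo #1 and regex demo #2. Note that the second one is more efficient and should be preferred with long inputs: it consumes whole runs of non-< characters at once instead of expanding a lazy .*? one character at a time.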
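As a quick check, the two patterns above can be precompiled and run against the sample string from the question. This is only a minimal sketch: the class name StyleStripDemo and the helper strip are made up for illustration, and precompiling with Pattern.compile is a design choice that pays off only when the regex is reused many times.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative and not part of the original answer.
public class StyleStripDemo {
    // Variant 1: lazy dot, with DOTALL (s) and CASE_INSENSITIVE (i) set inline.
    static final Pattern LAZY = Pattern.compile(
            "(?si)\\s*<style(?:\\s[^>]*)?>.*?</style>");

    // Variant 2: unrolled form that skips runs of non-'<' characters at once.
    static final Pattern UNROLLED = Pattern.compile(
            "(?i)\\s*<style(?:\\s[^>]*)?>[^<]*(?:<(?!/style>)[^<]*)*</style>");

    // Removes every match of the given pattern from the input.
    static String strip(Pattern p, String input) {
        return p.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "TEST0 <style>TESTE1</style> <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>";
        System.out.println(strip(LAZY, s));     // => TEST0
        System.out.println(strip(UNROLLED, s)); // => TEST0
    }
}
```

Because both patterns start with \s*, the whitespace in front of each tag is removed together with the tag, so no trim() is needed for this particular input.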
Details
(?si) - Pattern.DOTALL (s) and Pattern.CASE_INSENSITIVE (i) embedded flag options
\s* - zero or more whitespaces
<style - literal text
(?:\s[^>]*)? - an optional sequence of a whitespace and then any zero or more chars other than >
> - a > char
.*? - any zero or more chars, as few as possible
[^<]*(?:<(?!/style>)[^<]*)* - any zero or more chars other than < and then any zero or more repetitions of a < not followed with /style> and then any zero or more chars other than <
</style> - a literal text.
See a Java demo:
String questionDroz = "TEST0 <style>TESTE1</style> <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>";
String regex = "(?si)<style(?:\\s[^>]*)?>.*?</style>";
System.out.println(questionDroz.replaceAll(regex, "").trim());
// => TEST0
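For the <script> part of the question, one possible extension (an assumption on my side, not something the demos above cover) is to capture the tag name in a group and reuse it in the closing tag with a backreference. The class name TagStripper, the helper strip and the sample input are all made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names and the sample input are illustrative only.
public class TagStripper {
    // Group 1 captures "style" or "script"; \1 requires the closing tag to use the same name.
    // (?si) keeps DOTALL and CASE_INSENSITIVE, as in the style-only pattern.
    static final Pattern STYLE_OR_SCRIPT = Pattern.compile(
            "(?si)\\s*<(style|script)(?:\\s[^>]*)?>.*?</\\1>");

    // Removes both kinds of elements and trims leftover outer whitespace.
    static String strip(String html) {
        return STYLE_OR_SCRIPT.matcher(html).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String s = "TEST0 <style>a</style> <SCRIPT type='text/javascript'>var x = 1 < 2;</SCRIPT> TEST1";
        System.out.println(strip(s)); // => TEST0 TEST1
    }
}
```

The same caveat as for the style-only pattern applies: this is a sketch for input you control, and it will stop early if the script body itself contains the text </script>, for example inside a string literal.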