
Java: remove text between two HTML tags, case-insensitively

I have this regex code in Java that removes <style> tags from a string:

  String questionDroz  = "TEST0 <style>TESTE1</style> <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>";
  System.out.println(questionDroz.replaceAll("(?s)<style>.*?</style>", ""));

Output

TEST0  <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>

I also want to remove style tags that have attributes (any attribute can appear in the tag), and the match should be case-insensitive.

The result must be only:

TEST0

If possible, please also handle <script> in this regex; otherwise I can do that separately with another regex, no problem.

If using an HTML parser is not an option, or this is a one-off job on HTML content you control, you can use either of these:

String regex = "(?si)\\s*<style(?:\\s[^>]*)?>.*?</style>";
String regex = "(?i)\\s*<style(?:\\s[^>]*)?>[^<]*(?:<(?!/style>)[^<]*)*</style>";

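See regex demo #1 and regex demo #2. The second one is more efficient and should be preferred with long inputs: it uses an unrolled loop that consumes whole runs of non-< chars instead of testing for </style> at every position, and because [^<] also matches line breaks, it does not need the (?s) flag.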
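To check that the two patterns behave the same, here is a small sketch that runs both on one input. The class name StyleRegexCompare and the multi-line sample input are made up for illustration; the third call shows what happens to the lazy-dot version when the (?s) flag is dropped.

```java
// Hypothetical class comparing the two suggested regexes on one input.
public class StyleRegexCompare {
    // Lazy-dot version: (?s) lets . also match line breaks
    static final String LAZY = "(?si)\\s*<style(?:\\s[^>]*)?>.*?</style>";
    // Unrolled version: [^<] already matches line breaks, so only (?i) is set
    static final String UNROLLED =
            "(?i)\\s*<style(?:\\s[^>]*)?>[^<]*(?:<(?!/style>)[^<]*)*</style>";

    static String strip(String html, String regex) {
        return html.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Made-up input: the second style block spans several lines
        String input = "TEST0 <style>TESTE1</style> <style attr='attr1'>\nTEST2\n</style> <STYLE>TEST3</STYLE>";
        System.out.println(strip(input, LAZY));     // TEST0
        System.out.println(strip(input, UNROLLED)); // TEST0
        // Without (?s), . stops at \n, so the multi-line block is left in place
        System.out.println(strip(input, "(?i)\\s*<style(?:\\s[^>]*)?>.*?</style>"));
    }
}
```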

Details

  • (?si) - embedded flag options: s is Pattern.DOTALL (. also matches line breaks) and i is Pattern.CASE_INSENSITIVE
  • \s* - zero or more whitespace chars, so the space before each tag is removed as well
  • <style - literal text
  • (?:\s[^>]*)? - an optional sequence of a whitespace char followed by zero or more chars other than > (the attributes, if any)
  • > - a > char
  • .*? - any zero or more chars, as few as possible
  • [^<]*(?:<(?!/style>)[^<]*)* - zero or more chars other than <, followed by zero or more repetitions of a < that is not followed by /style> and then zero or more chars other than <
  • </style> - literal text
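The embedded (?si) group is equivalent to passing Pattern.DOTALL | Pattern.CASE_INSENSITIVE to Pattern.compile. The sketch below uses the constants instead and precompiles the pattern, which avoids String.replaceAll recompiling it on every call. StyleStripper and stripStyles are hypothetical names. The second call illustrates why the attribute part starts with \s: a made-up tag like <styled> is not matched.

```java
import java.util.regex.Pattern;

// Hypothetical helper: same lazy-dot regex, flags given as Pattern constants.
public class StyleStripper {
    // DOTALL corresponds to (?s), CASE_INSENSITIVE to (?i)
    private static final Pattern STYLE = Pattern.compile(
            "\\s*<style(?:\\s[^>]*)?>.*?</style>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static String stripStyles(String html) {
        // Reuses the compiled pattern instead of compiling it on each call
        return STYLE.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String questionDroz = "TEST0 <style>TESTE1</style> <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>";
        System.out.println(stripStyles(questionDroz));         // TEST0
        // "d" is not whitespace and not ">", so <styled> does not match
        System.out.println(stripStyles("<styled>x</styled>")); // <styled>x</styled>
    }
}
```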

See a Java demo (it omits the leading \s* and calls trim() to remove the leftover spaces instead):

String questionDroz  = "TEST0 <style>TESTE1</style> <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>";
String regex = "(?si)<style(?:\\s[^>]*)?>.*?</style>";
System.out.println(questionDroz.replaceAll(regex, "").trim());
// => TEST0
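The question also asks about <script>. One possible approach, under the same "no parser, controlled input" caveats, is to capture the tag name in a group and require the closing tag to repeat it with the backreference \1. This is a sketch, not a tested general solution; StyleScriptStripper and the sample script tag are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes <style> and <script> blocks with one regex.
public class StyleScriptStripper {
    // Group 1 captures the tag name; </\1> requires the matching closing tag
    private static final Pattern STYLE_OR_SCRIPT = Pattern.compile(
            "(?si)<(style|script)(?:\\s[^>]*)?>.*?</\\1>");

    static String strip(String html) {
        // trim() drops the spaces left between the removed blocks
        return STYLE_OR_SCRIPT.matcher(html).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Made-up input; the lazy .*? handles the "<" inside the script body
        String input = "TEST0 <style>TESTE1</style> "
                + "<script type='text/javascript'>var a = 1 < 2;</script> <STYLE>TEST3</STYLE>";
        System.out.println(strip(input)); // TEST0
    }
}
```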
