Java: replace content between two HTML tags, case-insensitively

I have this regex code in Java that removes a <style> tag from a string:

  String questionDroz  = "TEST0 <style>TESTE1</style> <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>";
  System.out.println(questionDroz.replaceAll("(?s)<style>.*?</style>", ""));

Output:

TEST0  <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>

I also want to remove <style> tags that have attributes (any attribute may appear in the tag), and the match should be case-insensitive.

The result must be only:

TEST0

Also, if possible, add <script> to this regex; otherwise I can handle it separately in another regex, no problem.

If you do not consider using an HTML parser as an option, or this is a one-off job involving HTML content that you control, you can use either of:

String regex = "(?si)\\s*<style(?:\\s[^>]*)?>.*?</style>";
String regex = "(?i)\\s*<style(?:\\s[^>]*)?>[^<]*(?:<(?!/style>)[^<]*)*</style>";

See the regex demo #1 and regex demo #2. Note that the second one is more efficient and should be preferred for long inputs.
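As a rough illustration of the second pattern: because [^<] also matches line breaks, it should not need the s (DOTALL) flag to span multi-line style blocks. The class name, method name, and sample input below are made up for this sketch and are not part of the original answer.

```java
import java.util.regex.Pattern;

public class UnrolledStyleStrip {
    // Second pattern from the answer, precompiled once.
    // Only (?i) is needed: [^<] matches newlines, so DOTALL is not required.
    private static final Pattern STYLE_BLOCK = Pattern.compile(
        "(?i)\\s*<style(?:\\s[^>]*)?>[^<]*(?:<(?!/style>)[^<]*)*</style>");

    // Hypothetical helper: removes every <style>...</style> block
    // together with the whitespace directly in front of it.
    public static String stripStyle(String html) {
        return STYLE_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input (an assumption): multi-line block containing a stray '<'.
        String html = "TEST0 <STYLE type='text/css'>\n  a < b { color: red }\n</style> TEST1";
        System.out.println(stripStyle(html)); // prints "TEST0 TEST1"
    }
}
```

Precompiling the Pattern avoids recompiling the regex on every call, which String.replaceAll does internally.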

Details

  • (?si) - Pattern.DOTALL ( s ) and Pattern.CASE_INSENSITIVE ( i ) embedded flag options
  • \s* - zero or more whitespace chars
  • <style - literal text
  • (?:\s[^>]*)? - an optional sequence of a whitespace char and then any zero or more chars other than >
  • > - a > char
  • .*? - (first regex) any zero or more chars, as few as possible
  • [^<]*(?:<(?!/style>)[^<]*)* - (second regex) any zero or more chars other than <, and then zero or more repetitions of a < that is not followed by /style> and then any zero or more chars other than <
  • </style> - literal text.
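The embedded (?si) flags can also be passed as explicit flags to Pattern.compile, which some readers find easier to scan. The following is a minimal sketch of that equivalent form for the first pattern; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class StyleStripFlags {
    // First pattern from the answer, with (?si) replaced by explicit flags.
    private static final Pattern STYLE_BLOCK = Pattern.compile(
        "\\s*<style(?:\\s[^>]*)?>.*?</style>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: removes every <style>...</style> block
    // together with the whitespace directly in front of it.
    public static String stripStyle(String html) {
        return STYLE_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String questionDroz = "TEST0 <style>TESTE1</style> <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>";
        System.out.println(stripStyle(questionDroz)); // prints "TEST0"
    }
}
```

Because the leading \s* consumes the space in front of each tag, no trim() call is needed for this particular input.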

See a Java demo:

String questionDroz  = "TEST0 <style>TESTE1</style> <style attr='attr1'>TEST2</style> <STYLE>TEST3</STYLE>";
String regex = "(?si)<style(?:\\s[^>]*)?>.*?</style>";
System.out.println(questionDroz.replaceAll(regex, "").trim());
// => TEST0
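The question also asks about <script>. One possible way to cover both tags in a single regex is an alternation with a backreference, so the closing tag has to have the same name as the opening one. This is a sketch under the same one-off, trusted-input assumptions as above. The class name, method name, and sample input are illustrative only.

```java
import java.util.regex.Pattern;

public class StyleScriptStrip {
    // (style|script) captures the tag name; \1 requires the same name in the
    // closing tag. With the i flag, Java also matches \1 case-insensitively.
    private static final Pattern BLOCK = Pattern.compile(
        "(?si)\\s*<(style|script)(?:\\s[^>]*)?>.*?</\\1>");

    // Hypothetical helper: removes <style> and <script> blocks
    // together with the whitespace directly in front of them.
    public static String strip(String html) {
        return BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input (an assumption), mixing tag case and attributes.
        String html = "TEST0 <style>a</style> <SCRIPT src='x.js'>var s = '<b>';</script> TEST1";
        System.out.println(strip(html)); // prints "TEST0 TEST1"
    }
}
```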
