

Replace ASCII codes and HTML tags in Java

How can I achieve the expected results below without using StringEscapeUtils?

public class Main {
    public static void main(String[] args) throws Exception {
      String str = "<p><b>Send FWB <br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
      str = str.replaceAll("\\<.*?\\>", "");
      System.out.println("After removing HTML Tags: " + str);
    }
}

Current results:

After removing HTML Tags: Send FWB  &#40;if AWB has COU SHC,  if ticked , will send FWB&#41;

Expected results:

After removing HTML Tags: Send FWB  if AWB has COU SHC,  if ticked , will send FWB;

Already checked: How to unescape HTML character entities in Java?


PS: This is just a sample; the input may vary.

Your regex matches HTML tags such as <something>, but it does not match HTML entities. Entities follow a pattern like &.*?; (an ampersand, then anything up to the next semicolon), and you are not replacing those.

This should solve your problem:

str = str.replaceAll("\\<.*?\\>|&.*?;", "");
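A minimal, self-contained sketch of the one-liner above applied to the sample from the question. The class name StripHtml and the helper stripTagsAndEntities are illustrative names, not from the original; precompiling the Pattern is just a choice that makes repeated calls cheaper.

```java
import java.util.regex.Pattern;

public class StripHtml {
    // Same alternation as the answer: an HTML tag (<...>) OR an HTML entity (&...;).
    // Both use the reluctant .*? so each match stops at the first '>' or ';'.
    private static final Pattern TAG_OR_ENTITY = Pattern.compile("\\<.*?\\>|&.*?;");

    // Hypothetical helper: removes every tag and entity from the input.
    public static String stripTagsAndEntities(String html) {
        return TAG_OR_ENTITY.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "<p><b>Send FWB <br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
        System.out.println("After removing HTML Tags: " + stripTagsAndEntities(str));
    }
}
```

On the sample this prints `After removing HTML Tags: Send FWB  if AWB has COU SHC,  if ticked , will send FWB`. The trailing `;` shown in the expected results is part of the `&#41;` entity, so this pattern removes it along with the rest of the entity.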

If you want to experiment with this in a sandbox, try regxr.com and use (\<.*?\>)|(&.*?;). The parentheses make the two capturing groups easy to tell apart in the tool; they are not needed in your code. Note that in the sandbox the backslashes must not be doubled (write \< rather than \\<), but in your Java code they must be, because the pattern is written inside a string literal.
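To reproduce the sandbox view in Java, the sketch below compiles the parenthesized version of the pattern and reports which group each match came from. The class and method names (TagEntityGroups, tags, entities) are hypothetical; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagEntityGroups {
    // Group 1 captures HTML tags, group 2 captures HTML entities.
    private static final Pattern GROUPED = Pattern.compile("(\\<.*?\\>)|(&.*?;)");

    // Returns every match of group 1 (tags), in order of appearance.
    public static List<String> tags(String html) {
        return collect(html, 1);
    }

    // Returns every match of group 2 (entities), in order of appearance.
    public static List<String> entities(String html) {
        return collect(html, 2);
    }

    private static List<String> collect(String html, int group) {
        List<String> found = new ArrayList<>();
        Matcher m = GROUPED.matcher(html);
        while (m.find()) {
            // Only one alternative matches at a time; the other group is null.
            String text = m.group(group);
            if (text != null) {
                found.add(text);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "<p><b>Send FWB <br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
        System.out.println("Tags:     " + tags(str));
        System.out.println("Entities: " + entities(str));
    }
}
```

For the sample input this lists the seven tags (`<p>`, `<b>`, three `<br>`, `</b>`, `</p>`) and the two entities (`&#40;`, `&#41;`).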
