Replacing ASCII codes and HTML tags in Java

Question

How can I achieve the expected results below without using StringEscapeUtils?

public class Main {
    public static void main(String[] args) throws Exception {
        String str = "<p><b>Send FWB <br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
        // Removes every <...> tag, but leaves entities such as &#40; untouched
        str = str.replaceAll("\\<.*?\\>", "");
        System.out.println("After removing HTML Tags: " + str);
    }
}

Current results:

After removing HTML Tags: Send FWB  &#40;if AWB has COU SHC,  if ticked , will send FWB&#41;

Expected results:

After removing HTML Tags: Send FWB  if AWB has COU SHC,  if ticked , will send FWB;

Already checked: How to unescape HTML character entities in Java?

PS: This is just a sample; the input may vary.

Answer 1

Your regex targets HTML tags: anything of the form <something> is matched, but HTML entities are not. Entities follow a pattern like &.*?;, which you are not replacing.
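The entity pattern &.*?; is deliberately loose: it matches an &, then as few characters as possible up to the next ;. That catches &#40; and &#41;, but a bare & followed later by a ; (as in "Fish & chips; peas") would be swallowed too. A minimal sketch comparing it with a stricter alternative, assuming entities only take the forms &name;, &#decimal; and &#xhex;; the class name EntityPatterns and the STRICT pattern are hypothetical, not part of the original answer:

```java
import java.util.regex.Pattern;

public class EntityPatterns {
    // The loose pattern from the answer: '&', then as few characters as possible, then ';'.
    public static final Pattern LOOSE = Pattern.compile("&.*?;");

    // Hypothetical stricter pattern: named (&amp;), decimal (&#40;) or hex (&#x28;) references only.
    public static final Pattern STRICT =
            Pattern.compile("&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);");

    public static void main(String[] args) {
        String text = "Fish & chips; peas &amp; rice";
        // Loose: the bare "& chips;" is removed along with "&amp;".
        System.out.println(LOOSE.matcher(text).replaceAll(""));  // Fish  peas  rice
        // Strict: only the real entity "&amp;" is removed.
        System.out.println(STRICT.matcher(text).replaceAll("")); // Fish & chips; peas  rice
    }
}
```

For the question's input both patterns behave the same, since &#40; and &#41; are well-formed decimal references.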

This should solve your problem:

str = str.replaceAll("\\<.*?\\>|&.*?;", "");
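Putting the question's input and the one-line fix together gives a complete program to try. This is a sketch: the class name StripTagsAndEntities and the helper strip are hypothetical, and the pattern is compiled once with Pattern.compile rather than recompiled on every String.replaceAll call.

```java
import java.util.regex.Pattern;

public class StripTagsAndEntities {
    // Matches either an HTML tag (<...>) or an HTML entity (&...;), as in the answer.
    private static final Pattern TAG_OR_ENTITY = Pattern.compile("\\<.*?\\>|&.*?;");

    // Hypothetical helper: deletes every tag and entity in one pass.
    public static String strip(String html) {
        return TAG_OR_ENTITY.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "<p><b>Send FWB <br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
        System.out.println("After removing HTML Tags: " + strip(str));
    }
}
```

This prints "After removing HTML Tags: Send FWB  if AWB has COU SHC,  if ticked , will send FWB". There is no trailing ; after FWB, because &#41; is removed together with its semicolon.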

If you want to experiment with this in a sandbox, try regexr.com with (\<.*?\>)|(&.*?;). The parentheses make the two capturing groups easy to tell apart in the tool; they are not needed in your code. Note that the \ does not need to be escaped in that sandbox, but it has to be doubled (\\) in your code, because the pattern lives inside a Java string literal.
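The capturing groups can also be useful in Java itself: Matcher.group(n) returns null for a group that did not take part in the current match, so the grouped pattern can tell tags and entities apart. A minimal sketch; the class name TagOrEntity and the classify helper are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagOrEntity {
    // Same alternation as the answer, with each side wrapped in its own capturing group.
    private static final Pattern GROUPED = Pattern.compile("(\\<.*?\\>)|(&.*?;)");

    // Labels every match according to which capturing group took part in it.
    public static List<String> classify(String html) {
        List<String> labels = new ArrayList<>();
        Matcher m = GROUPED.matcher(html);
        while (m.find()) {
            // group(1) is non-null only when the tag alternative matched.
            if (m.group(1) != null) {
                labels.add("tag " + m.group(1));
            } else {
                labels.add("entity " + m.group(2));
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Prints [tag <b>, tag </b>, entity &#40;, entity &#41;]
        System.out.println(classify("<b>Send FWB</b> &#40;x&#41;"));
    }
}
```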

Answer 1 (accepted, 2020-08-26 07:22:55)