Replace ASCII codes and HTML tags in Java
How can I achieve the expected result below without using StringEscapeUtils?
public class Main {
    public static void main(String[] args) throws Exception {
        String str = "<p><b>Send FWB <br><br> (if AWB has COU SHC, <br> if ticked , will send FWB)</b></p>";
        str = str.replaceAll("\\<.*?\\>", "");
        System.out.println("After removing HTML Tags: " + str);
    }
}
Current result:
After removing HTML Tags: Send FWB (if AWB has COU SHC, if ticked , will send FWB)
Expected result:
After removing HTML Tags: Send FWB if AWB has COU SHC, if ticked , will send FWB;
Already checked: How to unescape HTML character entities in Java?
PS: This is just a sample; the input may vary.
Your regexp targets HTML tags: <something> will be matched, but HTML entities will not. Entities follow a pattern like &.*?; which you are not replacing.
This should solve your problem:
str = str.replaceAll("\\<.*?\\>|&.*?;", "");
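For context, here is a minimal runnable sketch of that one-liner. The class name StripDemo, the helper strip, and the sample input are assumptions made for illustration: the input writes the entities out explicitly (&nbsp;, &#40;, &#41;), which the question implies but does not show.

```java
class StripDemo {
    // Removes anything shaped like a tag (<...>) or an entity (&...;).
    // Same pattern as in the answer; both quantifiers are lazy.
    static String strip(String html) {
        return html.replaceAll("\\<.*?\\>|&.*?;", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: entities spelled out, not taken verbatim from the question.
        String str = "<p><b>Send FWB &nbsp;<br><br> &#40;if AWB has COU SHC, <br> if ticked , will send FWB&#41;</b></p>";
        System.out.println("After removing HTML Tags: " + strip(str));
    }
}
```

Removed tags and entities leave their surrounding whitespace behind, so a follow-up such as replaceAll("\\s+", " ").trim() may be worth adding if single spacing matters.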
If you want to experiment with this in a sandbox, try regxr.com and use (\<.*?\>)|(&.*?;). The brackets make the two capturing groups easy to tell apart in the tool and are not needed in your code. Note that a single \ is enough in the sandbox; it only has to be doubled to \\ in your code because the pattern sits inside a Java string literal.
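One caveat, since the input may vary: &.*?; also swallows ordinary text that sits between a literal & and the next semicolon. The sketch below is a possible tightening, not part of the original answer. The class name TightStripDemo and the stricter pattern are assumptions. It only accepts entities of the form &name; or &#digits; (including hex such as &#x28;).

```java
import java.util.regex.Pattern;

class TightStripDemo {
    // Tag: '<' up to the next '>'. Entity: '&', optional '#', word characters, ';'.
    // Precompiled once so repeated calls do not recompile the pattern.
    private static final Pattern TAG_OR_ENTITY = Pattern.compile("<[^>]*>|&#?\\w+;");

    static String strip(String html) {
        return TAG_OR_ENTITY.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // A bare ampersand followed by a space is not treated as an entity here.
        System.out.println(strip("Tom & Jerry; <b>ok</b>&amp;"));
    }
}
```

With the looser &.*?; pattern, the fragment "& Jerry;" in that input would be deleted as well.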