How to remove HTML Entities in Jsoup?
How can I remove HTML entities using Jsoup? If I use Element.toString(), I get:
(...)
<td>Letter &oacute;</td> //valid: <td>Letter ó</td>
(...)
I believe you can specify the escape mode and output encoding on a Jsoup Document once you have created it, something like this:
Document newDocument = Jsoup.parse(htmlString, StringUtils.EMPTY, Parser.htmlParser());
newDocument.outputSettings().escapeMode(EscapeMode.base);
newDocument.outputSettings().charset(CharEncoding.UTF_8);
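For reference, here is a minimal, self-contained sketch of the same idea. It assumes jsoup is on the classpath, and it swaps StringUtils.EMPTY and CharEncoding.UTF_8 (which look like Apache Commons Lang constants) for the plain literals "" and "UTF-8". The class name, the method name, and the sample HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Entities.EscapeMode;
import org.jsoup.parser.Parser;

public class EntityOutputDemo {

    // Hypothetical helper: parse the HTML, configure how the document is
    // serialized, and return the first <td> as a string.
    static String render(String html) {
        Document doc = Jsoup.parse(html, "", Parser.htmlParser());

        // With a UTF-8 output charset, characters such as ó can be encoded
        // directly, so they should not need to be written as entities.
        doc.outputSettings()
           .escapeMode(EscapeMode.base)
           .charset("UTF-8");

        // Assumes the input contains at least one <td>; first() returns null otherwise.
        Element cell = doc.select("td").first();
        return cell.toString();
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>Letter &oacute;</td></tr></table>";
        System.out.println(render(html));
    }
}
```

The output settings belong to the Document, so they are set once after parsing and apply to every element serialized from that document, including Element.toString().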
This may be off-topic for your question, but if you just want to decode HTML entities without making any other changes to the string (no tag processing, no comment stripping, etc.), you can use org.jsoup.parser.Parser.unescapeEntities, e.g.:
assert Parser.unescapeEntities("x &asymp; <i>y</i>\n", true)
    .equals("x ≈ <i>y</i>\n");
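To make the "no tag processing" point concrete, the sketch below contrasts Parser.unescapeEntities with a full parse followed by text(). It assumes jsoup is on the classpath; the class name, the method names, and the sample string are invented for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.parser.Parser;

public class UnescapeVsParseDemo {

    // Decodes entities only; markup in the string is left untouched.
    // The second argument says whether the string is an attribute value.
    static String decodeOnly(String s) {
        return Parser.unescapeEntities(s, false);
    }

    // Full parse: entities are decoded and the tags are dropped,
    // because text() returns only the text content.
    static String textOnly(String s) {
        return Jsoup.parse(s).text();
    }

    public static void main(String[] args) {
        String input = "Letter &oacute; in <i>italics</i>";
        System.out.println(decodeOnly(input)); // keeps the <i> tags
        System.out.println(textOnly(input));   // tags removed
    }
}
```

Which one fits depends on whether the markup should survive: unescapeEntities suits strings that are passed along as HTML, while text() suits plain-text extraction.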