Jsoup: parsing HTML tag attributes

Question

For example:

<html>
   <head></head>
   <body sometag='"'></body>
</html>

When I parse this kind of HTML with Jsoup:

Document doc = Jsoup.parse(html);
doc.outputSettings().prettyPrint(false);
System.out.println(doc.toString());

it becomes:

<html>
   <head></head>
   <body sometag="&quot;"></body>
</html>

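Note that '"' has become "&quot;". I don't want Jsoup to rewrite the quotes like this; I only need it to extract some text. Is there any way to stop Jsoup from doing so? Thanks a lot.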
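For context, the two spellings of the attribute denote the same value: a single double-quote character. The sketch below checks this with the JDK's built-in DOM parser rather than Jsoup, purely to stay dependency-free; the class and method names are hypothetical, and it assumes the snippet is well-formed XML, as the sample above is.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class QuoteEquivalence {
    // Parses a well-formed snippet and returns the decoded value of
    // the "sometag" attribute on the first <body> element.
    static String sometagValue(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return doc.getElementsByTagName("body").item(0)
                    .getAttributes().getNamedItem("sometag").getNodeValue();
        } catch (Exception e) {
            throw new IllegalArgumentException("snippet is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String original = "<html><head></head><body sometag='\"'></body></html>";
        String serialized = "<html><head></head><body sometag=\"&quot;\"></body></html>";
        // Both forms decode to the same one-character string.
        System.out.println(sometagValue(original).equals(sometagValue(serialized))); // prints true
    }
}
```

If only the attribute's text is needed, reading the decoded value from the parsed document sidesteps the question of how the serializer chooses to quote or escape it.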

Answer 1

Just don't use the HTML parser. Use the XML parser instead.

Document doc = Jsoup.parse(html, "", Parser.xmlParser());

Answer 2

I experimented with a few different string-escaping approaches. This may not be exactly what you are after, but the simplest way I found is the following:

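import org.apache.commons.lang3.StringEscapeUtils; // assumption: the answer does not name the library
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

String html = "<html> <head> </head> <body sometag='\"'> </body> </html>";

Document doc = Jsoup.parse(html);
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
System.out.println(StringEscapeUtils.unescapeXml(doc.toString()));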
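One caveat worth checking before relying on this: unescaping the whole serialized document also unescapes the attribute value while leaving the surrounding double quotes in place. The sketch below illustrates that with a hypothetical stand-in for unescapeXml that decodes only the five predefined XML entities; it is not the Apache Commons implementation, and the input string is simply the serialized body tag from the question.

```java
public class UnescapeCaveat {
    // Hypothetical stand-in for an XML unescape routine: handles only
    // &quot; &apos; &lt; &gt; &amp; and no numeric character references.
    static String unescapeBasicXml(String s) {
        return s.replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&"); // last, so "&amp;quot;" is not decoded twice
    }

    public static void main(String[] args) {
        String serialized = "<body sometag=\"&quot;\"></body>";
        // The value's quote now sits between two delimiter quotes.
        System.out.println(unescapeBasicXml(serialized)); // prints <body sometag="""></body>
    }
}
```

The result contains three consecutive double quotes, so it is fine for display but would probably not round-trip through a parser as the original single-quoted attribute.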

2 answers

Answer 1
0 2018-02-08 06:20:19

Answer 2
0 2018-02-08 11:25:47
