Jsoup.parse(String): how to avoid added \n line breaks
I am using Jsoup 1.7.2.
When using the API Jsoup.parse(String), I see that the output Document object adds line breaks (text line breaks, \n) to the parsed HTML.
For example, the input string is:
<html><body><p>aaa</p></body></html>
And the Document object contains the following (when calling toString()):
<html>
<head></head>
<body>
<p>aaa</p>
</body>
</html>
I am interested in the <body> element. How can I instruct Jsoup not to format the output with new lines? I am expecting the body part to be: <body><p>aaa</p></body>
On the other hand, when I have HTML that already contains line breaks, I want them to remain intact.
Try this:
Document newDocument = Jsoup.parse(htmlString, StringUtils.EMPTY, Parser.htmlParser());
newDocument.outputSettings().escapeMode(EscapeMode.base);
/*
 * Use CharEncoding.US_ASCII rather than UTF-8 so that special characters are escaped on output.
 * Note that their representation changes: a non-ASCII character such as an em dash
 * is written as an escaped entity instead of the literal character.
 */
newDocument.outputSettings().charset(CharEncoding.US_ASCII);
newDocument.outputSettings().prettyPrint(false); // this will make sure that it will not add line breaks
Try this one; it works:
Document doc = Jsoup.parse(html); // html is the input HTML string
// This line will keep your Html in one line
doc.outputSettings().prettyPrint(false);
System.out.println(doc.html());
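Since the question is specifically about the <body> element, here is a minimal sketch that combines prettyPrint(false) with doc.body().outerHtml(). The class name BodyHtmlDemo and the helper compactBody are hypothetical names chosen for illustration; only Jsoup.parse, outputSettings().prettyPrint, body() and outerHtml() come from jsoup itself. It assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyHtmlDemo {

    // Hypothetical helper: parse the input and return the <body> element
    // serialized without any line breaks or indentation added by jsoup.
    static String compactBody(String html) {
        Document doc = Jsoup.parse(html);
        // Turn off pretty-printing so the serializer does not insert \n or indentation.
        doc.outputSettings().prettyPrint(false);
        // outerHtml() includes the <body> tag itself; html() would return only its children.
        return doc.body().outerHtml();
    }

    public static void main(String[] args) {
        String input = "<html><body><p>aaa</p></body></html>";
        System.out.println(compactBody(input));
    }
}
```

With pretty-printing off, jsoup should only stop adding its own formatting; whitespace that is already present in the input is not stripped by this setting, which matches the second requirement in the question.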