Jsoup.parse(String) - how not to add \n

Question

I am using Jsoup 1.7.2.

When I call Jsoup.parse(String), the resulting Document object adds line breaks (\n) to the parsed HTML.

For example, the input string is:

<html><body><p>aaa</p></body></html>

And the Document object produces the following when toString() is called:

<html>
 <head></head>
 <body>
  <p>aaa</p>
 </body>
</html>

I am interested in the <body> element. How can I instruct Jsoup not to format the output with new lines? I expect the body part to be: <body><p>aaa</p></body>.

On the other hand, when the HTML already contains line breaks, I want them to remain intact.

Answer 1

Try this:

// StringUtils and CharEncoding appear to come from Apache Commons Lang;
// the plain literals "" and "US-ASCII" would work as well.
Document newDocument = Jsoup.parse(htmlString, StringUtils.EMPTY, Parser.htmlParser());
newDocument.outputSettings().escapeMode(EscapeMode.base);
// Use US-ASCII rather than UTF-8 so that special characters are escaped in the output.
// Their representation may change, though: for instance, &mdash; is written as &#8212;.
newDocument.outputSettings().charset(CharEncoding.US_ASCII);
// Disable pretty-printing so that Jsoup adds no line breaks.
newDocument.outputSettings().prettyPrint(false);

Answer 2

Try this; it works for me.

    // html is the input string to parse
    Document doc = Jsoup.parse(html);
    // Disable pretty-printing so that Jsoup adds no line breaks of its own
    doc.outputSettings().prettyPrint(false);

    System.out.println(doc.html());
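
Since the question asks specifically about the <body> element, a minimal sketch along the same lines might look like the following, assuming Jsoup is on the classpath. The class name BodyWithoutPrettyPrint and the bodyHtml helper are hypothetical names used only for illustration. The Jsoup calls are the prettyPrint(false) setting from the answers, plus Document.body() and Element.outerHtml().

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyWithoutPrettyPrint {

    // Hypothetical helper: parses the input and returns the markup of the
    // <body> element with pretty-printing switched off.
    static String bodyHtml(String html) {
        Document doc = Jsoup.parse(html);
        // Output settings belong to the document, so they also apply
        // when a single element of that document is serialized.
        doc.outputSettings().prettyPrint(false);
        return doc.body().outerHtml();
    }

    public static void main(String[] args) {
        System.out.println(bodyHtml("<html><body><p>aaa</p></body></html>"));
    }
}
```

With pretty-printing disabled, Jsoup should also leave alone any line breaks that are already present inside the body of the input, which is the second requirement in the question.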

2 answers

solution1
4 ACCEPTED 2014-01-08 16:21:43

solution2
3 2014-01-21 08:02:25
