What is the difference between Jsoup.parse() and Jsoup.parseBodyFragment() in Java?

Question

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class JsoupTester {
    public static void main(String[] args) {
        String html = "<html><head><title>Sample Title</title></head><body><p>Sample Content</p></body></html>";

        // Parse the input as a full HTML document.
        Document document = Jsoup.parse(html);
        System.out.println(document.body());

        // Parse the same input as if it were only the contents of <body>.
        Document document2 = Jsoup.parseBodyFragment(html);
        System.out.println(document2.body());
    }
}

What is the difference between Jsoup.parse() and Jsoup.parseBodyFragment() in Java, as both seem to give the same output?

Output:

shiv@OS:~/CodeSnippets/TrashCode$ javac JsoupTester.java 
shiv@OS:~/CodeSnippets/TrashCode$ java JsoupTester
<body>
 <p>Sample Content</p>
</body>
<body>
 <title>Sample Title</title>
 <p>Sample Content</p>
</body>

Answer 1

As stated in the documentation, the parse() method takes a full HTML document as input, while the parseBodyFragment() method takes an HTML body fragment as input.

You can test the differences by running JShell:

jshell --class-path jsoup-1.13.1.jar

If you use an HTML document with a <title> in the <head>:

var html = "<html><head><title>aaa</title><body>bbb</body></html>"

Calling Jsoup.parse(html) gives:

 <html> <head> <title>aaa</title> </head> <body> bbb </body> </html>

and calling Jsoup.parseBodyFragment(html) gives:

 <html> <head></head> <body> <title>aaa</title>bbb </body> </html>

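So the main difference is that Jsoup.parseBodyFragment() treats all of its input as body content, so the resulting document always has an empty <head> (here the <title> ends up inside <body>), whereas Jsoup.parse() parses the <head> section as well.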
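The same comparison can be sketched in plain Java instead of JShell. This is only an illustrative sketch: the class name HeadVsBodyDemo and its helper methods are made up for this example, and it assumes jsoup is on the classpath. It counts the element children of <head> after each kind of parse, using the same input string as above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; only the Jsoup calls come from the library itself.
public class HeadVsBodyDemo {

    // Number of elements inside <head> when the input is parsed as a full document.
    static int headChildrenWithParse(String html) {
        Document doc = Jsoup.parse(html);
        return doc.head().children().size();
    }

    // Number of elements inside <head> when the input is parsed as a body fragment.
    static int headChildrenWithFragment(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        return doc.head().children().size();
    }

    // Number of <title> elements that ended up inside <body> after a fragment parse.
    static int titlesInBodyWithFragment(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        return doc.body().select("title").size();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>aaa</title><body>bbb</body></html>";

        System.out.println("parse():             head children = " + headChildrenWithParse(html));
        System.out.println("parseBodyFragment(): head children = " + headChildrenWithFragment(html));
        System.out.println("parseBodyFragment(): titles in body = " + titlesInBodyWithFragment(html));
    }
}
```

Going by the output shown above, parse() should report one child in <head> (the <title>), while parseBodyFragment() should report none and find the <title> inside <body> instead.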

Answer 2

A website can also be built from HTML fragments, i.e. snippets of HTML that do not contain the <html> and <body> tags.

For example:

<p id="1p">LOrem ipsum...</p>
<p id="2p">Ipsum LOrem ipsum...</p>

This is what parseBodyFragment() is for: parsing HTML snippets that are not complete HTML documents.

When this method is called, Jsoup creates an empty document shell (<html><head></head><body></body></html>) and places the parsed fragment inside the <body> element.

This method should not be used for full HTML documents; use parse() instead.
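As a rough sketch of the fragment case, the two paragraphs from the example above can be parsed and then read back from the body. The class name FragmentDemo and the helper parseFragment are invented for this illustration, and jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; only the Jsoup calls come from the library itself.
public class FragmentDemo {

    // Wraps the fragment in a document shell and returns the resulting Document.
    static Document parseFragment(String fragment) {
        return Jsoup.parseBodyFragment(fragment);
    }

    public static void main(String[] args) {
        String fragment =
                "<p id=\"1p\">LOrem ipsum...</p>\n"
              + "<p id=\"2p\">Ipsum LOrem ipsum...</p>";

        Document doc = parseFragment(fragment);

        // The fragment's elements are now children of <body>.
        for (Element p : doc.body().children()) {
            System.out.println(p.id() + " -> " + p.text());
        }
    }
}
```

Each <p> from the snippet becomes a direct child of doc.body(), so the usual selector and traversal methods work on it just as they would on a document produced by parse().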
