
What is the difference between Jsoup.parse() and Jsoup.parseBodyFragment() in Java?

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class JsoupTester {
    public static void main(String[] args) {
        String html = "<html><head><title>Sample Title</title></head><body><p>Sample Content</p></body></html>";

        // Parse the input as a complete HTML document.
        Document document = Jsoup.parse(html);
        System.out.println(document.body());

        // Parse the same input as if it were only the contents of <body>.
        Document document2 = Jsoup.parseBodyFragment(html);
        System.out.println(document2.body());
    }
}

What is the difference between Jsoup.parse() and Jsoup.parseBodyFragment() in Java, given that both appear to give the same output?

Output:

shiv@OS:~/CodeSnippets/TrashCode$ javac JsoupTester.java 
shiv@OS:~/CodeSnippets/TrashCode$ java JsoupTester
<body>
 <p>Sample Content</p>
</body>
<body>
 <title>Sample Title</title>
 <p>Sample Content</p>
</body>

As stated by the documentation, the parse method takes a full HTML document as input, whereas the parseBodyFragment method takes an HTML body fragment as input.

You can test the differences by running JShell:

jshell --class-path jsoup-1.13.1.jar

If you use an HTML document with a <title> in the <head>:

var html = "<html><head><title>aaa</title><body>bbb</body></html>"

Calling Jsoup.parse(html) gives:

 <html> <head> <title>aaa</title> </head> <body> bbb </body> </html>

and calling Jsoup.parseBodyFragment(html) gives:

 <html> <head></head> <body> <title>aaa</title>bbb </body> </html>

So the main difference is that Jsoup.parseBodyFragment always gives you an empty head, because it treats the whole input as body content (the <title> ends up inside <body>), whereas Jsoup.parse also parses the head section.
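The comparison above can also be checked programmatically. The following is a minimal sketch, assuming jsoup is on the classpath; the class name HeadComparison and the helper methods titlesInHead and titlesInBody are hypothetical names introduced for illustration only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class HeadComparison {
    // Counts the <title> elements that ended up inside <head>.
    static int titlesInHead(Document doc) {
        return doc.head().select("title").size();
    }

    // Counts the <title> elements that ended up inside <body>.
    static int titlesInBody(Document doc) {
        return doc.body().select("title").size();
    }

    public static void main(String[] args) {
        // Same input string as in the JShell session above.
        String html = "<html><head><title>aaa</title><body>bbb</body></html>";

        Document full = Jsoup.parse(html);
        Document fragment = Jsoup.parseBodyFragment(html);

        System.out.println("parse:             head=" + titlesInHead(full)
                + " body=" + titlesInBody(full));
        System.out.println("parseBodyFragment: head=" + titlesInHead(fragment)
                + " body=" + titlesInBody(fragment));
    }
}
```

Given the JShell output shown above, this should report the <title> in the head for parse and in the body for parseBodyFragment.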

A website can also be built from HTML fragments, i.e. snippets of HTML that do not contain <html> or <body> tags.

For example:

<p id="1p">LOrem ipsum...</p>
<p id="2p">Ipsum LOrem ipsum...</p>

This is what parseBodyFragment() is for: parsing HTML snippets that are not complete HTML files.

When this method is called, Jsoup wraps the fragment in an empty document shell, adding <html><head></head><body> before it and </body></html> after it.

This method should not be used for full HTML documents; use parse() instead.
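As a rough illustration of the intended use, here is a small sketch that parses the two-paragraph snippet from above and reads the paragraphs back from the generated body. It assumes jsoup is on the classpath; the class name FragmentExample and the helper parseSnippet are hypothetical names used only for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FragmentExample {
    // Wraps an HTML snippet in a document shell and returns the resulting Document.
    static Document parseSnippet(String snippet) {
        return Jsoup.parseBodyFragment(snippet);
    }

    public static void main(String[] args) {
        String snippet = "<p id=\"1p\">LOrem ipsum...</p>\n"
                + "<p id=\"2p\">Ipsum LOrem ipsum...</p>";

        Document doc = parseSnippet(snippet);

        // The snippet's elements are now children of the generated <body>.
        for (Element p : doc.body().children()) {
            System.out.println(p.id() + " -> " + p.text());
        }
    }
}
```

Working through doc.body() keeps the code focused on the snippet itself rather than on the <html> and <head> wrapper that Jsoup adds.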

