What is the difference between Jsoup.parse() and Jsoup.parseBodyFragment() in Java?

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class JsoupTester {
        public static void main(String[] args) {
                String html = "<html><head><title>Sample Title</title></head><body><p>Sample Content</p></body></html>";
                // Parse the input as a complete HTML document.
                Document document = Jsoup.parse(html);
                System.out.println(document.body());
                // Parse the same input as if it were only the contents of <body>.
                Document document2 = Jsoup.parseBodyFragment(html);
                System.out.println(document2.body());
        }
}

What is the difference between Jsoup.parse() and Jsoup.parseBodyFragment() in Java, given that both appear to give the same output?

Output:

shiv@OS:~/CodeSnippets/TrashCode$ javac JsoupTester.java 
shiv@OS:~/CodeSnippets/TrashCode$ java JsoupTester
<body>
 <p>Sample Content</p>
</body>
<body>
 <title>Sample Title</title>
 <p>Sample Content</p>
</body>

As stated in the documentation, the parse method takes a full HTML document as input, and the parseBodyFragment method takes an HTML body fragment as input.

You can test the differences by running JShell:

jshell --class-path jsoup-1.13.1.jar

If you use an HTML document with a <title> in the <head>:

var html = "<html><head><title>aaa</title><body>bbb</body></html>"

Calling Jsoup.parse(html) gives:

 <html> <head> <title>aaa</title> </head> <body> bbb </body> </html>

and calling Jsoup.parseBodyFragment(html) gives:

 <html> <head></head> <body> <title>aaa</title>bbb </body> </html>

So the main difference is that Jsoup.parseBodyFragment always gives you an empty head, because everything in the input is placed inside the body, whereas Jsoup.parse is able to parse the head section.
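The difference described above can also be checked programmatically. The following is a minimal sketch, assuming jsoup is on the classpath; the class name HeadComparison and its two helper methods are hypothetical names chosen for illustration. It counts the child elements of the head after each kind of parse, using the same input string as the JShell session.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class for illustration; not part of jsoup.
public class HeadComparison {

    // Number of child elements in <head> when the input is parsed as a full document.
    static int headChildrenWithParse(String html) {
        Document doc = Jsoup.parse(html);
        return doc.head().children().size();
    }

    // Number of child elements in <head> when the input is parsed as a body fragment.
    static int headChildrenWithFragment(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        return doc.head().children().size();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>aaa</title><body>bbb</body></html>";
        System.out.println(headChildrenWithParse(html));    // prints 1 (the <title>)
        System.out.println(headChildrenWithFragment(html)); // prints 0 (the <title> ends up in <body>)
    }
}
```

Comparing the head rather than the body makes the difference visible even when the printed bodies look similar.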

A website can also be built from HTML fragments, i.e. snippets of HTML that do not contain the <html> and <body> tags.

For example:

<p id="1p">LOrem ipsum...</p>
<p id="2p">Ipsum LOrem ipsum...</p>

This is what parseBodyFragment() is for: parsing HTML snippets that are not complete HTML files.

When this method is called, Jsoup effectively wraps the fragment in a document shell, adding <html><head></head><body> before it and </body></html> after it.

This method should not be used for full HTML documents; use parse() instead.
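As an illustration of the intended use, here is a small sketch, assuming jsoup is on the classpath; the class name FragmentExample and its helper methods are hypothetical. It parses the two-paragraph snippet from above with parseBodyFragment() and then queries the resulting document like any other.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical example class for illustration; not part of jsoup.
public class FragmentExample {

    // The snippet from the answer: two paragraphs with no <html> or <body> around them.
    static final String FRAGMENT =
            "<p id=\"1p\">LOrem ipsum...</p>\n"
          + "<p id=\"2p\">Ipsum LOrem ipsum...</p>";

    // Parses the snippet as body content; jsoup supplies the surrounding document shell.
    static Document parseFragment() {
        return Jsoup.parseBodyFragment(FRAGMENT);
    }

    // Counts the <p> elements that ended up inside <body>.
    static int paragraphCount() {
        return parseFragment().body().select("p").size();
    }

    // Returns the text of the element with the given id.
    static String textOf(String id) {
        return parseFragment().getElementById(id).text();
    }

    public static void main(String[] args) {
        System.out.println(paragraphCount()); // prints 2
        System.out.println(textOf("2p"));     // prints Ipsum LOrem ipsum...
        // Printing the whole document shows the <html>, <head> and <body> wrapper that jsoup added.
        System.out.println(parseFragment());
    }
}
```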
