Extracting innerHtml from the body tag using jsoup

Question

I am parsing HTML using jsoup and want to extract the inner HTML of the body tag.

So far I have tried document.body().children().outerHtml(), but it returns only the HTML elements and skips floating text (text not wrapped in any HTML tag) inside the body.

private String getBodyTag(final Document document) {
    return document.body().children().outerHtml();
}

Input:

<!DOCTYPE html>
<html lang="de">
    <head>
        <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <link rel="stylesheet" type="text/css" href="assets/style.css">
    </head>
    <body>
       <div>questions to improve formatting and clarity.</div>
       <h3>Guided Mode</h3> 
       some sample raw/floating text
    </body>
</html>

Expected:

<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3> 
some sample raw/floating text

Actual:

<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>

Answer 1

Please use this:

private String getBodyTag(final Document document) {
    // html() returns the inner HTML of the element, including bare text nodes
    return document.body().html();
}
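
To compare the two calls side by side, here is a minimal sketch. It assumes the jsoup library is on the classpath; the class name BodyInnerHtmlDemo, its helper methods, and the shortened sample HTML are hypothetical and only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes the jsoup library is on the classpath.
class BodyInnerHtmlDemo {

    // Inner HTML of <body>: child elements and bare text nodes alike.
    static String innerHtml(String html) {
        Document document = Jsoup.parse(html);
        return document.body().html();
    }

    // Outer HTML of the child elements only; bare text nodes are left out.
    static String childElementsHtml(String html) {
        Document document = Jsoup.parse(html);
        return document.body().children().outerHtml();
    }

    public static void main(String[] args) {
        String html = "<!DOCTYPE html><html><body>"
                + "<div>questions to improve formatting and clarity.</div>"
                + "<h3>Guided Mode</h3> some sample raw/floating text"
                + "</body></html>";

        System.out.println("children().outerHtml():");
        System.out.println(childElementsHtml(html));

        System.out.println("body().html():");
        System.out.println(innerHtml(html));
    }
}
```

The exact whitespace and line breaks in the printed HTML depend on jsoup's pretty-printing settings, so only the presence or absence of the floating text should be relied on here.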

Answer 2

You could try returning document.body().html() instead (jsoup's counterpart to innerHTML), so that it returns everything inside the body tag, including the text outside any tag.

As far as I know, the way you are trying to accomplish it does not work because the "raw text" is not considered a child element: children() returns only element nodes, not text nodes.
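
The distinction between child elements and text nodes can be checked directly. The following is a rough sketch, again assuming jsoup is on the classpath; BodyTextNodesDemo and its methods are hypothetical names. It uses Element.children() for the element children and Element.textNodes() for the direct text-node children.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

// Hypothetical demo class; assumes the jsoup library is on the classpath.
class BodyTextNodesDemo {

    // Number of direct child elements of <body>; text nodes are not counted.
    static int childElementCount(String html) {
        return Jsoup.parse(html).body().children().size();
    }

    // Trimmed text of every non-blank text node that is a direct child of <body>.
    static List<String> bareText(String html) {
        Element body = Jsoup.parse(html).body();
        List<String> result = new ArrayList<>();
        for (TextNode node : body.textNodes()) {
            if (!node.isBlank()) {
                result.add(node.text().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<div>questions to improve formatting and clarity.</div>"
                + "<h3>Guided Mode</h3> some sample raw/floating text"
                + "</body></html>";

        System.out.println("child elements: " + childElementCount(html));
        System.out.println("bare text: " + bareText(html));
    }
}
```

If the elements and the text are needed in document order, iterating over body.childNodes() covers both kinds of node in a single pass.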


Answer 1: score 3, accepted, 2019-05-13 12:34:25

Answer 2: score 0, 2019-05-13 12:07:28
