Getting Strings from HTML with JSOUP

I need help getting strings from HTML with jsoup.

The document is structured like this:

<body>
   <span class="a-touch">
      <div class="a-container">
         <div class="a-box">
            <div class="a-row a-spacing-small">
              <b>string1</b><br/>string2 97<br/>String3
              <br/>string4<br/>string5<br/>
          </div>

Now I need to get the strings. I googled, but was only able to find examples for tables and so on.

The following code gives you a strings array that contains the text content of the a-row div, split by line breaks:

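import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist; // named Safelist since jsoup 1.14.2
import org.jsoup.select.Elements;

Document doc = Jsoup.parseBodyFragment(html);
Elements aRowDiv = doc.select(".a-row");
String[] strings = Jsoup.clean(aRowDiv.html(), "", Whitelist.none(),
    new Document.OutputSettings().prettyPrint(false)).split("\n");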

In jsoup, the strings are all stored in TextNodes.

Use for (Node n : element.childNodes()) to iterate over all the child nodes of an element. The only node types that are usually relevant are Element and TextNode. Use if (n instanceof TextNode) to test for and operate on the text, and if (n instanceof Element) to make a recursive call on each sub-element.
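The traversal described above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name TextNodeCollector and the helpers collectText and extractStrings are hypothetical names made up for this example, and it assumes that blank, whitespace-only text nodes should be skipped and that only the first .a-row element matters.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class TextNodeCollector {

    // Hypothetical helper: walks the child nodes of root recursively and
    // appends the trimmed text of every non-blank TextNode to out,
    // in document order.
    static void collectText(Element root, List<String> out) {
        for (Node n : root.childNodes()) {
            if (n instanceof TextNode) {
                String text = ((TextNode) n).text().trim();
                if (!text.isEmpty()) { // assumption: skip whitespace-only nodes
                    out.add(text);
                }
            } else if (n instanceof Element) {
                collectText((Element) n, out); // recurse into sub-elements
            }
        }
    }

    // Hypothetical helper: parses the HTML and collects the strings found
    // inside the first element with class "a-row" (empty list if none).
    static List<String> extractStrings(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        List<String> strings = new ArrayList<>();
        Element row = doc.select(".a-row").first();
        if (row != null) {
            collectText(row, strings);
        }
        return strings;
    }

    public static void main(String[] args) {
        String html = "<div class=\"a-row a-spacing-small\">"
                + "<b>string1</b><br/>string2 97<br/>String3"
                + "<br/>string4<br/>string5<br/></div>";
        System.out.println(extractStrings(html));
    }
}
```

Because the walk recurses into every sub-element, text nested inside tags such as <b> is collected too, while empty elements such as <br/> contribute nothing and simply act as separators between the text nodes.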
