Getting Strings from HTML with jsoup

I need help extracting strings from HTML with jsoup.

The document is structured like this:

<body>
   <span class="a-touch">
      <div class="a-container">
         <div class="a-box">
            <div class="a-row a-spacing-small">
              <b>string1</b><br/>string2 97<br/>String3
              <br/>string4<br/>string5<br/>
            </div>

Now I need to get the strings. I searched online but could only find examples for tables and the like.

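The following code gives you a strings array containing the text content of the a-row div, split at line breaks. Note that the split relies on the newlines jsoup's pretty-printer puts into the html() output, so check the result against your jsoup version: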

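import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist; // named Whitelist before jsoup 1.14.1
import org.jsoup.select.Elements;
Document doc = Jsoup.parseBodyFragment(html);
Elements aRowDiv = doc.select(".a-row");
String[] strings = Jsoup.clean(aRowDiv.html(), "", Safelist.none(), // strip all tags, keep text
    new Document.OutputSettings().prettyPrint(false)).split("\n"); // keep newlines, split on them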
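If the newlines in html() do not line up with the <br> tags in your jsoup version, a sketch like the following splits on the <br> elements directly instead. It assumes jsoup is on the classpath and that the strings are direct children of the a-row div (text inside a child element such as <b> is taken whole via Element.text()); the names BrSplitter and splitOnBr are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class BrSplitter {

    // Splits the text of one element into lines, starting a new line at every <br>.
    // Only direct children are inspected; child elements other than <br>
    // contribute their full text via Element.text().
    static List<String> splitOnBr(Element container) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Node node : container.childNodes()) {
            if (node instanceof TextNode) {
                // text() collapses whitespace runs, including source newlines
                current.append(((TextNode) node).text());
            } else if (node instanceof Element) {
                Element child = (Element) node;
                if (child.tagName().equals("br")) {
                    addIfNotBlank(lines, current);
                    current.setLength(0);
                } else {
                    current.append(child.text());
                }
            }
        }
        addIfNotBlank(lines, current); // text after the last <br>, if any
        return lines;
    }

    // Adds the trimmed buffer contents unless they are empty
    private static void addIfNotBlank(List<String> lines, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            lines.add(s);
        }
    }

    public static void main(String[] args) {
        String html = "<div class=\"a-row a-spacing-small\">"
                + "<b>string1</b><br/>string2 97<br/>String3\n"
                + "<br/>string4<br/>string5<br/></div>";
        Document doc = Jsoup.parseBodyFragment(html);
        Element aRow = doc.select(".a-row").first();
        System.out.println(splitOnBr(aRow));
    }
}
```

For the sample markup from the question, main prints [string1, string2 97, String3, string4, string5]. Empty segments (for example between two consecutive <br> tags) are dropped, which is a design choice you may want to change.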

In jsoup, the strings are all stored in TextNode objects.

Use for (Node n : element.childNodes()) to iterate over all of an element's child nodes. The only node types that are usually relevant are Element and TextNode: use if (n instanceof TextNode) to find and process the text, and if (n instanceof Element) to make a recursive call on each sub-element.
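The recursive walk described above might look like this minimal sketch, again assuming jsoup is on the classpath. TextNodeCollector and collectText are hypothetical names, and skipping blank text nodes (the whitespace between tags) is an assumption about what should count as a string.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class TextNodeCollector {

    // Recursively walks the child nodes of an element and collects the
    // trimmed text of every non-blank TextNode, in document order.
    static void collectText(Element element, List<String> out) {
        for (Node n : element.childNodes()) {
            if (n instanceof TextNode) {
                TextNode textNode = (TextNode) n;
                if (!textNode.isBlank()) {
                    out.add(textNode.text().trim());
                }
            } else if (n instanceof Element) {
                // Recurse into sub-elements such as <b> or nested <div>s
                collectText((Element) n, out);
            }
        }
    }

    public static void main(String[] args) {
        String html = "<div class=\"a-row a-spacing-small\">"
                + "<b>string1</b><br/>string2 97<br/>String3\n"
                + "<br/>string4<br/>string5<br/></div>";
        Document doc = Jsoup.parseBodyFragment(html);
        List<String> strings = new ArrayList<>();
        collectText(doc.select(".a-row").first(), strings);
        System.out.println(strings);
    }
}
```

For the question's markup this prints [string1, string2 97, String3, string4, string5]. Unlike a split on <br>, it returns every text node separately, so text spread across elements (for example <b>foo</b>bar) comes back as two entries.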
