
Jsoup HTML Parsing - Complex Nodes [Java]

I have this piece of HTML:

<td class="my class" >
      <div class="content" style="margin-left:10px;">
        <ul style="list-style-type: disc;">
           <li><span>obj: blue</span></li>
          <li><span>descr: red</span></li>
          <li><span>double: yellow</span></li>
        </ul>
      </div>
</td>

I need to have:

obj: blue

descr: red

double: yellow

I already tried:

docDescription.select("my.class").text();

But it returns the whole block, with all the text run together. I need three separate parts (line by line).

Solution

docDescription.select("div > ul > li > span");

Explanation

Your document is invalid, and Jsoup sees it as shown below. Jsoup always tries to repair the document it parses. In your case the td is outside of any table, so it is removed.

<html>
 <head></head>
 <body> 
  <div class="content" style="margin-left:10px;"> 
   <ul style="list-style-type: disc;"> 
    <li><span>obj: blue</span></li> 
    <li><span>descr: red</span></li> 
    <li><span>double: yellow</span></li> 
   </ul> 
  </div> 
 </body>
</html>
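To check this behaviour yourself, here is a minimal sketch (the class name TdParsingDemo and its helper methods are made up for illustration, and the HTML is a shortened copy of the snippet from the question). It compares Jsoup's default HTML parser with its XML parser, which applies no HTML clean-up rules and should therefore keep the stray td. Note also that class="my class" puts two classes on the element, so a selector for it would be td.my.class rather than my.class.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class TdParsingDemo {
    // Shortened version of the fragment from the question.
    static final String HTML =
            "<td class=\"my class\"><div class=\"content\"><ul>"
            + "<li><span>obj: blue</span></li>"
            + "<li><span>descr: red</span></li>"
            + "</ul></div></td>";

    // Number of td elements with both classes left after HTML parsing.
    static int tdCountWithHtmlParser() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("td.my.class").size();
    }

    // Same count when the input is parsed as XML, without HTML repair.
    static int tdCountWithXmlParser() {
        Document doc = Jsoup.parse(HTML, "", Parser.xmlParser());
        return doc.select("td.my.class").size();
    }

    public static void main(String[] args) {
        // Print the repaired markup that the HTML parser actually works with.
        System.out.println(Jsoup.parse(HTML).body().html());
        System.out.println("td matches (HTML parser): " + tdCountWithHtmlParser());
        System.out.println("td matches (XML parser):  " + tdCountWithXmlParser());
    }
}
```

If the td has to survive, parsing as XML is one option; wrapping the fragment in a table before parsing is probably another, although Jsoup would then add further table elements of its own.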

Code

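import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The class name is arbitrary; it is only here to make the snippet compile.
public class Main {
    public static void main(String[] args) {
        String html = "<td class=\"my class\" >\n" +
                "      <div class=\"content\" style=\"margin-left:10px;\">\n" +
                "        <ul style=\"list-style-type: disc;\">\n" +
                "           <li><span>obj: blue</span></li>\n" +
                "          <li><span>descr: red</span></li>\n" +
                "          <li><span>double: yellow</span></li>\n" +
                "        </ul>\n" +
                "      </div>\n" +
                "</td>";

        // Select each span individually instead of the enclosing block.
        Elements select = Jsoup.parse(html).select("div > ul > li > span");
        for (Element element : select) {
            // text() on a single element returns only that span's text.
            System.out.println(element.text());
        }
    }
}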

Result

obj: blue
descr: red
double: yellow
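If you need the three parts as separate values rather than printed lines, a possible variation is Elements.eachText(), which returns one string per matched element. This is only a sketch: the class SpanTexts and the method spanTexts are hypothetical names, and it assumes a reasonably recent Jsoup version that provides eachText().

```java
import java.util.List;
import org.jsoup.Jsoup;

public class SpanTexts {
    // Returns the text of each matched span as a separate list entry.
    static List<String> spanTexts(String html) {
        return Jsoup.parse(html).select("div > ul > li > span").eachText();
    }

    public static void main(String[] args) {
        String html = "<div><ul><li><span>obj: blue</span></li>"
                + "<li><span>descr: red</span></li>"
                + "<li><span>double: yellow</span></li></ul></div>";
        List<String> parts = spanTexts(html);
        // Each entry can now be used on its own.
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```

Calling text() on the whole selection instead would join all the matches into a single string, which is the behaviour the question was trying to avoid.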
