
Jsoup - Select only some text from HTML

I am trying to select some text from the HTML using Jsoup in Android.

My HTML looks like this:

 <tr class="tip " data-original-title="">
                                <td>
                                    !!! NOT That !!!                                </td>
                                <td>
                                    A205                                </td>
                                <td>
                                    I want to get this                               </td>
                                <td>
                                    And this                                </td>
                                <td>
                                    !!! And not this !!!                              </td>
                                <td>
                                                                    </td>
                            </tr>

How can I do that? Thank you so much!

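For example (note that the row has to be wrapped in a <table> tag, because the HTML parser discards <tr> and <td> tags that appear outside a table):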

package ru.java.study;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Main {

    private static String htmlText =
            "<tr class=\"tip \" data-original-title=\"\">" +
                    "<td>!!! NOT That !!!</td>" +
                    "                                <td>" +
                    "                                    A205                                </td>" +
                    "                                <td>" +
                    "                                    I want to get this                               </td>" +
                    "                                <td>" +
                    "                                    And this                                </td>" +
                    "                                <td>" +
                    "                                    !!! And not this !!!                              </td>" +
                    "                                <td>" +
                    "                                                                    </td>" +
                    "                            </tr>";

    public static void main(String[] args) {
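        // Prepend <table>: a bare <tr> outside a table is discarded by the HTML parser
        Document document = Jsoup.parse("<table>" + htmlText);

        // Indexes are zero-based, so get(2) is the third cell and get(3) the fourth
        String firstTd = document.select("td").get(2).text();
        String secondTd = document.select("td").get(3).text();
        System.out.println(firstTd);
        System.out.println(secondTd);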
    }
}
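
As a variation on the index lookup above, here is a minimal sketch that picks both cells with one selector by using the structural pseudo-class :nth-child, which jsoup supports. The class name NthChildCells and the helper wantedCells are made up for illustration. Keep in mind that :nth-child is 1-based, so the third and fourth cells are 3 and 4.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NthChildCells {

    // Hypothetical helper: returns the text of the 3rd and 4th <td> of each tr.tip row.
    static List<String> wantedCells(String rowHtml) {
        // Wrap the row in <table>; a bare <tr> would be discarded by the HTML parser.
        Document doc = Jsoup.parse("<table>" + rowHtml + "</table>");

        List<String> texts = new ArrayList<>();
        // :nth-child counts from 1, unlike Elements.get(int), which counts from 0.
        for (Element td : doc.select("tr.tip > td:nth-child(3), tr.tip > td:nth-child(4)")) {
            texts.add(td.text()); // text() trims and normalizes whitespace
        }
        return texts;
    }

    public static void main(String[] args) {
        String row = "<tr class=\"tip \" data-original-title=\"\">"
                + "<td>!!! NOT That !!!</td>"
                + "<td>   A205   </td>"
                + "<td>   I want to get this   </td>"
                + "<td>   And this   </td>"
                + "<td>!!! And not this !!!</td>"
                + "<td></td>"
                + "</tr>";
        for (String text : wantedCells(row)) {
            System.out.println(text);
        }
    }
}
```

Whether this reads better than get(2) and get(3) is a matter of taste. It mainly helps when the page has several matching rows and you want the same columns from each of them.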

Be more specific in your selector. Ideally the <table> tag has an id="..." or class="..." attribute that you can use to identify exactly the table you need.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Main {

    public static void main(String[] args) {
        // Don't forget the <table> tag: without it the parser discards the <tr> and <td> tags
        String html = "<table>" +
                            "<tr class=\"tip \" data-original-title=\"\">" +
                                "<td>!!! NOT That !!!</td>" +
                                "<td>A205</td>" +
                                "<td>I want to get this</td>" +
                                "<td>And this</td>" +
                                "<td>!!! And not this !!!</td>" +
                                "<td></td>" +
                            "</tr>" +
                       "</table>";
        Document doc = Jsoup.parseBodyFragment(html);
        // You should use a more specific selector if you can.
        // For example, if the table tag looks like this: <table id="myID">...</table>
        // then the selector should look like this: "table#myID tr.tip > td"
        Elements cells = doc.select("tr.tip > td");
        String cell3content = cells.get(2).html(); // use .text() for content without HTML tags
        String cell4content = cells.get(3).html();

        System.out.println(cell3content);
        System.out.println(cell4content);
    }
}
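
To illustrate the "table#myID tr.tip > td" selector mentioned in the comments, here is a rough sketch with two tables on the same page. The id myID comes from the comment above. The second id (other), the class name TableIdCells and the helper thirdAndFourth are assumptions made up for the example. The size check is there because get(int) throws an IndexOutOfBoundsException when the row has fewer cells than expected.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class TableIdCells {

    // Hypothetical helper: text of the 3rd and 4th cell of tr.tip inside the table with the given id.
    // Returns an empty list when the table or the cells are not there.
    static List<String> thirdAndFourth(String html, String tableId) {
        Document doc = Jsoup.parse(html);
        // The id restricts the match to one table; tr.tip > td then picks its cells.
        Elements cells = doc.select("table#" + tableId + " tr.tip > td");

        List<String> result = new ArrayList<>();
        if (cells.size() >= 4) { // guard against rows with fewer cells than expected
            result.add(cells.get(2).text());
            result.add(cells.get(3).text());
        }
        return result;
    }

    public static void main(String[] args) {
        String html =
                "<table id=\"other\">"
                + "<tr class=\"tip\"><td>x1</td><td>x2</td><td>x3</td><td>x4</td></tr>"
                + "</table>"
                + "<table id=\"myID\">"
                + "<tr class=\"tip \" data-original-title=\"\">"
                + "<td>!!! NOT That !!!</td><td>A205</td>"
                + "<td>I want to get this</td><td>And this</td>"
                + "<td>!!! And not this !!!</td><td></td>"
                + "</tr>"
                + "</table>";

        // Only cells from the table with id="myID" are considered.
        for (String text : thirdAndFourth(html, "myID")) {
            System.out.println(text);
        }
    }
}
```

Without the id, "tr.tip > td" would match cells from both tables, and get(2) and get(3) would return cells from whichever table comes first in the document.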


 