JSoup: parsing data from inside a tag

Question

I am parsing most of the data I need, except for one item: it is contained in an href attribute, and I need the number that appears after "mmsi=".

<a href="/showship.php?mmsi=235083844">Sunsail 4013</a>

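My current parser, shown below, gets all the other data I need. I have tried a few things; the commented-out line occasionally returns unspecified entries. Is there a way to extend the code below so that the number "235083844" is returned before the name "Sunsail 4013"?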

try {
    File input = new File("shipMove.txt");
    Document doc = Jsoup.parse(input, null);
    Elements tables = doc.select("table.shipInfo");
    for (Element element : tables) {
        Elements tdTags = element.select("td");
        //Elements mmsi = element.select("a[href*=/showship.php?mmsi=]");
        // Iterate over all 'td' tags found
        for (Element td : tdTags) {
            // Print its text if not empty
            final String text = td.text();
            if (text.isEmpty() == false) {
                System.out.println(td.text());
            }
        }
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

A sample of the parsed data and the HTML file is available here.

Answer 1

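You can call attr on an Element object to retrieve the value of a specific attribute.
If the string pattern is consistent, use substring to extract the value you need.

Code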

// Using just your anchor html tag
String html = "<a href=\"/showship.php?mmsi=235083844\">Sunsail 4013</a>";
Document doc = Jsoup.parse(html);

// Just selecting the anchor tag, for your implementation use a generic one
Element link = doc.select("a").first();

// Get the attribute value
String url = link.attr("href");

// Check for nulls here and take the substring from '=' onwards
String id = url.substring(url.indexOf('=') + 1);
System.out.println(id + " "+ link.text());

This prints:

235083844 Sunsail 4013

The modified condition inside the for loop of your code:

...
for (Element td : tdTags) {
    // Print its text if not empty
    final String text = td.text();
    if (!text.isEmpty()) {
        Element link = td.getElementsByTag("a").first();
        if (link != null) {
            // Get the attribute value
            String url = link.attr("href");

            // Check for nulls here and take the substring from '=' onwards
            String id = url.substring(url.indexOf('=') + 1);
            System.out.println(id + " " + td.text());
        } else {
            System.out.println(td.text());
        }
    }
}
...

The code above prints the desired output.
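
The substring approach relies on the href containing exactly one '='. If the link could carry additional query parameters, a regular expression that looks specifically for the mmsi parameter may be more forgiving. The following is only a sketch using the standard library: the class name MmsiExtractor and the method extractMmsi are hypothetical helpers, not part of jsoup, and the assumption is that the value after "mmsi=" is always made of digits.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): pulls the mmsi number out of an href string.
final class MmsiExtractor {
    // Matches "mmsi=" preceded by '?' or '&', followed by one or more digits (assumed format).
    private static final Pattern MMSI = Pattern.compile("[?&]mmsi=(\\d+)");

    static Optional<String> extractMmsi(String href) {
        if (href == null) {
            return Optional.empty();
        }
        Matcher m = MMSI.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // In the loop above, the argument would be the value of link.attr("href").
        System.out.println(extractMmsi("/showship.php?mmsi=235083844").orElse("(no mmsi)"));
        System.out.println(extractMmsi("/showship.php?page=2&mmsi=235083844&x=1").orElse("(no mmsi)"));
        System.out.println(extractMmsi("/about.php").orElse("(no mmsi)"));
    }
}
```

Returning an Optional lets the caller decide what to print for cells whose link has no mmsi parameter, instead of printing whatever happens to follow the first '='.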

Answer 2

If you need an attribute value, use the attr() method.

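for (Element td : tdTags) {
    Elements aList = td.select("a");
    for (Element a : aList) {
        // attr() returns an empty string when the attribute is absent
        String val = a.attr("href");
        // StringUtils comes from Apache Commons Lang
        if (StringUtils.isNotBlank(val)) {
            String yourId = val.substring(val.indexOf("=") + 1);
            // use yourId here, e.g. print it before the link text
        }
    }
}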
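
If pulling in Apache Commons Lang just for the blank check is not desirable, the same guard can be written with the standard library, and the href can be split into its query parameters so the lookup is by name rather than by position. This is a minimal sketch under assumptions: HrefQuery and queryParams are hypothetical names, and the href is assumed to be a syntactically valid (possibly relative) URI; anything that fails to parse yields an empty map.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of jsoup): splits the query part of an href into key/value pairs.
final class HrefQuery {
    static Map<String, String> queryParams(String href) {
        Map<String, String> params = new LinkedHashMap<>();
        // Plain-Java equivalent of a "not blank" check.
        if (href == null || href.trim().isEmpty()) {
            return params;
        }
        String query;
        try {
            query = URI.create(href.trim()).getQuery();
        } catch (IllegalArgumentException e) {
            // Not a valid URI (for example, it contains spaces): treat as having no parameters.
            return params;
        }
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // In the loop above, the argument would be the value of a.attr("href").
        Map<String, String> params = queryParams("/showship.php?mmsi=235083844");
        System.out.println(params.get("mmsi"));
    }
}
```

Inside the inner loop, `queryParams(val).get("mmsi")` would then stand in for the substring call, returning null when the link has no mmsi parameter.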

2 solutions

Solution 1: 1, accepted, 2014-04-16 09:34:00

Solution 2: 0, 2014-04-16 09:36:18
