
Retrieve data from HTML with Jsoup

I'm trying to retrieve information from a website, but the problem is that the class names are identical in every row. This is the website structure:

<tr class="main_el">
<td class="key">KEY1</td>
<td class="val">VALUE1</td>
</tr>

<tr class="main_el">
<td class="key">KEY2</td>
<td class="val">VALUE2</td>
</tr>
...
<tr class="main_el">
<td class="key">KEY3</td>
<td class="val">VALUE3</td>
</tr>

I can't use .get(i).getElementsByClass() because the indexes are different on each page. Please help!

EDIT: I want to use KEY1 to retrieve only VALUE1, independently of the other values.

Note: VALUE1 could be at index 1 or 9.

You can write a simple function like this:

import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public Map<String, String> parseHtml(String inputHtml) {
    // inputHtml should be the full page, or at least include the enclosing
    // <table>: the HTML parser discards <tr> tags found outside a table.
    Document htmlDoc = Jsoup.parse(inputHtml);

    // Map from the text of each td.key to the text of the td.val in the same row
    Map<String, String> textMap = new HashMap<>();

    for (Element trElement : htmlDoc.select("tr.main_el")) {
        String key = null;
        String value = null;

        for (Element tdElement : trElement.children()) {
            if (tdElement.hasClass("key"))
                key = tdElement.text();
            // The cells in the question use class "val", not "value"
            if (tdElement.hasClass("val"))
                value = tdElement.text();
        }

        // Skip rows that are missing either cell
        if (key != null && value != null)
            textMap.put(key, value);
    }
    return textMap;
}

Then you can retrieve values from the map using the keys from your HTML.
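As a rough illustration of that lookup step, here is a minimal self-contained sketch. The class name MapLookupDemo, the helper toMap, and the sample HTML are made up for this example; toMap is only a compact stand-in for the map-building function above, and it assumes each row has the tr.main_el / td.key / td.val structure shown in the question.

```java
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class MapLookupDemo {
    // Hypothetical helper: builds a key -> value map from rows shaped like
    // <tr class="main_el"><td class="key">..</td><td class="val">..</td></tr>
    static Map<String, String> toMap(String html) {
        Map<String, String> map = new HashMap<>();
        for (Element row : Jsoup.parse(html).select("tr.main_el")) {
            Element keyCell = row.selectFirst("td.key");
            Element valCell = row.selectFirst("td.val");
            if (keyCell != null && valCell != null) {
                map.put(keyCell.text(), valCell.text());
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Sample input (assumed); KEY1 is deliberately not in the first row
        String html = "<table>"
                + "<tr class=\"main_el\"><td class=\"key\">KEY2</td><td class=\"val\">VALUE2</td></tr>"
                + "<tr class=\"main_el\"><td class=\"key\">KEY1</td><td class=\"val\">VALUE1</td></tr>"
                + "</table>";

        Map<String, String> values = toMap(html);
        // The lookup is by key text, so the row index does not matter
        System.out.println(values.get("KEY1"));
        // A key that is absent from the page falls back to a default
        System.out.println(values.getOrDefault("KEY9", "(not found)"));
    }
}
```

Because the lookup goes through the key text, it should not matter whether the KEY1 row sits at index 1 or 9 on a given page.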

Thanks.

Maybe this works:

select all <tr> elements
for each <tr>
  select <td> with class "key" from the <tr>
  if the text of this element == "KEY1" then
    select <td> with class "val" from the same <tr>
    do whatever you want with this value
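
The steps above could be sketched in Java with Jsoup roughly as follows. The class name KeyLookup, the method findValue, and the sample HTML are hypothetical names chosen for this example; the sketch assumes the rows sit inside a <table>, since the HTML parser drops <tr> tags that appear outside one.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class KeyLookup {
    // Hypothetical helper: returns the text of td.val in the first row whose
    // td.key text equals the given key, or null if no such row exists.
    static String findValue(String html, String key) {
        Document doc = Jsoup.parse(html);
        for (Element row : doc.select("tr.main_el")) {
            Element keyCell = row.selectFirst("td.key");
            if (keyCell != null && key.equals(keyCell.text())) {
                Element valCell = row.selectFirst("td.val");
                return valCell != null ? valCell.text() : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample input (assumed), mirroring the structure from the question
        String html = "<table>"
                + "<tr class=\"main_el\"><td class=\"key\">KEY2</td><td class=\"val\">VALUE2</td></tr>"
                + "<tr class=\"main_el\"><td class=\"key\">KEY1</td><td class=\"val\">VALUE1</td></tr>"
                + "</table>";
        System.out.println(findValue(html, "KEY1")); // prints VALUE1
    }
}
```

Unlike building a full map, this stops at the first matching row, which may be preferable when only one value per page is needed.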
