Select a non-significant div tag using jsoup

I'm using jsoup for web scraping and have run into another issue. The div I need information from has no class, no id, and no other identifying attribute; it's buried deep in the page. Here it is:

<div class="column">
    <div class="form-label">Rate: </div>
    <div>11.082/11.167</div>
    <div class="form-label padding-top">High/Low: </div>  
    <div>1005.0/0.0004</div>
</div>

I need to get the first set of numbers (11.082/11.167), but I'm not sure how to tell jsoup that I want that div specifically. Does anyone have any advice?

  1. Select all divs with class="column".
  2. Loop through the list of selected elements. For each element, check whether its first child div has the text "Rate:".
  3. If it does, your text is inside the second child div.


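import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public String getRate(Document document) {
    for (Element column : document.getElementsByClass("column")) {
        // children() returns only the direct child elements; getElementsByTag("div")
        // would also include the column div itself at index 0
        Elements divs = column.children();
        // ownText() trims surrounding whitespace, so compare against "Rate:" without the trailing space
        if (divs.size() > 1 && divs.get(0).ownText().equals("Rate:")) {
            return divs.get(1).ownText();
        }
    }

    return null;
}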
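To try the loop approach end to end, here is a minimal runnable sketch. It assumes jsoup is on the classpath, parses the markup from the question with Jsoup.parse, and wraps the method in a hypothetical RateLoopDemo class. It walks column.children() rather than calling getElementsByTag("div"), because getElementsByTag also returns the column div itself as its first match.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RateLoopDemo {
    // Walks every div.column and returns the text of the div that follows a "Rate:" label,
    // or null when no column contains such a label.
    static String getRate(Document document) {
        for (Element column : document.getElementsByClass("column")) {
            // Direct children only: index 0 is the label div, index 1 is the value div
            Elements divs = column.children();
            // ownText() trims whitespace, so "Rate: " in the markup becomes "Rate:"
            if (divs.size() > 1 && divs.get(0).ownText().equals("Rate:")) {
                return divs.get(1).ownText();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Markup copied from the question
        String html = "<div class=\"column\">"
                + "<div class=\"form-label\">Rate: </div>"
                + "<div>11.082/11.167</div>"
                + "<div class=\"form-label padding-top\">High/Low: </div>"
                + "<div>1005.0/0.0004</div>"
                + "</div>";
        Document doc = Jsoup.parse(html);
        System.out.println(getRate(doc)); // prints 11.082/11.167
    }
}
```

If a page has several column divs, the loop returns the value from the first one whose label matches.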

Assuming doc is your Document object:

String rate = doc.select(".column > div:eq(1)").text();

should do the job. You select the parent div by class, then take its child divs, filtering them so that only the element at index 1 is returned (the index is zero-based, so index 1 is the second element).
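A minimal runnable sketch of the selector approach follows, assuming jsoup is on the classpath and using the markup from the question; the class and method names are hypothetical. Besides the positional :eq(1) selector, it shows a label-based alternative built from jsoup's :containsOwn pseudo-selector and the + (adjacent sibling) combinator, which keeps working if another div is inserted before the "Rate:" label. Both methods use select(...).first(), which returns null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RateSelectorDemo {
    // Positional: the direct child div with sibling index 1 (zero-based) of the first div.column
    static String rateByPosition(Document doc) {
        Element value = doc.select(".column > div:eq(1)").first();
        return value == null ? null : value.text();
    }

    // Label-based: the div immediately after a div.form-label whose own text contains "Rate"
    // (:containsOwn is case-insensitive)
    static String rateByLabel(Document doc) {
        Element value = doc.select(".column > div.form-label:containsOwn(Rate) + div").first();
        return value == null ? null : value.text();
    }

    public static void main(String[] args) {
        // Markup copied from the question
        String html = "<div class=\"column\">"
                + "<div class=\"form-label\">Rate: </div>"
                + "<div>11.082/11.167</div>"
                + "<div class=\"form-label padding-top\">High/Low: </div>"
                + "<div>1005.0/0.0004</div>"
                + "</div>";
        Document doc = Jsoup.parse(html);
        System.out.println(rateByPosition(doc)); // prints 11.082/11.167
        System.out.println(rateByLabel(doc));    // prints 11.082/11.167
    }
}
```

Note that :eq(n) counts all element siblings, not only the divs, so the positional version depends on the exact child order inside the column.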

Personally, I'd switch to jQuery, as its selector engine is far better, but to each their own...
