JSOUP: Get text after div with specific text inside

In short, I am creating an Ancient Greek concordance program for OS X, so I need to collect definitions from a lexicon.

On the http://biblehub.com/greek/1.htm page, I need to retrieve the text under "Strong's Exhaustive Concordance". The issue is that this div has the same class as other divs in the HTML, which makes finding that specific div programmatically difficult.

With JSOUP, I selected the element that follows the div containing "Strong's Exhaustive Concordance", yet the output is "Strong's Exhaustive Concordance" itself instead of the definition of the word.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.jsoup.nodes.Document;

public class Greek {

    public static void main(String[] args) throws IOException {

        Document doc = Jsoup.connect("http://biblehub.com/greek/1.htm").get();

        Elements n = doc.select("div.vheading2:containsOwn(Strong's Exhaustive Concordance) + p");

        System.out.println(n.text());
    }
}

Did you know that Chrome DevTools has a very handy tool that will help you locate the element?

Right-click the element you want to locate and choose Inspect, which will show you the HTML code for the element. Then right-click that element in the HTML view and select Copy; you will see a range of options such as CSS selector and XPath :)

So in your case, it would be: doc.select("#leftbox > div > p:nth-child(74)");
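For what it's worth, here is a minimal sketch of how a copied selector like that could be tried out. It assumes jsoup 1.11.1 or newer on the classpath (for selectFirst), and the class name CopiedSelectorDemo, the helper firstText, and the inline HTML are all made up for illustration; the markup is not the real biblehub.com page. Note that a positional selector such as p:nth-child(74) depends on the exact page layout, so the sketch returns null instead of failing when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class CopiedSelectorDemo {

    // Hypothetical helper: returns the text of the first element matching
    // cssQuery, or null when the selector matches nothing.
    static String firstText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element hit = doc.selectFirst(cssQuery); // null if there is no match
        return hit == null ? null : hit.text();
    }

    public static void main(String[] args) {
        // Stand-in markup (an assumption, not copied from the real page).
        String html = "<div id=\"leftbox\"><div>"
                + "<p>first paragraph</p><p>second paragraph</p>"
                + "</div></div>";

        // Matches the second child of the inner div.
        System.out.println(firstText(html, "#leftbox > div > p:nth-child(2)"));

        // No 74th child exists in this markup, so the helper returns null.
        System.out.println(firstText(html, "#leftbox > div > p:nth-child(74)"));
    }
}
```

If the page ever gains or loses a paragraph, the index in the copied selector will point at a different element, so it is worth checking the result before using it.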


I've ferreted out the solution.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Greek {
    public static void main(String[] args) throws IOException {

        Document doc = Jsoup.connect("http://biblehub.com/greek/1.htm").get();


        // collect every element that has our desired class
        Elements n = doc.select("div.vheading2");

        // cycle through the array until we find the member that contains the text above the word's definition
        for (Element e : n) {
            if (e.text().equalsIgnoreCase("Strong's Exhaustive Concordance")) {

                // finally, we print the next element, which is our definition
                System.out.println(e.nextElementSibling().text());
            }
        }
    }
}
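As a rough variation on the loop above, the sketch below wraps the same idea in a helper and guards against the heading being the last element, in which case nextElementSibling() returns null. It assumes jsoup is on the classpath; the class name DefinitionLookup, the helper definitionAfter, and the inline HTML are invented for illustration and do not reproduce the real page.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class DefinitionLookup {

    // Hypothetical helper: finds the first div.vheading2 whose text equals
    // `heading` (ignoring case) and returns the text of the element right
    // after it. Returns an empty Optional if no such heading or sibling exists.
    static Optional<String> definitionAfter(String html, String heading) {
        Document doc = Jsoup.parse(html);
        for (Element e : doc.select("div.vheading2")) {
            if (e.text().equalsIgnoreCase(heading)) {
                Element next = e.nextElementSibling(); // null when e is the last child
                if (next != null) {
                    return Optional.of(next.text());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in markup (an assumption): two headings sharing one class.
        String html = "<div class=\"vheading2\">Thayer's Greek Lexicon</div><p>first entry</p>"
                + "<div class=\"vheading2\">Strong's Exhaustive Concordance</div><p>Alpha</p>";

        // Prints "Alpha" for this markup.
        System.out.println(
                definitionAfter(html, "Strong's Exhaustive Concordance").orElse("(not found)"));
    }
}
```

Parsing a string with Jsoup.parse keeps the sketch runnable offline; for the live page, the document from Jsoup.connect(...).get() could be passed through the same loop.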
