
Get a specific child using jsoup

I have this simple example to illustrate my problem. This is an HTML page, test.html:

<body>
    <div class="partA">
        part a
    </div>
    <script></script>
    <div class="partB">
        part b
    </div>
    <div class="partC">
        part c
    </div>
    <div class="parthh">
        <div>
            part b 2
        </div>

        <div class="partD">
            part d
        </div>
    </div>
</body>

And this is my Java code:

public static void main(String[] args) throws IOException {
    // Parse the local test file as UTF-8
    Document doc = Jsoup.parse(new File("C:\\Users\\HC\\Desktop\\dataset\\test.html"), "UTF-8");

    // Intended: the second <div> that is a direct child of <body>
    Elements el = doc.select("body > div:eq(1)");

    System.out.println(el.toString());
}

The problem is that the 'script' tag, or any other tag that comes before my desired tag (the second div in this case), prevents the code from working correctly, and the returned result is empty.

How can I ignore those undesired tags and get the specific one?
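To see why the selector comes back empty, it can help to print the sibling index jsoup assigns to each element child of body. In jsoup, :eq(n) matches elements whose sibling index equals n, and that index counts every element sibling regardless of its tag, so the script occupies index 1. This is an illustrative sketch, not part of the original post: it assumes jsoup is on the classpath, inlines the test.html markup as a string instead of reading the file, and the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SiblingIndexDemo {
    // Assumption: inline copy of the markup from test.html
    static final String HTML = "<body>"
            + "<div class=\"partA\">part a</div>"
            + "<script></script>"
            + "<div class=\"partB\">part b</div>"
            + "<div class=\"partC\">part c</div>"
            + "<div class=\"parthh\"><div>part b 2</div><div class=\"partD\">part d</div></div>"
            + "</body>";

    // Builds "index:tag" for each element child of <body>, in document order
    static String describeBodyChildren(String html) {
        Document doc = Jsoup.parse(html);
        StringBuilder sb = new StringBuilder();
        for (Element child : doc.body().children()) {
            sb.append(child.elementSiblingIndex()).append(':').append(child.tagName()).append(' ');
        }
        return sb.toString().trim();
    }

    // Counts how many elements a CSS query matches in the given markup
    static int countMatches(String html, String query) {
        return Jsoup.parse(html).select(query).size();
    }

    public static void main(String[] args) {
        // Prints: 0:div 1:script 2:div 3:div 4:div
        System.out.println(describeBodyChildren(HTML));
        // Prints: 0, because the element at sibling index 1 is a <script>, not a <div>
        System.out.println(countMatches(HTML, "body > div:eq(1)"));
    }
}
```

Under the same reasoning, body > div:eq(2) would match the partB div as the file is written.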

You can remove all the script tags from your HTML:

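Document doc = Jsoup.parse(new File("C:\\Users\\HC\\Desktop\\dataset\\test.html"), "UTF-8");
// Detach every <script> element from the DOM
Elements el = doc.select("script");
for (Element e : el) {
    e.remove();
}
// With the script gone, the second div is body's element child at index 1
el = doc.select("body > div:eq(1)");
System.out.println(el.toString());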

Now your document no longer contains the script tag, so the second div sits at sibling index 1 and body > div:eq(1) returns the desired output.
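The removal loop can also be collapsed into a single call, since jsoup's Elements.remove() removes each matched element from the DOM. If you would rather not modify the document at all, another option is to select every div child of body and take the second entry of the result list by position. The sketch below shows both; it assumes jsoup is on the classpath, inlines the test.html markup as a string, and uses hypothetical class and method names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SecondDivDemo {
    // Assumption: inline copy of the markup from test.html
    static final String HTML = "<body>"
            + "<div class=\"partA\">part a</div>"
            + "<script></script>"
            + "<div class=\"partB\">part b</div>"
            + "<div class=\"partC\">part c</div>"
            + "<div class=\"parthh\"><div>part b 2</div><div class=\"partD\">part d</div></div>"
            + "</body>";

    // Option 1: remove every <script> in one call, then :eq(1) hits the second div
    static String afterRemovingScripts(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("script").remove();
        return doc.select("body > div:eq(1)").attr("class");
    }

    // Option 2: leave the DOM untouched and take the second match (0-based index 1)
    static String secondMatchInList(String html) {
        Document doc = Jsoup.parse(html);
        Element second = doc.select("body > div").get(1);
        return second.className();
    }

    public static void main(String[] args) {
        System.out.println(afterRemovingScripts(HTML)); // partB
        System.out.println(secondMatchInList(HTML));    // partB
    }
}
```

Option 2 depends only on the order of the div children, so it keeps working if other non-div tags are later inserted between them.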

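Isn't the selector body > div:nth-of-type(2) what you want? Unlike :eq(n), which counts every element sibling, :nth-of-type(n) counts only siblings with the same tag name, so the script element does not shift the position.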
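A small comparison of the two selectors on the same markup may make the difference concrete. This is a sketch, assuming jsoup is on the classpath and inlining the test.html markup as a string; the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class NthOfTypeDemo {
    // Assumption: inline copy of the markup from test.html
    static final String HTML = "<body>"
            + "<div class=\"partA\">part a</div>"
            + "<script></script>"
            + "<div class=\"partB\">part b</div>"
            + "<div class=\"partC\">part c</div>"
            + "<div class=\"parthh\"><div>part b 2</div><div class=\"partD\">part d</div></div>"
            + "</body>";

    // Returns the class of the first element matching the query, or "" if nothing matches
    static String firstMatchClass(String html, String query) {
        Elements found = Jsoup.parse(html).select(query);
        Element first = found.first();
        return first == null ? "" : first.className();
    }

    public static void main(String[] args) {
        // :eq counts all element siblings, so index 1 is the <script>: no match
        System.out.println("eq(1) -> '" + firstMatchClass(HTML, "body > div:eq(1)") + "'");
        // :nth-of-type counts only <div> siblings (1-based), so 2 is partB
        System.out.println("nth-of-type(2) -> '" + firstMatchClass(HTML, "body > div:nth-of-type(2)") + "'");
    }
}
```

Note that :eq(n) is 0-based while :nth-of-type(n) is 1-based, which is why the same target div is index 1 in one and 2 in the other.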

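Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.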

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM