
Jsoup Java For Loops and Elements

I'm learning jsoup for use in Java. First of all, I don't really understand the difference between jsoup's "Elements" and "Element", or when to use each. Here's an example of what I'm trying to do. Using the URL http://en.wikipedia.org/wiki/List_of_bow_tie_wearers#Architects, I want to parse the names listed under the category "Architects". I've tried this:

Document doc = null;
try {
    doc = Jsoup.connect("http://en.wikipedia.org/wiki/List_of_bow_tie_wearers").get();
} catch (IOException e) {
    // Exception is swallowed here; doc stays null if the request fails
}
Elements basics = doc.getElementsByClass("mw-redirect");

String text = basics.text();

System.out.println(text);

Here is the output:

run:
Franklin Roosevelt Arthur Schlesinger, Jr. Reagan administration University of Colorado at Boulder Eric R. Kandel Eugene H. Spafford Arthur Schlesinger, Jr. middle finger John Daly Sir Robin Day Today show Tom Oliphant Today show Harry Smith TV chef Panic! At The Disco Watergate Watergate Hillary Clinton Donald M. Payne, Jr. Franklin Roosevelt Baldwin–Wallace College Howard Phillips Twilight Sparkle Gil Chesterton Bertram Cooper Richard Gilmore Dr. Donald "Ducky" Mallard, M.D., M.E. Medical Examiner Brother Mouzone hitman Buckaroo Banzai Conan Edogawa Jack Point Waylon Smithers Franklin Roosevelt NFL Chronicle of Higher Education Evening Standard

I'm really just trying to learn the basics of traversing an HTML document, but I'm having trouble with the jsoup cookbook, which is confusing for a beginner. Any help is appreciated.

Regarding your first question: the difference between Elements and Element is, as the names indicate, the number of items.

An object of type Element represents one HTML element (a single node). An object of type Elements holds multiple Element objects; it is a list of Element.

If you take a look at the constructors in the API documentation for Element and Elements, it becomes rather obvious.
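To make the distinction concrete, here is a minimal sketch that parses a small, made-up HTML string (not the Wikipedia page) and loops over the result. The class name, the `linkTexts` helper, and the HTML are assumptions for illustration only; the jsoup calls used are `Jsoup.parse`, `select`, `first`, and `text`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class ElementVsElements {

    // Made-up HTML used only for this illustration
    static final String HTML =
            "<ul><li><a href='/a'>Alice</a></li><li><a href='/b'>Bob</a></li></ul>";

    // Collects the text of every link, one list entry per Element
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select("a");   // zero or more matches
        List<String> texts = new ArrayList<>();
        for (Element link : links) {        // each item is a single Element
            texts.add(link.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Elements links = doc.select("a");
        Element firstLink = links.first();  // would be null if nothing matched

        System.out.println(links.size());        // number of matched elements
        System.out.println(links.text());        // text of all matches joined into one string
        System.out.println(firstLink.text());    // text of the first match only
        System.out.println(linkTexts(HTML));     // one entry per link
    }
}
```

Calling text() on an Elements object joins the text of every match into one string, which is why the output in the question is a single long line. Looping over the Elements gives you each Element separately.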

Now for the parsing part. In your code you select by the class "mw-redirect", which is not enough on its own: as your output shows, it matches links all over the page. You first need to "navigate" to the correct section.
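As a rough illustration of why the scope matters, the sketch below compares a document-wide class search with the same search run on a single element. The HTML, the ids "first" and "second", and the class and method names are invented for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScopedSelection {

    // Made-up HTML: two sections that both contain a link with the same class
    static final String HTML =
            "<div id='first'><a class='mw-redirect'>One</a></div>"
          + "<div id='second'><a class='mw-redirect'>Two</a></div>";

    // Searches the whole document, so links from every section match
    static int countInDocument(String html) {
        Document doc = Jsoup.parse(html);
        return doc.getElementsByClass("mw-redirect").size();
    }

    // Navigates to one section first, then searches only inside it
    static String textInSection(String html, String sectionId) {
        Document doc = Jsoup.parse(html);
        Element section = doc.getElementById(sectionId);
        if (section == null) {
            return "";                      // section not found
        }
        return section.getElementsByClass("mw-redirect").text();
    }

    public static void main(String[] args) {
        System.out.println(countInDocument(HTML));
        System.out.println(textInSection(HTML, "second"));
    }
}
```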

I've made a working sample here:

Document doc = null;
try {
    doc = Jsoup.connect("http://en.wikipedia.org/wiki/List_of_bow_tie_wearers").get();
} catch (IOException e) {
    // Report the failure; doc stays null and the block below is skipped
    e.printStackTrace();
}

if (doc != null) {

    // The Architect headline has an id. Awesome! Let's select it.
    Element architectsHeadline = doc.select("#Architects").first();

    // For educational purposes, let's see what we've got there...
    System.out.println(architectsHeadline.html());

    // Now we use something other than .first(), since we need to
    // get the tag after the h3 that contains the element with id Architects.
    // We step up to the h3 using .parent() and select the succeeding tag
    Element architectsList = architectsHeadline.parent().nextElementSibling();

    // Again, let's have a peek
    System.out.println(architectsList.html());

    // Ok, now since we have our list, let's traverse it.
    // For this we select every a tag inside each li tag
    // via a css selector
    for(Element e : architectsList.select("li > a")){
      System.out.println(e.text());
    }
}
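If you want to experiment without a network connection, the same navigation can be tried on an inline HTML string. This is only a sketch: the fragment below is a made-up imitation of the structure the sample relies on (an h3 whose inner span carries the id, followed by a list), and the `SectionList` class and `namesUnder` helper are hypothetical names. It also adds null checks, since first(), parent(), and nextElementSibling() can all return null when the page does not look as expected.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SectionList {

    // Made-up fragment: headline span with an id, followed by a list of links
    static final String HTML =
            "<h3><span id='Architects'>Architects</span></h3>"
          + "<ul><li><a href='/x'>Name One</a>, note</li>"
          + "<li><a href='/y'>Name Two</a></li></ul>"
          + "<h3><span id='Other'>Other</span></h3>"
          + "<ul><li><a href='/z'>Someone Else</a></li></ul>";

    // Returns the link texts of the list that directly follows the given headline
    static List<String> namesUnder(String html, String headlineId) {
        List<String> names = new ArrayList<>();
        Document doc = Jsoup.parse(html);

        Element headline = doc.getElementById(headlineId);
        if (headline == null || headline.parent() == null) {
            return names;                   // headline not found
        }

        // Step up to the h3, then move to the element right after it
        Element list = headline.parent().nextElementSibling();
        if (list == null) {
            return names;                   // nothing follows the headline
        }

        for (Element link : list.select("li > a")) {
            names.add(link.text());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : namesUnder(HTML, "Architects")) {
            System.out.println(name);
        }
    }
}
```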

I hope this helps.


 