
Java reading information from a website using Jsoup

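I have read a lot about parsing and related topics. Most of the replies I saw recommended using a library of some kind. My problem right now is coming up with an algorithm that fetches exactly the information I want. My goal is to get the closing status of 2 schools from a weather website. I started using Jsoup, as someone recommended, but I need help.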

Webpage: Click here

Image: Click here

Sample of the webpage source: Click here

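Since I already know the names of the schools I am looking for, I can probably figure out how to get a specific line of text in the webpage, but the status I need is 2 lines below it. This would be easy if each school had its own fixed status, but they are all either Closed or Two-hour Delay, so I can't just search for the status text. I would like some ideas or answers on how to approach this. I need to do it twice, because I want to look up 2 schools. I already have the names I can use to find them; I just need the status.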

Here is an example of what I want to do (pseudocode):

Document doc = connect(to url);
Element schoolName1 = doc.lookForText(htmlLineHere/schoolname);

String status1 = schoolName.getNext().text();//suppose this gets the line right after which should be my status and then cleans off the Html.

This is what I have now:

public static SchoolClosing lookupDebug() throws IOException {
        final ArrayList<String> Status = new ArrayList<String>();

        try {
            //connects to my wanted website
            Document doc = Jsoup.connect("http://www.10tv.com/content/sections/weather/closings.html").get();
            //selects/fetches the line of code I want
            Element schoolName = doc.html("<td valign="+"top"+">Athens City Schools</td>");
            //an array of Strings where I am going to add the text I need when I get it
            final ArrayList<String> temp = new ArrayList<String>();
            //checking if its fetching the text
            System.out.println(schoolName.text());
            //add the text to the array
            temp.add(schoolName.text());
            for (int i = 0; i <= 1; i++) {
                final String[] tempStatus = temp.get(i).split(" ");
                Status.add(tempStatus[0]);
            }
        } catch (final IOException e) {
            throw new IOException("There was a problem loading School Closing Status");
        }
        return new SchoolClosing(Status);
    }

Answer:

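// Fetch and parse the closings page
Document doc = Jsoup.connect(
        "http://www.10tv.com/content/sections/weather/closings.html")
        .get();
// Each <tr> inside the element with id "closings" is one listing
for (Element tr : doc.select("#closings tr")) {
    // Skip rows that contain no <td> cells (for example, a header row)
    Element firstCell = tr.select("td").first();
    if (firstCell != null) {
        // Cells by position: 0 = county, 1 = school name, 2 = status
        String county = tr.select("td:eq(0)").text();
        String schoolName = tr.select("td:eq(1)").text();
        String status = tr.select("td:eq(2)").text();
        System.out.println(String.format(
                "county: %s, schoolName: %s, status: %s", county,
                schoolName, status));
    }
}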

Output:

county: Athens, schoolName: Beacon School, status: Two-hour Delay
county: Franklin, schoolName: City of Grandview Heights, status: Snow Emergency through 8pm Thursday
county: Franklin, schoolName: Electrical Trades Center, status: All Evening Activities Cancelled
county: Franklin, schoolName: Hilock Fellowship Church, status: PM Services Cancelled
county: Franklin, schoolName: International Christian Center, status: All Evening Activities Cancelled
county: Franklin, schoolName: Maranatha Baptist Church, status: PM Services Cancelled
county: Franklin, schoolName: Masters Commission New Covenant Church, status: Bible Study Cancelled
county: Franklin, schoolName: New Life Christian Fellowship, status: All Activities Cancelled
county: Franklin, schoolName: The Epilepsy Foundation of Central Ohio, status: All Evening Activities Cancelled
county: Franklin, schoolName: Washington Ave United Methodist Church, status: All Evening Activities Cancelled

Or loop over every cell of each row:

for (Element tr : doc.select("#closings tr")) {
    System.out.println("----------------------");
    for (Element td : tr.select("td")) {
        System.out.println(td.text());
    }
}

which gives:

----------------------
Athens
Beacon School
Two-hour Delay
----------------------
Franklin
City of Grandview Heights
Snow Emergency through 8pm Thursday
----------------------
Franklin
Electrical Trades Center
All Evening Activities Cancelled
...
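
Since the question asks for the status of two specific schools rather than the whole list, the same row loop can fill a map keyed by school name. The following is only a minimal sketch under some assumptions: the class name ClosingLookup, the method statusBySchool, and the SAMPLE_HTML string are hypothetical names made up for illustration, and the column order (county, school name, status) is taken from the output shown above. The sample HTML stands in for the live page so the example runs offline; with the real site you would pass the result of Jsoup.connect(url).get() instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ClosingLookup {

    // Hypothetical stand-in for the real page; rows copied from the output above.
    static final String SAMPLE_HTML =
            "<table id=\"closings\">"
            + "<tr><th>County</th><th>Name</th><th>Status</th></tr>"
            + "<tr><td>Athens</td><td>Beacon School</td><td>Two-hour Delay</td></tr>"
            + "<tr><td>Franklin</td><td>Electrical Trades Center</td>"
            + "<td>All Evening Activities Cancelled</td></tr>"
            + "</table>";

    /** Maps each school name (second cell) to its status (third cell). */
    static Map<String, String> statusBySchool(Document doc) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Element tr : doc.select("#closings tr")) {
            Elements cells = tr.select("td");
            // Rows with fewer than three <td> cells (such as a header row) are skipped.
            if (cells.size() >= 3) {
                result.put(cells.get(1).text(), cells.get(2).text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: parsing the sample string; swap in Jsoup.connect(url).get() for the live page.
        Document doc = Jsoup.parse(SAMPLE_HTML);
        Map<String, String> statuses = statusBySchool(doc);
        for (String school : new String[] {"Beacon School", "Athens City Schools"}) {
            System.out.println(school + ": " + statuses.getOrDefault(school, "not listed"));
        }
    }
}
```

With the sample data, "Beacon School" resolves to "Two-hour Delay" and "Athens City Schools" falls back to "not listed", because only schools that currently have an entry appear in the table. A missing key therefore probably means the school has no closing or delay posted, which the caller has to handle.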
