Java reading information from a website using Jsoup
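I have read a lot of posts about parsing and similar topics. Most of the replies I saw simply recommend using a library. My problem now is writing an algorithm that fetches exactly the information I want. My goal is to get the closing status of 2 schools from a weather website. I started using Jsoup, as someone recommended, but I need help.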
Web page: http://www.10tv.com/content/sections/weather/closings.html
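Since I already know the names of the schools I am looking for, I can probably work out how to find a specific line of text in the page, but the status I need is 2 lines below it. It would be easy if every school had one fixed status, but a school may be either Closed or on a Two-hour Delay, so I can't just search for the status text. I would like some ideas or answers on how to approach this. I will do this twice, because I want to look up 2 schools. I already have the names to search for; I only need the status.
Here is an example of what I want to do (pseudocode):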
Document doc = connect(to url);
Element schoolName1 = doc.lookForText(htmlLineHere/schoolname);
// suppose getNext() returns the line right after the name, which should be
// my status, and text() then strips off the HTML
String status1 = schoolName1.getNext().text();
This is what I have right now:
public static SchoolClosing lookupDebug() throws IOException {
    final ArrayList<String> Status = new ArrayList<String>();
    try {
        //connects to my wanted website
        Document doc = Jsoup.connect("http://www.10tv.com/content/sections/weather/closings.html").get();
        //selects/fetches the line of code I want
        Element schoolName = doc.html("<td valign="+"top"+">Athens City Schools</td>");
        //an array of Strings where I am going to add the text I need when I get it
        final ArrayList<String> temp = new ArrayList<String>();
        //checking if its fetching the text
        System.out.println(schoolName.text());
        //add the text to the array
        temp.add(schoolName.text());
        for (int i = 0; i <= 1; i++) {
            final String[] tempStatus = temp.get(i).split(" ");
            Status.add(tempStatus[0]);
        }
    } catch (final IOException e) {
        throw new IOException("There was a problem loading School Closing Status");
    }
    return new SchoolClosing(Status);
}
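import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Fetch the closings page and walk every row of the table with id "closings".
Document doc = Jsoup.connect(
        "http://www.10tv.com/content/sections/weather/closings.html")
        .get();
for (Element tr : doc.select("#closings tr")) {
    // Skip rows that have no <td> cells (for example, a header row).
    Element tds = tr.select("td").first();
    if (tds != null) {
        // td:eq(n) selects the cell whose sibling index is n (0-based).
        String county = tr.select("td:eq(0)").text();
        String schoolName = tr.select("td:eq(1)").text();
        String status = tr.select("td:eq(2)").text();
        System.out.println(String.format(
                "county: %s, schoolName: %s, status: %s", county,
                schoolName, status));
    }
}
Output: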
county: Athens, schoolName: Beacon School, status: Two-hour Delay
county: Franklin, schoolName: City of Grandview Heights, status: Snow Emergency through 8pm Thursday
county: Franklin, schoolName: Electrical Trades Center, status: All Evening Activities Cancelled
county: Franklin, schoolName: Hilock Fellowship Church, status: PM Services Cancelled
county: Franklin, schoolName: International Christian Center, status: All Evening Activities Cancelled
county: Franklin, schoolName: Maranatha Baptist Church, status: PM Services Cancelled
county: Franklin, schoolName: Masters Commission New Covenant Church, status: Bible Study Cancelled
county: Franklin, schoolName: New Life Christian Fellowship, status: All Activities Cancelled
county: Franklin, schoolName: The Epilepsy Foundation of Central Ohio, status: All Evening Activities Cancelled
county: Franklin, schoolName: Washington Ave United Methodist Church, status: All Evening Activities Cancelled
Or loop over the cells of each row:
for (Element tr : doc.select("#closings tr")) {
    System.out.println("----------------------");
    for (Element td : tr.select("td")) {
        System.out.println(td.text());
    }
}
This gives:
----------------------
Athens
Beacon School
Two-hour Delay
----------------------
Franklin
City of Grandview Heights
Snow Emergency through 8pm Thursday
----------------------
Franklin
Electrical Trades Center
All Evening Activities Cancelled
...
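To get the status of just the two schools by name, as the question asks, the same row-by-row selection can be narrowed down. The following is only a minimal sketch: the class name ClosingLookup, the helper findStatuses, and the inline sample markup are hypothetical, and it assumes each data row is laid out as county, name, status, as in the output above. It parses an inline string with Jsoup.parse so it runs without network access; against the live page you would pass in the Document returned by Jsoup.connect(...).get() instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ClosingLookup {

    // Returns name -> status for every wanted name found in the table.
    // Assumption: data rows have at least three <td> cells in the order
    // county | name | status.
    static Map<String, String> findStatuses(Document doc, List<String> wanted) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Element tr : doc.select("#closings tr")) {
            Elements tds = tr.select("td");
            if (tds.size() < 3) {
                continue; // header row or a row with an unexpected layout
            }
            String name = tds.get(1).text();
            if (wanted.contains(name)) {
                result.put(name, tds.get(2).text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup modelled on the output shown above.
        String html = "<table id=\"closings\">"
                + "<tr><th>County</th><th>Name</th><th>Status</th></tr>"
                + "<tr><td>Athens</td><td>Beacon School</td>"
                + "<td>Two-hour Delay</td></tr>"
                + "<tr><td>Franklin</td><td>Electrical Trades Center</td>"
                + "<td>All Evening Activities Cancelled</td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);

        // "Athens City Schools" is not in the sample, so it is simply absent
        // from the result rather than causing an error.
        Map<String, String> statuses = findStatuses(
                doc, Arrays.asList("Beacon School", "Athens City Schools"));
        System.out.println(statuses); // prints {Beacon School=Two-hour Delay}
    }
}
```

A school that is missing from the table does not appear in the returned map, so the caller can treat "no entry" as "no closing reported" without special-casing it.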