

How to parse multiple HTML elements in jsoup?

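I am trying to parse a simple HTML table of crime statistics for a Garda station (the Garda are the Irish police) from an HTML document saved in my Java project. At the moment I am parsing the content of the HTML document and printing it to the console. The problem is that I can only print the numbers in the table (excluding the years), whereas what I want is the name of each crime from the table followed by its 6 figures.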



<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Recorded Crime Offences (Number) by Garda Station, Type of Offence and&lt;BR&gt;
<table border="">
<tbody><tr align="LEFT">
<th colspan="8">Recorded Crime Offences (Number) by Garda Station, Type of Offence and<br>
<tr align="LEFT">
<th colspan="2"> </th>
<th valign="TOP" colspan="1">2011</th>
<th valign="TOP" colspan="1">2012</th>
<th valign="TOP" colspan="1">2013</th>
<th valign="TOP" colspan="1">2014</th>
<th valign="TOP" colspan="1">2015</th>
<th valign="TOP" colspan="1">2016</th>
<tr align="RIGHT">
<th align="LEFT" valign="TOP" rowspan="12">Balbriggan, D.M.R. Northern Division</th>
<th align="LEFT">03 ,Attempts/threats to murder, assaults, harassments and related offences</th>
<tr align="RIGHT">
<th align="LEFT">04 ,Dangerous or negligent acts</th>
<tr align="RIGHT">
<th align="LEFT">05 ,Kidnapping and related offences</th>
<tr align="RIGHT">
<th align="LEFT">06 ,Robbery, extortion and hijacking offences</th>
<tr align="RIGHT">
<th align="LEFT">07 ,Burglary and related offences</th>
<tr align="RIGHT">
<th align="LEFT">08 ,Theft and related offences</th>
<tr align="RIGHT">
<th align="LEFT">09 ,Fraud, deception and related offences</th>
<tr align="RIGHT">
<th align="LEFT">10 ,Controlled drug offences</th>
<tr align="RIGHT">
<th align="LEFT">11 ,Weapons and Explosives Offences</th>
<tr align="RIGHT">
<th align="LEFT">12 ,Damage to property and to the environment</th>
<tr align="RIGHT">
<th align="LEFT">13 ,Public order and other social code offences</th>
<tr align="RIGHT">
<th align="LEFT">15 ,Offences against government, justice procedures and organisation of crime</th>
<tr align="LEFT">
<td colspan="8"><a href="http://www.cso.ie/en/methods/crime/recordedcrime/">See Background Notes</a> 



Figure 0 : 96
Figure 1 : 89
Figure 2 : 70
Figure 3 : 97
Figure 4 : 103
Figure 5 : 103
Figure 6 : 72
Figure 7 : 67
Figure 8 : 50
Figure 9 : 53
Figure 10 : 45
... (Figures 11-66 omitted for conciseness)
Figure 67 : 48
Figure 68 : 39
Figure 69 : 39
Figure 70 : 66
Figure 71 : 50


Crime: 03 ,Attempts/threats to murder, assaults, harassments and related offences
Figure 0 : 96
Figure 1 : 89
Figure 2 : 70
Figure 3 : 97
Figure 4 : 103
Figure 5 : 103

Crime: 04 ,Dangerous or negligent acts
Figure 6 : 72
Figure 7 : 67
Figure 8 : 50
Figure 9 : 53
Figure 10 : 45
etc, etc


Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0  


import java.io.*;
import org.jsoup.*;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseCrimeStatistics {

    public static void main(String[] args) {
        try {
            int count = 0;
            File input = new File("Balbriggan.html");
            Document doc = Jsoup.parse(input, "UTF-8", "http://www.cso.ie");

            // Only the <td> cells are selected, so the crime names (in <th> cells) are never printed
            Elements title = doc.select("td");

            for (Element sectd1 : title) {
                Elements ths = sectd1.select("td");

                String result = ths.get(0).text();

                System.out.println("Figure " + count + " : " + result);
                count++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Would anyone have any suggestions on how I could solve this? Thanks.


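int count = 0;
File input = new File("Balbriggan.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://www.cso.ie");

Elements numbers = doc.select("td");
Elements titles = doc.select("th");

// The first 9 <th> cells are the table title, the blank cell, the 6 years and the station name,
// so the crime names start at index 9; each crime owns 6 consecutive <td> figures.
for (int i = 9; i < titles.size(); i++) {
    System.out.println("Crime: " + titles.get(i).text());
    for (int j = 0; j < 6; j++) {
        System.out.println("Figure " + count + " : " + numbers.get((i - 9) * 6 + j).text());
        count++;
    }
}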
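
The index arithmetic above can be checked without jsoup. The following is only a minimal sketch: it assumes the cell texts have already been extracted into plain lists, and the class name CrimeTableGrouping, the method group and the sample values are hypothetical placeholders rather than part of the original answer. It pairs each crime name with its block of figures using the same (i - firstCrimeIndex) * figuresPerCrime offset, and stops early if the figures run out instead of throwing an IndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CrimeTableGrouping {

    // Pairs each header from firstCrimeIndex onwards with its block of
    // figuresPerCrime consecutive entries from the flat figures list.
    static List<String> group(List<String> headers, List<String> figures,
                              int firstCrimeIndex, int figuresPerCrime) {
        List<String> lines = new ArrayList<>();
        int count = 0;
        for (int i = firstCrimeIndex; i < headers.size(); i++) {
            int start = (i - firstCrimeIndex) * figuresPerCrime;
            // Guard: stop if there are not enough figures left for a full block.
            if (start + figuresPerCrime > figures.size()) {
                break;
            }
            lines.add("Crime: " + headers.get(i));
            for (int j = 0; j < figuresPerCrime; j++) {
                lines.add("Figure " + count + " : " + figures.get(start + j));
                count++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder data, not taken from the real table.
        List<String> headers = Arrays.asList("Station", "Crime A", "Crime B");
        List<String> figures = Arrays.asList("1", "2", "3", "4");
        for (String line : group(headers, figures, 1, 2)) {
            System.out.println(line);
        }
    }
}
```

For the table in the question, the corresponding call would use 9 as the first crime index and 6 figures per crime, with the two lists filled from the text of the th and td cells.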

