
How can I store data using Jsoup?

My problem is the following: I scraped some data from a website with Jsoup (the Jsoup code also comes from here):

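import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Kereso {

    public static void main(String[] args) throws IOException {
        String html = "http://www.szerencsejatek.hu/xls/otos.html";

        // Download and parse the page
        Document doc = Jsoup.connect(html).get();

        Elements tableElements = doc.select("table");

        // Rows of the table(s); every <td> of a row is printed on its own line
        Elements tableRowElements = tableElements.select(":not(thead) tr");
        for (Element row : tableRowElements) {
            Elements rowItems = row.select("td");
            for (Element rowItem : rowItems) {
                System.out.println(rowItem.text());
            }
            System.out.println(); // blank line between rows
        }
    }
}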

Every row that I get from the website should become an object, and I want to store all of these objects in an ArrayList.

This is the class for the objects, with the data they need:

public class Huzas {

    private String ev;
    private String het;
    private String huzasdatum;
    private String otosDb;
    private String otos;
    private String negyesDb;
    private String negyes;
    private String harmasDb;
    private String harmas;
    private String kettesDb;
    private String kettes;
    private int szam1;
    private int szam2;
    private int szam3;
    private int szam4;
    private int szam5;

    public Huzas(String ev, String het, String huzasdatum, String otosDb, String otos, String negyesDb, String negyes, String harmasDb, String harmas, String kettesDb, String kettes, int szam1, int szam2, int szam3, int szam4, int szam5) {
        this.ev = ev;
        this.het = het;
        this.huzasdatum = huzasdatum;
        this.otosDb = otosDb;
        this.otos = otos;
        this.negyesDb = negyesDb;
        this.negyes = negyes;
        this.harmasDb = harmasDb;
        this.harmas = harmas;
        this.kettesDb = kettesDb;
        this.kettes = kettes;
        this.szam1 = szam1;
        this.szam2 = szam2;
        this.szam3 = szam3;
        this.szam4 = szam4;
        this.szam5 = szam5;
    }
}

Is it possible to store them this way? And if so, how?

"Every line that I get from the website should be an object and I want to store all of these objects in an ArrayList."

To link the column data with your Huzas object, you will have to write a small wrapper that maps each column to a field.

If I am not mistaken, you are iterating through a table on some site, so you should already know the order of the columns. Simply capture the value of each column during the iteration and set the corresponding value of your Huzas object as you encounter it.

But first we need a list of column names, so that we can map each column to the value obtained on each DOM iteration.

// The names must be in the same order as the columns in the DOM
String[] columnsFromDOM = {"ev", "het", "huzasdatum", ..... };
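Here is a minimal sketch of the positional mapping described above that does not depend on jsoup. It pairs each cell text of one row with the column name at the same index. The full 16-name `COLUMNS` array is an assumption: it takes the columns to appear in the same order as the fields of `Huzas`, which is also the order the index-based answer further down relies on. `ColumnMapper` and `toColumnMap` are hypothetical names. The input is a plain `List<String>`, so the sketch runs without network access.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnMapper {

    // Column names in the order their <td> cells appear in a row.
    // Assumption: the 16 columns follow the field order of the Huzas class.
    public static final String[] COLUMNS = {
        "ev", "het", "huzasdatum",
        "otosDb", "otos", "negyesDb", "negyes",
        "harmasDb", "harmas", "kettesDb", "kettes",
        "szam1", "szam2", "szam3", "szam4", "szam5"
    };

    // Pairs each cell text with the column name at the same position.
    // Cells beyond COLUMNS.length are ignored; columns without a cell are left out of the map.
    public static Map<String, String> toColumnMap(List<String> cellTexts) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps column order when printed
        int n = Math.min(COLUMNS.length, cellTexts.size());
        for (int i = 0; i < n; i++) {
            map.put(COLUMNS[i], cellTexts.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        // Placeholder cell texts. With jsoup, a row's texts could be collected with:
        //   List<String> cellTexts = new ArrayList<>();
        //   for (Element td : row.select("td")) cellTexts.add(td.text());
        List<String> cells = Arrays.asList("cell0", "cell1", "cell2");
        System.out.println(toColumnMap(cells)); // {ev=cell0, het=cell1, huzasdatum=cell2}
    }
}
```

A `LinkedHashMap` is used instead of a `HashMap` only so the map prints in column order. Lookups by name behave the same with either.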

Now:

List<Huzas> listOfHuzas = new ArrayList<>();
for (Element row : tableRowElements) {

    Map<String, String> columnToObjectMap = new HashMap<>();
    Elements rowItems = row.select("td");
    int index = 0;
    for (Element rowItem : rowItems) {
        if (index >= columnsFromDOM.length) {
            break; // more cells than known column names
        }
        // (key, value) => ("ev", value_from_dom)
        columnToObjectMap.put(columnsFromDOM[index], rowItem.text());
        index++;
    }

    // now columnToObjectMap contains your values along with the relevant keys
    // So now read each value and assign it to a Huzas object

    String ev = columnToObjectMap.get("ev");
    String kettes = columnToObjectMap.get("kettes");
    // ......
    // (the int fields szam1..szam5 need Integer.parseInt(...))

    Huzas huzas = new Huzas(ev, het, .....);
    listOfHuzas.add(huzas);
}
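The elided part of the loop reads every column and converts it for the constructor. It could look roughly like the sketch below. The sketch assumes the map uses the 16 field names of `Huzas` as keys. `fromMap` and `intColumn` are hypothetical helpers. `Huzas` is replaced by a record with the same constructor parameters (Java 16+) only so the sketch compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

public class HuzasFromMap {

    // Stand-in for the question's Huzas class: same constructor parameters, same order.
    // A record is used only to keep the sketch short (requires Java 16+).
    public record Huzas(String ev, String het, String huzasdatum,
                        String otosDb, String otos, String negyesDb, String negyes,
                        String harmasDb, String harmas, String kettesDb, String kettes,
                        int szam1, int szam2, int szam3, int szam4, int szam5) {}

    // Builds a Huzas from a column-name -> cell-text map such as columnToObjectMap.
    // String columns are copied as-is (null if the column was missing).
    public static Huzas fromMap(Map<String, String> m) {
        return new Huzas(
                m.get("ev"), m.get("het"), m.get("huzasdatum"),
                m.get("otosDb"), m.get("otos"), m.get("negyesDb"), m.get("negyes"),
                m.get("harmasDb"), m.get("harmas"), m.get("kettesDb"), m.get("kettes"),
                intColumn(m, "szam1"), intColumn(m, "szam2"), intColumn(m, "szam3"),
                intColumn(m, "szam4"), intColumn(m, "szam5"));
    }

    // The drawn numbers are int fields, so their cell text has to be parsed.
    // Throws IllegalArgumentException if the column is missing, and
    // NumberFormatException (a subclass of it) if the text is not an integer.
    private static int intColumn(Map<String, String> m, String column) {
        String text = m.get(column);
        if (text == null) {
            throw new IllegalArgumentException("missing column: " + column);
        }
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        Map<String, String> m = new HashMap<>();
        for (String c : new String[] {"ev", "het", "huzasdatum", "otosDb", "otos", "negyesDb",
                "negyes", "harmasDb", "harmas", "kettesDb", "kettes"}) {
            m.put(c, c + "-text");
        }
        for (int i = 1; i <= 5; i++) {
            m.put("szam" + i, String.valueOf(i * 10));
        }
        System.out.println(fromMap(m));
    }
}
```

Inside the row loop, `listOfHuzas.add(fromMap(columnToObjectMap));` could then replace the individual `get` calls, with `fromMap` written against the real `Huzas` class. A record also generates `toString()`. The class in the question would need its own `toString()` override for `System.out.println(listOfHuzas)` to print field values.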

Since that site has simple, well-structured HTML, you could simply do this:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Kereso {

    public static void main(String[] args) throws IOException {
        String html = "http://www.szerencsejatek.hu/xls/otos.html";
        List<Huzas> listOfHuzas = new ArrayList<Huzas>();

        Document doc = Jsoup.connect(html).get();
        Elements rows = doc.select("tr");
        rows.remove(0); //Remove head row
        for (Element row : rows) {
            Elements children = row.children();
            listOfHuzas.add(new Huzas(children.get(0).text(), // ev
                            children.get(1).text(), // het
                            children.get(2).text(), // huzasdatum
                            children.get(3).text(), // otosDb
                            children.get(4).text(), // otos
                            children.get(5).text(), // negyesDb
                            children.get(6).text(), // negyes
                            children.get(7).text(), // harmasDb
                            children.get(8).text(), // harmas
                            children.get(9).text(), // kettesDb
                            children.get(10).text(), // kettes
                            Integer.parseInt(children.get(11).text()), // szam1
                            Integer.parseInt(children.get(12).text()), // szam2
                            Integer.parseInt(children.get(13).text()), // szam3
                            Integer.parseInt(children.get(14).text()), // szam4
                            Integer.parseInt(children.get(15).text())) // szam5
                        );
        }
        System.out.println(listOfHuzas); // prints Huzas@<hash> entries unless Huzas overrides toString()
    }
}

Since every row had exactly 16 columns and all int fields had values, I directly indexed the child elements for simplicity. You may want to add length checks or error handling here.
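One way to add the length checks and error handling mentioned above is sketched below. A row is accepted only if it has exactly 16 cells and its five drawn numbers parse as integers. Anything else, such as a header row or an incomplete row, is skipped instead of throwing. `SafeRowParser`, `parseRow` and `parseAll` are hypothetical names. The input is the list of cell texts of each row rather than jsoup `Elements`. `Huzas` is again a stand-in record with the question's constructor parameters (Java 16+).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SafeRowParser {

    // Stand-in for the question's Huzas class (same parameters, same order; Java 16+).
    public record Huzas(String ev, String het, String huzasdatum,
                        String otosDb, String otos, String negyesDb, String negyes,
                        String harmasDb, String harmas, String kettesDb, String kettes,
                        int szam1, int szam2, int szam3, int szam4, int szam5) {}

    public static final int EXPECTED_COLUMNS = 16;
    private static final int FIRST_NUMBER_COLUMN = 11; // szam1..szam5 are columns 11..15

    // Converts the cell texts of one row into a Huzas, or returns Optional.empty()
    // when the row has the wrong number of cells or a drawn number is not an integer.
    // With jsoup, one row's cells could be collected like this:
    //   List<String> cells = new ArrayList<>();
    //   for (Element cell : row.children()) cells.add(cell.text());
    public static Optional<Huzas> parseRow(List<String> c) {
        if (c.size() != EXPECTED_COLUMNS) {
            return Optional.empty(); // e.g. a header row or an incomplete row
        }
        int[] szam = new int[5];
        try {
            for (int i = 0; i < szam.length; i++) {
                szam[i] = Integer.parseInt(c.get(FIRST_NUMBER_COLUMN + i).trim());
            }
        } catch (NumberFormatException e) {
            return Optional.empty(); // empty or non-numeric number cell
        }
        return Optional.of(new Huzas(
                c.get(0), c.get(1), c.get(2), c.get(3), c.get(4), c.get(5),
                c.get(6), c.get(7), c.get(8), c.get(9), c.get(10),
                szam[0], szam[1], szam[2], szam[3], szam[4]));
    }

    // Parses every row and keeps only those that passed the checks.
    public static List<Huzas> parseAll(List<List<String>> rows) {
        List<Huzas> result = new ArrayList<>();
        for (List<String> row : rows) {
            parseRow(row).ifPresent(result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder rows: one with too few cells, one complete 16-cell row.
        List<String> shortRow = List.of("x", "y");
        List<String> fullRow = new ArrayList<>();
        for (int i = 0; i < 11; i++) fullRow.add("cell" + i);
        for (int i = 1; i <= 5; i++) fullRow.add(String.valueOf(i));
        List<Huzas> parsed = parseAll(List.of(shortRow, fullRow));
        System.out.println(parsed.size()); // 1
    }
}
```

Returning `Optional` lets the caller decide whether to skip, log, or report a bad row. To fail fast instead, let the `NumberFormatException` propagate.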

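Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM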