How to scrape data from one table when other tables have the same class

I have to scrape data from a website that has many tables and save it in a .csv file. I only want the data from one table with class marketData, but two other tables have the same class. Currently my code pulls the data from every table with class marketData. How can I scrape one table and skip the others? My code is as follows.

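import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.net.URL;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.Locale;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ComMarket_summary {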

boolean writeCSVToConsole = true;
boolean writeCSVToFile = true;
boolean sortTheList = true;
boolean writeToConsole;
boolean writeToFile;
public static Document doc = null;
public static Elements tbodyElements = null;
public static Elements elements = null;
public static Elements tdElements = null;
public static Elements trElement2 = null;
public static String Dcomma = ",";
public static String line = "";
public static ArrayList<Elements> sampleList = new ArrayList<Elements>();

public static void createConnection() throws IOException {
    System.setProperty("http.proxyHost", "191.1.1.202");
    System.setProperty("http.proxyPort", "8080");
    String tempUrl = "http://www.psx.com.pk/phps/mktSummary.php";
    doc = Jsoup.parse(new URL(tempUrl), 1000);        
    System.out.println("Successfully Connected");
}

public static void parsingHTML() throws Exception {

    for (Element table : doc.getElementsByTag("table")) {
        for (Element trElement : table.getElementsByTag("tr")) {
            File fold = new File("C:\\market_smry.csv");
            fold.delete();
            File fnew = new File("C:\\market_smry.csv");
            trElement2 = trElement.getElementsByTag("tr");
            tdElements = trElement.getElementsByTag("td");
            FileWriter sb = new FileWriter(fnew, true);

            if (table.hasClass("marketData")) {

                for (Iterator<Element> it = trElement2.iterator(); it.hasNext();) {
                    if (it.hasNext()) {
                        sb.append("\r\n");

                    }

                    for (Iterator<Element> it2 = trElement2.iterator(); it.hasNext();) {
                        Element tdElement2 = it.next();
                        final String content = tdElement2.text();
                        if (it2.hasNext()) {

                            sb.append(formatData(content));
                            sb.append("   ,   ");

                        }

                    }

                    System.out.println(sb.toString());
                    sb.flush();
                    sb.close();
                }
            }
            System.out.println(sampleList.add(tdElements));

        }
    }
}
private static final SimpleDateFormat FORMATTER_MMM_d_yyyy = new SimpleDateFormat("MMM d, yyyy", Locale.US);
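// "yyyy" is the calendar year; "YYYY" is the week-based year and can be off by one around New Year.
private static final SimpleDateFormat FORMATTER_dd_MMM_yyyy = new SimpleDateFormat("dd-MMM-yyyy", Locale.US);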

public static String formatData(String text) {
    String tmp = null;

    try {
        Date d = FORMATTER_MMM_d_yyyy.parse(text);
        tmp = FORMATTER_dd_MMM_yyyy.format(d);
    } catch (ParseException pe) {
        tmp = text;
    }

    return tmp;
}

public static void main(String[] args) throws Exception {
    createConnection();
    parsingHTML();
}

}

PS: I am using JDK 1.8, JRE 1.8, and jsoup 1.8.

You can optimize your code by using a more specific selector.

for (Element table : doc.select("table.marketData")) {
//Process table
}

If you want to process just one specific table on the page, you can access it by its index.

Elements tables = doc.select("table.marketData");
Element table = tables.get(1); // index is zero-based, so this is the second matching table

Since there are three tables with class "marketData", you will need to find some other identifying feature of the table you want (does it have an id? are its header columns different? etc.). Without seeing the HTML, though, I can't give more guidance than that.
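As a minimal sketch of that idea, the table can be picked by the text of one of its header cells instead of by position. The markup below is invented for illustration, since the real page's HTML is not shown in the question; the class name TablePicker, the method pickByHeader, and the header text "SYMBOL" are all hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TablePicker {

    // Hypothetical markup: both tables share the class, only one has a "SYMBOL" header.
    static final String SAMPLE_HTML =
        "<table class='marketData'><tr><th>INDEX</th></tr><tr><td>KSE100</td></tr></table>"
      + "<table class='marketData'><tr><th>SYMBOL</th></tr><tr><td>ABC</td></tr></table>";

    /**
     * Returns the first table.marketData that contains a th whose own text
     * includes headerText (case-insensitive), or null if none matches.
     */
    static Element pickByHeader(Document doc, String headerText) {
        Elements matches =
            doc.select("table.marketData:has(th:containsOwn(" + headerText + "))");
        return matches.first(); // first() returns null for an empty result
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        Element table = pickByHeader(doc, "SYMBOL");
        if (table == null) {
            System.out.println("No matching table found");
            return;
        }
        // Print the text of each row's cells, one row per line.
        for (Element row : table.select("tr")) {
            System.out.println(row.select("th, td").text());
        }
    }
}
```

If the target table has an id, a plain id selector such as doc.select("table#someId") (someId being a placeholder) is simpler and more robust than matching on header text.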

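Let's suppose you want to extract data from the first table.
You would use this CSS selector: table.marketData:nth-of-type(1). Note that :nth-of-type counts a table's position among the table siblings under the same parent, so this matches the intended table only if it is the first table inside its parent element.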

Your code then becomes:

// select() accepts CSS selectors; getElementsByTag() only matches plain tag names.
Elements tables = doc.select("table.marketData:nth-of-type(1)");

// Open the file once, overwriting any old copy, instead of deleting and reopening it for every row.
try (FileWriter sb = new FileWriter("C:\\market_smry.csv", false)) {
    for (Element table : tables) {
        // No hasClass("marketData") check is needed here:
        // jsoup has already performed the filtering for you.
        for (Element trElement : table.getElementsByTag("tr")) {
            tdElements = trElement.getElementsByTag("td");

            // Write the cells of this row, separated by commas.
            for (Iterator<Element> it = tdElements.iterator(); it.hasNext();) {
                sb.append(formatData(it.next().text()));
                if (it.hasNext()) {
                    sb.append(",");
                }
            }
            sb.append("\r\n");

            sampleList.add(tdElements);
        }
    }
} // try-with-resources flushes and closes the writer
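Market figures often contain commas themselves (for example "1,200"), which would split one cell into two in the .csv file. A minimal sketch of building one CSV line with quoting, using only the standard library; the class CsvRows and its method names are hypothetical and not part of the original code, and the cell values are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvRows {

    /** Wraps a cell in double quotes when it contains a comma, a quote, or a line break. */
    static String escape(String cell) {
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        // Inside a quoted cell, each double quote is doubled.
        return needsQuotes ? "\"" + cell.replace("\"", "\"\"") + "\"" : cell;
    }

    /** Joins the trimmed, escaped cells of one table row into a single CSV line. */
    static String toCsvLine(List<String> cells) {
        List<String> escaped = new ArrayList<>();
        for (String cell : cells) {
            escaped.add(escape(cell.trim()));
        }
        return String.join(",", escaped);
    }

    public static void main(String[] args) {
        // prints ABC,"1,200",10.50
        System.out.println(toCsvLine(Arrays.asList("ABC", "1,200", "10.50")));
    }
}
```

In the scraping loop, the cell list for a row could be built from each td's text() and the resulting line appended to the FileWriter followed by "\r\n".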
