
How can I store Jsoup output in an ArrayList?

I parsed a website with Jsoup and extracted the links. Now I am trying to store just a part of each link in an ArrayList, but somehow I cannot get one link at a time.

I tried several String methods, Scanner and BufferedReader, without success.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DatenImportUnternehmen {


public static void main(String[] args) throws IOException {

    ArrayList<String> aktien = new ArrayList<String>();
    String searchUrl = "https://www.ariva.de/aktiensuche/_result_table.m";


    for(int i = 0; i < 1; i++) {

        String searchBody = "page=" + Integer.toString(i)
            + "&page_size=25&sort=ariva_name&sort_d=asc"
            + "&ariva_performance_1_year=_&ariva_performance_3_years=&ariva_performance_5_years="
            + "&index=0&founding_year=&land=0&industrial_sector=0&sector=0&currency=0"
            + "&type_of_share=0&year=_all_years&sales=_&profit_loss=&sum_assets=&sum_liabilities="
            + "&number_of_shares=&earnings_per_share="
            + "&dividend_per_share=&turnover_per_share="
            + "&book_value_per_share=&cashflow_per_share=&balance_sheet_total_per_share="
            + "&number_of_employees=&turnover_per_employee=_&profit_per_employee="
            + "&kgv=_&kuv=_&kbv=_&dividend_yield=_&return_on_sales=_";


    // post request to search URL
    Document document = 
    Jsoup.connect(searchUrl).requestBody(searchBody).post();
    // find links in returned HTML
    for(Element link:document.select("a[href]")) {
        String link1 = link.toString();
        String link2 = link1.substring(link1.indexOf('/'));
        String link3 = link2.substring(0, link2.indexOf('"'));


        aktien.add(link3);

        System.out.println(aktien);

    }
    }


}
}                             

My output looks like this (just a part of it):

[/1-1_drillisch-aktie]
[/1-1_drillisch-aktie, /11_88_0_solutions-aktie]
[/1-1_drillisch-aktie, /11_88_0_solutions-aktie, /1st_red-aktie]
[/1-1_drillisch-aktie, /11_88_0_solutions-aktie, /1st_red-aktie, /21st-_cent-_fox_b_new-aktie]
[/1-1_drillisch-aktie, /11_88_0_solutions-aktie, /1st_red-aktie, /21st-_cent-_fox_b_new-aktie, /21st_century_fox-aktie]
[/1-1_drillisch-aktie, /11_88_0_solutions-aktie, /1st_red-aktie, /21st-_cent-_fox_b_new-aktie, /21st_century_fox-aktie, /2g_energy-aktie]
[/1-1_drillisch-aktie, /11_88_0_solutions-aktie, /1st_red-aktie, /21st-_cent-_fox_b_new-aktie, /21st_century_fox-aktie, /2g_energy-aktie, /3i_group-aktie]
[/1-1_drillisch-aktie, /11_88_0_solutions-aktie, /1st_red-aktie, /21st-_cent-_fox_b_new-aktie, /21st_century_fox-aktie, /2g_energy-aktie, /3i_group-aktie, /3i_infrastructure-aktie]

What I want to achieve is:

[/1-1_drillisch-aktie]
[/11_88_0_solutions-aktie]
[/1st_red-aktie]
[/21st-_cent-_fox_b_new-aktie]

and so on.

I just don't know what the problem is at this stage.

Your problem is that you are printing the whole list inside the loop, every time you add an element to it.

To resolve the issue, you can either print the list after the loop, so that everything is printed in one go, or print link3 (which is what you are adding to the ArrayList) inside the loop instead of the list.

Option 1:

for(Element link:document.select("a[href]")) {
    String link1 = link.toString();
    String link2 = link1.substring(link1.indexOf('/'));
    String link3 = link2.substring(0, link2.indexOf('"'));

    aktien.add(link3);
}
System.out.println(aktien);

Option 2:

for(Element link:document.select("a[href]")) {
    String link1 = link.toString();
    String link2 = link1.substring(link1.indexOf('/'));
    String link3 = link2.substring(0, link2.indexOf('"'));

    aktien.add(link3);
    System.out.println(link3);
}
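
If you want exactly the bracketed, one-per-line format from the question, a third variant is to fill the list first and then print each entry on its own line. The following is only a minimal sketch that runs without Jsoup: the class name LinkListDemo, the helper methods extractPath and collectPaths, and the hard-coded anchor strings are all hypothetical stand-ins for what link.toString() might return, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinkListDemo {

    // Same substring logic as in the question, applied to a raw anchor string:
    // cut from the first '/' up to (but not including) the next '"'.
    public static String extractPath(String anchorHtml) {
        String fromSlash = anchorHtml.substring(anchorHtml.indexOf('/'));
        return fromSlash.substring(0, fromSlash.indexOf('"'));
    }

    // Collects all paths first; nothing is printed while the list is filled.
    public static List<String> collectPaths(List<String> anchors) {
        List<String> paths = new ArrayList<>();
        for (String anchor : anchors) {
            paths.add(extractPath(anchor));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Assumption: hard-coded examples standing in for link.toString().
        List<String> anchors = Arrays.asList(
            "<a href=\"/1-1_drillisch-aktie\">1&1 Drillisch</a>",
            "<a href=\"/11_88_0_solutions-aktie\">11 88 0 Solutions</a>",
            "<a href=\"/1st_red-aktie\">1st Red</a>");

        List<String> aktien = collectPaths(anchors);

        // Print one entry per line, wrapped in brackets.
        for (String path : aktien) {
            System.out.println("[" + path + "]");
        }
    }
}
```

The list still holds every link after the loop, so it can be reused later, while the printing only ever touches one element at a time.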
