
Web scraping with Jsoup (Java): unable to scrape the full information

I want to scrape some information from a website. The scraping works, but not all of the information comes through, and a lot of the data is missing. The following images should make this clearer.

This is the data I want to scrape: [image]

I used Jsoup to connect to the URL and then tried to extract this particular data with the following code:

Document doc = Jsoup.connect("https://www.awattar.com/tariffs/hourly#").userAgent("Mozilla/17.0").get();
Elements durationCycle = doc.select("g.x.axis g.tick text");

But the result did not contain any of that information. So I printed the whole document fetched from the URL, and it shows the following: [image: the output, with data and the full information missing]

I can see the information when I download the page and parse the saved file as input, but not when I connect directly to the URL. I need to fetch it from the URL, though. Any suggestions?

I hope my question is clear. Let me know if it needs more explanation.
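Because the saved copy of the page parses correctly, one way to narrow the problem down is to check whether the HTML received over the connection was cut off before the end. The sketch below is only a rough heuristic: the helper name `looksTruncated` is hypothetical, and a missing closing `</html>` tag is just a hint, since HTML allows that tag to be omitted. Apply it to the raw response text (for example the string returned by `body()` on the `Connection.Response` from Jsoup's `execute()`), not to `doc.html()`, because Jsoup's parser adds any missing closing tags when it builds the document.

```java
import java.util.Locale;

class TruncationCheck {

    /**
     * Hypothetical helper: returns true when the raw HTML text does not end
     * with a closing </html> tag, which can indicate the body was cut off.
     * This is a heuristic only, since HTML permits omitting that tag.
     */
    static boolean looksTruncated(String rawHtml) {
        if (rawHtml == null || rawHtml.isBlank()) {
            return true; // nothing useful was received at all
        }
        String tail = rawHtml.strip().toLowerCase(Locale.ROOT);
        return !tail.endsWith("</html>");
    }

    public static void main(String[] args) {
        String complete = "<html><body><p>all data</p></body></html>\n";
        String cutOff = "<html><body><p>all da";
        System.out.println(looksTruncated(complete)); // false
        System.out.println(looksTruncated(cutOff));   // true
    }
}
```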

Jsoup limits how many bytes of the response body it reads, and anything beyond that limit is silently truncated. Use the `maxBodySize` parameter to raise or remove the limit:

Document doc = Jsoup.connect("https://www.awattar.com/tariffs/hourly#").userAgent("Mozilla/17.0").maxBodySize(0).get();

A value of 0 means no limit.
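To make the effect of such a limit concrete, here is a minimal JDK-only sketch of a reader that stops after a fixed number of bytes, with 0 meaning unlimited, mirroring the `maxBodySize` semantics described above. It illustrates the idea only and is not Jsoup's actual implementation; the `readCapped` name is hypothetical. Bytes past the cap are dropped without any error, which is why a capped response can still parse cleanly and yet be missing the end of the page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class CappedRead {

    /**
     * Hypothetical helper: reads at most maxBytes from the stream and ignores
     * the rest. A maxBytes of 0 means "no limit", like maxBodySize(0).
     */
    static byte[] readCapped(InputStream in, int maxBytes) throws IOException {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            if (maxBytes > 0) {
                int remaining = maxBytes - out.size();
                if (read >= remaining) {
                    out.write(buffer, 0, remaining); // keep only what fits
                    break;                           // truncate: the rest is lost
                }
            }
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html><body>...lots of data...</body></html>"
                .getBytes(StandardCharsets.UTF_8);

        byte[] capped = readCapped(new ByteArrayInputStream(page), 16);
        byte[] unlimited = readCapped(new ByteArrayInputStream(page), 0);

        System.out.println(new String(capped, StandardCharsets.UTF_8));    // cut off
        System.out.println(new String(unlimited, StandardCharsets.UTF_8)); // whole page
    }
}
```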
