
Jsoup not getting full html

I am trying to use Jsoup to parse the HTML from the URL http://www.threadflip.com/shop/search/john%20hardy

Jsoup seems to get only the data from the line

<![CDATA[ window.gon= ..............

Does anyone know why this would be?

Document doc = Jsoup.connect("http://www.threadflip.com/shop/search/john%20hardy").get();

The site you are trying to parse loads most of its content asynchronously via AJAX calls. Jsoup does not execute JavaScript, so it does not behave like a browser: it only sees the HTML the server sends initially. It seems that the store is filled by calling their API:

http://www.threadflip.com/api/v3/items?attribution%5Bapp%5D=web&item_collection_id=&q=john+hardy&page=1&page_size=30

So you probably need to load the API URL directly in order to read the data you want. Note that the response is JSON, not HTML, so Jsoup's HTML parser is not much help here. There are good JSON libraries available, though; I use JSON-Simple.
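As a rough sketch of loading the API URL directly, the following uses only the JDK's built-in HTTP client (Java 11+) instead of Jsoup. The class name `ThreadflipSearch` and the helpers `buildApiUrl` and `fetchJson` are made up for illustration, and the parameter names are simply copied from the URL above; whether the API still responds this way is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class ThreadflipSearch {

    // Form-encodes a value: spaces become '+', brackets become %5B / %5D.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Rebuilds the API URL quoted above for a given search term and page.
    // The parameter names are taken from that URL, not from any official docs.
    static String buildApiUrl(String query, int page, int pageSize) {
        return "http://www.threadflip.com/api/v3/items"
                + "?" + enc("attribution[app]") + "=web"
                + "&item_collection_id="
                + "&q=" + enc(query)
                + "&page=" + page
                + "&page_size=" + pageSize;
    }

    // Fetches the raw JSON text; parsing is left to a JSON library of your choice.
    static String fetchJson(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String url = buildApiUrl("john hardy", 1, 30);
        System.out.println(url);
        // Only hit the network when explicitly asked to, e.g. `java ThreadflipSearch fetch`.
        if (args.length > 0 && args[0].equals("fetch")) {
            System.out.println(fetchJson(url));
        }
    }
}
```

If you would rather stay with Jsoup for the HTTP part, `Jsoup.connect(url).ignoreContentType(true).execute().body()` should also return the raw JSON string, since Jsoup otherwise rejects non-HTML content types.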

Alternatively, you could switch to Selenium WebDriver, which actually remote-controls a real browser. Because the browser runs the page's JavaScript, it should have no trouble accessing all the items on the page.

