Get an HTML document programmatically, simulating a web browser

Question

I'm trying to fetch an HTML document with Jsoup, and I noticed that the document returned by Jsoup.connect is not the same as the one I get when I open the page directly in a web browser.

Example:
I want to monitor the price of an article. I fetch the HTML document from "Icecat" using:

Jsoup.connect("http://icecat.es/es/p/sony/mdr-as200-blk/auriculares-0027242861022-Sony-MDR-AS200-18145805.html?ti=offers")
     .userAgent(userAgentString)
     .timeout(5000)
     .followRedirects(true)
     .execute();

(userAgentString: I tried several different values.)

But the document I get doesn't contain the pricing information; the tab that should hold it appears "inactive".
However, if I open the same URL in any web browser, the page shows the price table right away.

Bonus question

I get the same behaviour when trying to fetch Google's results page. Typing https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#tbm=shop&q=Sony+MDR-AS200 directly into the web browser works, but when I request it from Java I'm redirected to Google's home page. I'm aware of Google's TOS, and I don't intend to do any large-scale parsing.

Answer 1

Jsoup does not execute JavaScript. If the site you are fetching uses AJAX calls and builds (part of) the DOM dynamically, you are out of luck with Jsoup: it only sees the HTML the server sends in the initial response.
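The Google URL from the bonus question looks like a case of the same thing: the search parameters sit after the `#`, and the fragment part of a URL is never sent to the server, so only client-side JavaScript can act on it. Here is a small sketch with the standard `java.net.URI` class that splits the URL into what an HTTP client actually requests and what stays in the browser. The class and method names (`FragmentCheck`, `requestTarget`, `fragment`) are mine, just for illustration.

```java
import java.net.URI;

class FragmentCheck {

    // Path plus query string: the part of the URL that goes into the HTTP request.
    static String requestTarget(String url) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        return uri.getRawPath() + (query == null ? "" : "?" + query);
    }

    // Everything after '#': kept by the client, never transmitted to the server.
    static String fragment(String url) {
        return URI.create(url).getRawFragment();
    }
}
```

For the URL in the question, `requestTarget` returns `/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8` and `fragment` returns `tbm=shop&q=Sony+MDR-AS200`. So the server only ever sees a request for `/webhp` without the search terms, which would explain ending up on the home page.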

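You can use Selenium WebDriver for that, which drives a real browser, or try to find the AJAX calls the page makes (for example in the network tab of the browser's developer tools) and trigger them directly.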
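For the second option, here is a rough sketch of calling an AJAX endpoint directly with the JDK's built-in `java.net.http` client (Java 11+). I don't know which endpoint Icecat actually calls, so the URL in the usage note is a placeholder you would replace with whatever shows up in the browser's network tab. The `X-Requested-With` header is only a common convention that some servers check, not something every site needs. `AjaxFetch`, `buildAjaxRequest` and `fetch` are names made up for this example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class AjaxFetch {

    // Builds a GET request that resembles the XHR a browser would send.
    // The endpoint is an assumption: take the real one from the network tab.
    static HttpRequest buildAjaxRequest(String endpoint, String userAgent) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(5))
                .header("User-Agent", userAgent)
                .header("X-Requested-With", "XMLHttpRequest")
                .header("Accept", "application/json, text/html;q=0.9, */*;q=0.8")
                .GET()
                .build();
    }

    // Sends the request, following redirects, and returns the raw response body.
    static String fetch(HttpRequest request) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Usage would be something like `AjaxFetch.fetch(AjaxFetch.buildAjaxRequest("http://example.com/offers-endpoint", userAgentString))`. If the endpoint returns an HTML fragment, you can still hand the body to `Jsoup.parse(...)` and select the price table as usual.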

1 answer

solution1
3 ACCEPTED 2015-12-20 10:53:49