I am trying to get the body content of an HTML page with Jsoup, but for this one specific site it returns an empty body. What could be the cause?
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

Document doc = Jsoup
        .connect("http://givatram.ort.org.il/%D7%9C%D7%95%D7%97-%D7%A9%D7%99%D7%A0%D7%95%D7%99%D7%99-%D7%9E%D7%A2%D7%A8%D7%9B%D7%AA/")
        .userAgent("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36")
        .timeout(0) // 0 means no timeout
        .followRedirects(true)
        .execute()
        .parse();
Elements titles = doc.select(".entrytitle");
System.out.println(doc.body()); // prints an empty body for this site
I could reproduce your problem. If I print the entire document with System.out.println(doc), I can see a script in the head tag, which shows that Jsoup does connect to the site and receives a response. According to this answer, Jsoup is purely an HTML parser and does not execute JavaScript. Maybe the content of the site is loaded via JavaScript, and that is why the body is empty?
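If you want to check for this symptom programmatically, here is a rough sketch using only the Java standard library. It takes an HTML string (for example what doc.outerHtml() returns) and reports whether the body is empty while script tags are present. The class and method names (RenderCheck, hasEmptyBody, countScripts, looksScriptRendered) are made up for illustration, and the regex matching is only a heuristic, not a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: a heuristic check on raw HTML text.
final class RenderCheck {
    // Captures everything between <body ...> and </body>, ignoring case, across lines.
    private static final Pattern BODY =
            Pattern.compile("<body\\b[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches the start of an opening <script> tag (closing tags do not match).
    private static final Pattern SCRIPT =
            Pattern.compile("<script\\b", Pattern.CASE_INSENSITIVE);

    // True if there is no body element or it contains only whitespace.
    static boolean hasEmptyBody(String html) {
        Matcher m = BODY.matcher(html);
        return !m.find() || m.group(1).trim().isEmpty();
    }

    // Counts opening <script> tags anywhere in the document.
    static int countScripts(String html) {
        Matcher m = SCRIPT.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Empty body plus at least one script suggests the page is built client-side.
    static boolean looksScriptRendered(String html) {
        return hasEmptyBody(html) && countScripts(html) > 0;
    }

    public static void main(String[] args) {
        // Made-up sample input resembling the symptom described above.
        String html = "<html><head><script src=\"app.js\"></script></head><body></body></html>";
        System.out.println(looksScriptRendered(html)); // prints: true
    }
}
```

A true result only hints that the content is probably filled in by JavaScript. It does not prove it, since a page can also be empty for other reasons.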
Edit 1:
I could verify this. If I use ui4j, a small wrapper around the JavaFX browser, I can see the body:
BrowserEngine browser = BrowserFactory.getWebKit();
Page page = browser.navigate("http://givatram.ort.org.il/%D7%9C%D7%95%D7%97-%D7%A9%D7%99%D7%A0%D7%95%D7%99%D7%99-%D7%9E%D7%A2%D7%A8%D7%9B%D7%AA/");
System.out.println(page.getDocument().getBody());
So it seems that what you are trying to do is unfortunately not possible with Jsoup alone, because the body is only filled in once the page's JavaScript has run.