
jsoup - Not able to fetch a specific website

I'm using the latest jsoup (1.13.1) in the latest Eclipse IDE for Java Developers (includes Incubating components), Version: 2020-09 (4.17.0), Build id: 20200910-1200.

I'm trying to parse a very specific website, but with no success. After I execute these lines:

    doc = Jsoup.connect("http://pokehb.pw/%D7%A2%D7%95%D7%A0%D7%94/21/%D7%A4%D7%A8%D7%A7/43").get();
    doc.select("title").forEach(System.out::println);

Nothing gets printed. It's not just the <title>: no element or property of the page is available.

Yes, the URL is weird, but this is the one I need, and I can browse it fine in Chrome. I also know this is not due to the Hebrew in the URL, since other Hebrew sites work fine.

For example, using this URL seems fine: https://context.reverso.net/translation/hebrew-english/%D7%9C%D7%9B%D7%AA%D7%95%D7%91%D7%AA+url

Any hint on what can be done?

What I can tell you is that there's a "laravel_session" entry in the cookies. This suggests you'll need a more capable technology than jsoup. Try HtmlUnit instead; it might work.

What I ended up doing was using this command:

    doc = Jsoup.parse(driver.getPageSource());

This brought all of the page's source into doc. From there it was a simple matter of using getElementsByClass and getElementsByTag.
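For anyone following the same route, here is a minimal sketch of the parsing step. It assumes that driver is a browser-automation object such as a Selenium WebDriver (the post does not say which), so a hard-coded HTML string stands in for driver.getPageSource(). The class name RenderedPageParser, the helper names, and the "episode" CSS class are made up for illustration and are not taken from the actual site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class RenderedPageParser {

    // Returns the text of every element that carries the given CSS class.
    public static List<String> textsByClass(String pageSource, String className) {
        Document doc = Jsoup.parse(pageSource);
        return doc.getElementsByClass(className).eachText();
    }

    // Returns the href value of every <a> element that has one.
    public static List<String> linkTargets(String pageSource) {
        Document doc = Jsoup.parse(pageSource);
        List<String> hrefs = new ArrayList<>();
        for (Element a : doc.getElementsByTag("a")) {
            if (a.hasAttr("href")) {
                hrefs.add(a.attr("href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Assumption: in the real program this string would come from
        // driver.getPageSource() after the browser has loaded the page.
        String pageSource = "<html><head><title>Demo</title></head><body>"
                + "<div class=\"episode\">Episode 43</div>"
                + "<a href=\"/next\">Next</a>"
                + "</body></html>";

        System.out.println(Jsoup.parse(pageSource).title());
        System.out.println(textsByClass(pageSource, "episode"));
        System.out.println(linkTargets(pageSource));
    }
}
```

The helpers take the page source as a plain string, so the same code works whether the HTML comes from a browser driver, HtmlUnit, or a saved file.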

Hope this helps someone, and thanks Rob for trying to answer.
