

Getting Jsoup to support HTML generated dynamically by JavaScript

Right now I'm working on a web crawler. It parses some specific sites and writes the output to an XML file. Up to this point, there's no problem: the crawler works, and you can customize it really quickly via a cfg file. I use Jsoup to parse the HTML content.
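A minimal sketch of the setup described above, assuming the cfg file is a Java `.properties` file with a hypothetical key `item.selector` that holds a CSS selector. Jsoup selects the matching elements and the JDK's own DOM/`Transformer` API writes them out as XML. The key name, the `<items>`/`<item>` element names and the sample HTML are illustrative only, not the original crawler's format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ConfigurableExtractor {

    /**
     * Applies the CSS selector stored under the hypothetical key "item.selector"
     * to the HTML and writes the text of every match as an <item> element.
     */
    static String extractToXml(String html, Properties config) throws Exception {
        String selector = config.getProperty("item.selector");

        // Build the output document with the JDK's DOM API (no extra dependency).
        org.w3c.dom.Document xml = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        org.w3c.dom.Element root = xml.createElement("items");
        xml.appendChild(root);

        // Jsoup parses the static HTML and evaluates the configured selector.
        for (Element match : Jsoup.parse(html).select(selector)) {
            org.w3c.dom.Element item = xml.createElement("item");
            item.setTextContent(match.text()); // the serializer escapes &, < etc.
            root.appendChild(item);
        }

        // Serialize without the <?xml ...?> header to keep the demo output short.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(xml), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // In a real crawler this would be loaded from the cfg file on disk.
        Properties config = new Properties();
        config.load(new StringReader("item.selector = div.product h2\n"));

        String html = "<div class='product'><h2>Foo &amp; Bar</h2></div>"
                    + "<div class='product'><h2>Baz</h2></div>";
        System.out.println(extractToXml(html, config));
    }
}
```

Letting the JDK's `Transformer` serialize the output keeps the XML escaping correct without hand-built string concatenation.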

I just added a few more sites and noticed that I have a huge problem with HTML content that is created via JavaScript. Isn't there a way to make Jsoup support JavaScript? Or at least to get the full HTML content I can see in my browser?
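To see the problem in isolation, here is a small sketch (the HTML and the class name are made up for illustration): the page has one `<li>` in its markup and a script that would add a second one in a browser. Jsoup parses the `<script>` element but never runs it, so the selector finds only the static item.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupStaticDemo {

    // Hypothetical page: one list item is in the markup, a second one is only
    // appended by JavaScript when a real browser runs the <script>.
    static final String HTML =
            "<html><body><ul id='items'><li>static item</li></ul>"
          + "<script>document.getElementById('items').innerHTML += '<li>js item</li>';</script>"
          + "</body></html>";

    /** Counts the <li> elements under #items as Jsoup sees them (no script execution). */
    static int countItems(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("#items li").size();
    }

    public static void main(String[] args) {
        System.out.println("items seen by Jsoup: " + countItems(HTML)); // prints 1
        // The script survives only as raw text inside the <script> element.
        System.out.println(Jsoup.parse(HTML).select("script").first().data());
    }
}
```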

I already tried HtmlUnit, but it didn't do well: it did not give me the content I get in my browser.

Sincerely,

Ogofo

Jsoup does not support JavaScript, and it does not emulate a browser, so if you're planning to execute JavaScript, just forget about it. In my experience, HtmlUnit, which is a headless browser, has given me the best results (speaking only of Java frameworks).
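A common way to combine the two, sketched here under assumptions: let HtmlUnit load the page and run its scripts, then hand the resulting DOM (serialized with `asXml()`) to Jsoup so existing selectors keep working. This assumes HtmlUnit 3.x or later (package `org.htmlunit`; 2.x used `com.gargoylesoftware.htmlunit`), and the 5-second wait for background JavaScript is an arbitrary choice. The `MockWebConnection` in `main` only serves a fixed, made-up test page so the sketch runs offline; drop it to fetch real URLs.

```java
import java.net.URI;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RenderThenParse {

    // Hypothetical test page: the second <li> exists only after its script runs.
    static final String HTML =
            "<html><body><ul id='items'><li>static item</li></ul>"
          + "<script>document.getElementById('items').innerHTML += '<li>js item</li>';</script>"
          + "</body></html>";

    /** Loads url in HtmlUnit with JavaScript enabled and returns the DOM after scripts ran. */
    static String renderWithJs(WebClient client, String url) throws Exception {
        client.getOptions().setJavaScriptEnabled(true);
        client.getOptions().setCssEnabled(false);                  // CSS is rarely needed for scraping
        client.getOptions().setThrowExceptionOnScriptError(false); // don't abort on a site's broken JS
        HtmlPage page = client.getPage(url);
        client.waitForBackgroundJavaScript(5_000);                 // let timers/XHR settle (max 5 s)
        return page.asXml();                                       // serialize the current DOM
    }

    /** Parses the rendered markup with Jsoup so existing selectors can be reused. */
    static Document renderAndParse(WebClient client, String url) throws Exception {
        return Jsoup.parse(renderWithJs(client, url), url);
    }

    public static void main(String[] args) throws Exception {
        String url = "http://example.com/"; // placeholder, never contacted because of the mock below
        try (WebClient client = new WebClient(BrowserVersion.CHROME)) {
            // Offline demo only: serve the fixed HTML above for this URL.
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(URI.create(url).toURL(), HTML);
            client.setWebConnection(connection);

            Document doc = renderAndParse(client, url);
            System.out.println("items after JavaScript: " + doc.select("#items li").size());
        }
    }
}
```

Keeping Jsoup as the last step means the crawler's selector configuration does not need to change; only the fetching layer is swapped.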

One thing worth trying in HtmlUnit is changing the BrowserVersion (Chrome / Internet Explorer / Firefox) when creating the WebClient instance. Some sites react differently depending on the browser, and sometimes just changing that value gives you the results you expect.
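To try that systematically, one could render the same page once per emulated browser and compare how many elements a selector finds. This is a sketch under the same HtmlUnit 3.x+ assumption; as far as I know, recent HtmlUnit releases no longer ship an Internet Explorer profile, so it tries `CHROME`, `FIREFOX` and `EDGE`. The URL and CSS selector are read from the command line, and the defaults are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;

public class TryBrowserVersions {

    /** Creates a client that emulates the given browser, with tolerant JS settings. */
    static WebClient newClient(BrowserVersion version) {
        WebClient client = new WebClient(version);
        client.getOptions().setJavaScriptEnabled(true);
        client.getOptions().setCssEnabled(false);
        client.getOptions().setThrowExceptionOnScriptError(false);
        return client;
    }

    /** Renders url with the given browser profile and counts elements matching cssQuery. */
    static int countWith(BrowserVersion version, String url, String cssQuery) throws Exception {
        try (WebClient client = newClient(version)) {
            HtmlPage page = client.getPage(url);
            client.waitForBackgroundJavaScript(5_000); // arbitrary upper bound for async scripts
            return Jsoup.parse(page.asXml(), url).select(cssQuery).size();
        }
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "https://example.com/"; // placeholder
        String cssQuery = args.length > 1 ? args[1] : "a";                // placeholder

        Map<String, BrowserVersion> versions = new LinkedHashMap<>();
        versions.put("Chrome", BrowserVersion.CHROME);
        versions.put("Firefox", BrowserVersion.FIREFOX);
        versions.put("Edge", BrowserVersion.EDGE);

        for (Map.Entry<String, BrowserVersion> e : versions.entrySet()) {
            System.out.println(e.getKey() + ": " + countWith(e.getValue(), url, cssQuery));
        }
    }
}
```

If one profile finds noticeably more matches for a given site, that `BrowserVersion` is the one to configure for it.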
