
Web Crawler with Ajax/JavaScript

I tried to use HtmlUnit to implement a crawler that can obtain the content generated by Ajax requests and JavaScript execution. However, HtmlUnit is not powerful enough for my needs: it cannot obtain all of the rendered DOM elements that JavaScript or Ajax produces. I then also tried pywebkitgtk and pyQtwebkit. They did generate some of the dynamic DOM elements, but they do not work reliably, and I have no idea how to fix that. I have also seen Selenium mentioned as an option. Can anybody give me some suggestions for implementing an Ajax crawler? Many thanks!

PhantomJS might be a good solution to your problem. You can also use a crawler API, e.g. Unicrawler, to simplify this. Hope it works.
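To make the idea concrete, here is a minimal sketch of an Ajax-aware crawler in Python. It is not taken from the answer above: it assumes Selenium 4 with a local Chrome install (the question mentions Selenium, and PhantomJS development has since been suspended), and the names `crawl`, `extract_links`, and `selenium_render` are made up for illustration. The crawl loop only needs a `render(url)` function that returns the HTML after scripts have run, so a different headless browser could be swapped in.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in already-rendered HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute links found in `html`, with #fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, render, max_pages=10):
    """Breadth-first crawl. `render(url)` must return the post-JavaScript HTML."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = render(url)
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def selenium_render(url, wait_seconds=10):
    """Render a page in headless Chrome (assumes Selenium 4 and Chrome are installed)."""
    # Imported lazily so the crawl loop can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait for the initial document load. This does NOT guarantee that
        # later Ajax calls have finished; for those, wait on a specific element.
        WebDriverWait(driver, wait_seconds).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.page_source
    finally:
        driver.quit()
```

Usage would look like `pages = crawl("https://example.com/", selenium_render)`. Starting a new browser for every page is slow; reusing one driver across pages is the obvious next step, and it was left out here only to keep the sketch short.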
