Java web scraper
What is the best library for a Java web scraper? I know the following choices:
I need to pick one of them to build a scraper for a scalable project.
If you are only scraping, why do you need a browser? A basic cURL-style HTTP request to the page returns the response body, and that is all you need for scraping.
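As a rough illustration of the "plain HTTP call" approach, here is a minimal sketch that assumes Java 11 or later and uses the JDK's built-in `java.net.http.HttpClient`. The class name `SimpleScraper`, the User-Agent string, the timeouts, and the regex-based title extraction are all my own assumptions for the example and do not come from any particular scraping library. For real pages you would probably swap the regex for a proper HTML parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleScraper {
    // One shared client: an HttpClient is immutable once built and can be
    // reused for many requests.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Naive pattern for the page title; enough for a demo, not a full HTML parser.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Fetches the raw body of a page, roughly what a plain cURL call would return. */
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .header("User-Agent", "simple-scraper/0.1") // hypothetical UA string
                .GET()
                .build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException(
                    "Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    /** Returns the trimmed title text, or an empty string if no title tag is found. */
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SimpleScraper <url>");
            return;
        }
        System.out.println(extractTitle(fetch(args[0])));
    }
}
```

Because no browser is involved, each request costs one HTTP round trip plus some string processing, which is what makes this approach cheap to run in parallel. The trade-off is that pages which build their content with JavaScript will not be rendered.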
Skipping the browser will help with scalability. If you do want a browser, go for HTMLUnit: it is headless, so it still scales better than driving a full browser.
I was recently recommended Web Harvest, and I think it works well out of the box, apart from some issues around HTTP 500 response codes...