Java web scraper

What is the best library for writing a web scraper in Java? I know of the following options:

  1. Selenium
  2. HTMLUnit
  3. Lobo browser

I need to pick one of them to build a scraper for a project that has to scale.

If you are scraping, why do you need a browser? Plain HTTP requests to a page (basic cURL-style calls) return the response, and that is all you need for scraping as long as the content is in the HTML the server sends back rather than rendered by JavaScript.

Skipping the browser also helps with scalability. If you do want a browser, go for HTMLUnit: it is headless, which again helps with scalability.
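A minimal sketch of the browser-free approach, assuming Java 11 or later and the JDK's built-in `java.net.http.HttpClient` in place of cURL. The class name `PlainHttpFetch`, the 10-second timeout, and the `User-Agent` value are illustrative assumptions, not part of the answer above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PlainHttpFetch {
    // Builds a GET request. The timeout and User-Agent values are assumptions for illustration.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .header("User-Agent", "example-scraper/0.1")
                .GET()
                .build();
    }

    // Sends the request and returns the response body as text.
    // Any status outside 2xx is reported as an error instead of being scraped.
    static String fetch(HttpClient client, String url) {
        try {
            HttpResponse<String> response =
                    client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
            int status = response.statusCode();
            if (status < 200 || status >= 300) {
                throw new IOException("Unexpected HTTP status " + status + " for " + url);
            }
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe the interruption.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while fetching " + url, e);
        }
    }
}
```

A single `HttpClient` (for example `HttpClient.newHttpClient()`) can be shared across many `fetch` calls, which is part of why this route tends to scale better than driving a browser per page.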

I recently recommended Web Harvest, and I think it works well out of the box, apart from some issues around HTTP 500 response codes...

Use jsoup. It works well for fetching the response from a URL, and you can then use an XPath expression to parse data out of that response. I've implemented this and it works well.
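The XPath step can be sketched with the JDK alone. Note that this is not jsoup: it is a dependency-free stand-in built on `javax.xml.xpath`, and it only works on well-formed markup, so a real page would normally go through a lenient HTML parser such as jsoup first. The class name `XPathExtract`, the sample markup, and the expression are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathExtract {
    // Evaluates an XPath expression against well-formed markup and
    // returns the trimmed text content of every matching node, in document order.
    static List<String> select(String markup, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            // Malformed markup or an invalid expression both end up here.
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }
}
```

For example, calling `XPathExtract.select(page, "//li[@class='item']")` on a fetched page returns the text of each `li` element whose `class` attribute is exactly `item`.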

