
Intelligent web scraping in C#

There are a number of products out there that provide a GUI for picking out the tags you want to scrape from a web page (WebHarvy, for example).

I've seen the HTML Agility Pack before for getting at the DOM. I just wanted to check whether anyone knows of any nice libraries or processes for automatically finding the useful content within an HTML page and generating the XPath needed to select it.

Something similar to how Evernote and iOS work out where the "article" is on a page, but ideally also working for repeating regions and pagination.

Not sure if this is what you are looking for:
http://www.diffbot.com/

But Diffbot is good at scraping content from websites.
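
If you would rather experiment locally than rely on a hosted service, one common heuristic for locating the "article" is to score each container element by how much paragraph text it holds directly, then emit a positional XPath for the winner. The following is only a rough sketch of that idea, not how Diffbot, Evernote or iOS actually do it. It is written in Python with the standard library so it runs as-is. The names `ParagraphScorer` and `find_main_content_xpath` are made up for illustration, and it does not attempt repeating regions or pagination.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ParagraphScorer(HTMLParser):
    """Scores each element by the amount of text in its direct <p> children."""

    def __init__(self):
        super().__init__()
        self.stack = []           # (tag, xpath) for every currently open element
        self.child_counts = [{}]  # child_counts[k + 1] counts children of stack[k]
        self.scores = {}          # container xpath -> characters of paragraph text

    def handle_starttag(self, tag, attrs):
        # A new <p> implicitly closes a <p> that was left open.
        if tag == "p" and self.stack and self.stack[-1][0] == "p":
            self.handle_endtag("p")
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        parent_path = self.stack[-1][1] if self.stack else ""
        path = f"{parent_path}/{tag}[{counts[tag]}]"  # positional XPath step
        if tag in VOID:
            return
        self.stack.append((tag, path))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                del self.child_counts[i + 1:]
                break

    def handle_data(self, data):
        # Credit text to the parent of the nearest enclosing <p>.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == "p":
                container = self.stack[i - 1][1]
                self.scores[container] = (
                    self.scores.get(container, 0) + len(data.strip())
                )
                break


def find_main_content_xpath(html):
    """Return the XPath of the element holding the most paragraph text, or None."""
    parser = ParagraphScorer()
    parser.feed(html)
    parser.close()
    if not parser.scores:
        return None
    return max(parser.scores, key=parser.scores.get)


if __name__ == "__main__":
    sample = (
        "<html><body>"
        "<div><a href='/'>Home</a><p>Menu</p></div>"
        "<div><p>First long paragraph of the article body.</p>"
        "<p>Second paragraph with more article text.</p></div>"
        "</body></html>"
    )
    print(find_main_content_xpath(sample))  # prints /html[1]/body[1]/div[2]
```

The same scoring could be done in C# by walking the DOM that the HTML Agility Pack gives you. Real pages will need more signals than this (link density, class names, and so on), so treat the score as a starting point.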

