
Ignoring spam/ads from a URL using jsoup

I am using the jsoup parser to load the contents of some sites. Many of these pages contain advertisements and other irrelevant content. Is it possible to ignore these when parsing a URL?

No, jsoup has no built-in function for filtering out advertisement links. You have to do it manually, for example by inspecting the ad URLs on each page and matching them with a regex.
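A minimal sketch of that manual approach, assuming jsoup is on the classpath. The class name `AdStripper` and the two patterns `AD_URL` and `AD_CONTAINER` are hypothetical placeholders; you would replace them with whatever ad hosts, paths, and class names you actually find when inspecting the target pages. The sketch removes links, images, iframes and scripts whose absolute URL matches the ad regex, then removes any element whose class or id contains an ad-like token.

```java
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AdStripper {

    // Hypothetical ad URL fragments: replace with the hosts/paths seen on your pages.
    private static final Pattern AD_URL = Pattern.compile(
            "(?i)(doubleclick\\.net|googlesyndication\\.com|/ads?/|[?&]ad_?id=)");

    // Hypothetical class/id tokens that often mark ad containers.
    private static final Pattern AD_CONTAINER = Pattern.compile(
            "(?i)\\b(ad|ads|advert|banner|sponsor(ed)?)\\b");

    /** Parses the HTML and removes elements that look like ads. */
    public static Document stripAds(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);

        // Drop elements whose target URL (resolved against baseUri) matches an ad pattern.
        for (Element el : doc.select("a[href], img[src], iframe[src], script[src]")) {
            String url = el.hasAttr("href") ? el.absUrl("href") : el.absUrl("src");
            if (AD_URL.matcher(url).find()) {
                el.remove();
            }
        }

        // Drop containers whose class or id contains an ad-like token.
        for (Element el : doc.select("[class], [id]")) {
            String marker = el.className() + " " + el.id();
            if (AD_CONTAINER.matcher(marker).find()) {
                el.remove();
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        String html = "<div class='content'><p>Real article text.</p></div>"
                + "<div class='ad-banner'>Buy now!</div>"
                + "<a href='https://ad.doubleclick.net/click?x=1'>Sponsored link</a>";
        Document clean = stripAds(html, "https://example.com/");
        System.out.println(clean.body().text()); // prints: Real article text.
    }
}
```

Matching on `absUrl(...)` rather than the raw attribute means relative ad paths such as `/ads/banner.png` are caught too. If a pattern is only needed once, jsoup's selector syntax also accepts a regex attribute match directly, e.g. `doc.select("iframe[src~=(?i)doubleclick]").remove()`. Any such list is a heuristic and needs maintaining as sites change their ad markup.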

This is not a direct answer to your question, but you could use AlchemyAPI for that. Their free plan offers 1,000 API calls (30,000 if it is for academic purposes):

http://www.alchemyapi.com/api/text/
