
How to configure Heritrix to log all encountered URLs, including those that are filtered out / not crawled?

I'm using Heritrix 3.1.1-snapshot to crawl and archive some website content. I need to log every URL encountered on each page it processes, including URLs that are configured not to be crawled.

I've been searching for a long time without any positive results :( I hope I can get some help here. Thanks.

Section 6.3.1.4 of http://crawler.archive.org/articles/user_manual/config.html seems to answer your question.


 