StormCrawler's ContentParseFilter

If I set StormCrawler's ContentParseFilter to be

"pattern": "//DIV[@id=\"site-body\"]",

does that mean this is the ONLY place it will look for links to other pages when processing each URL? I am wondering whether setting it will make the crawler start ignoring all the URLs in the menus and such.

Thanks! Jim

See the wiki page on ParseFilters.

The ContentFilter lets you restrict the text of a document to the text covered by an XPath expression.

It does not affect the extraction of links at all; its aim is to improve the text that gets indexed.
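
To make the distinction concrete, here is a rough conceptual sketch in Python. It is not StormCrawler code and does not use its API; the sample page, the function names `restricted_text` and `all_links`, and the fallback to the whole document when nothing matches are all assumptions made for illustration. It only shows the idea that an XPath restriction on the indexed text and the collection of outlinks are two independent steps.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: a menu with one link and a main body with another.
PAGE = """
<html><body>
  <div id="menu"><a href="/about">About</a></div>
  <div id="site-body"><p>Main article text.</p><a href="/next">Next</a></div>
</body></html>
""".strip()


def restricted_text(root, xpath):
    """Return the text under the first element matching xpath.

    Falling back to the whole document when nothing matches is a choice
    made for this sketch, not a statement about StormCrawler's behaviour.
    """
    node = root.find(xpath)
    target = node if node is not None else root
    return " ".join(t.strip() for t in target.itertext() if t.strip())


def all_links(root):
    """Collect href values from every <a> element in the whole document."""
    return [a.get("href") for a in root.iter("a")]


root = ET.fromstring(PAGE)
text = restricted_text(root, ".//div[@id='site-body']")
links = all_links(root)

print(text)   # only the text inside the site-body div
print(links)  # links from the menu and from the body
```

In this sketch the menu text is dropped from what would be indexed, while the menu link is still collected, which mirrors the behaviour described in the answer.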
