Stormcrawler's ContentParseFilter

Question

If I set StormCrawler's ContentParseFilter to be

"pattern": "//DIV[@id=\"site-body\"]",

does that mean it is the ONLY place StormCrawler will look for links to other pages when processing each URL? I am wondering whether setting this will make it start ignoring all the URLs in the menus and such.

Thanks! Jim

Answer 1

See the wiki page for ParseFilters.

The ContentFilter lets you restrict the text of a document to the text covered by an XPath expression.

It does not affect the extraction of links at all; its purpose is to improve the text that gets indexed.
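
To make the distinction concrete, here is a minimal Python sketch of the idea. It is not StormCrawler code and does not use its API: the page, the `extract` helper, and the fallback to the full document when nothing matches are all assumptions made for illustration. It only shows that restricting the indexed text to an XPath match can be done independently of link extraction, which still walks the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for this illustration.
PAGE = """
<html>
  <body>
    <div id="menu"><a href="/about">About</a><a href="/contact">Contact</a></div>
    <div id="site-body"><p>Main article text.</p><a href="/next">Next page</a></div>
  </body>
</html>
"""


def extract(page, content_xpath):
    """Return (links, text): links from the whole page, text from the XPath match."""
    root = ET.fromstring(page)

    # Links are collected from the entire document, menus included.
    links = [a.get("href") for a in root.iter("a") if a.get("href")]

    # Text is restricted to the first element matching the pattern.
    # Falling back to the whole document is a choice made for this sketch.
    node = root.find(content_xpath)
    if node is None:
        node = root
    text = " ".join(" ".join(node.itertext()).split())

    return links, text


links, text = extract(PAGE, ".//div[@id='site-body']")
print(links)
print(text)
```

In this sketch the menu links `/about` and `/contact` are still returned, while the menu labels "About" and "Contact" are left out of the text.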

solution1
0 ACCEPTED 2018-09-06 16:27:52