
SgmlLinkExtractor and regular expression for matching a word in a string

I'm using the SgmlLinkExtractor functionality in Scrapy to parse specific URLs.

I override the start_requests function to crawl dynamic URLs.

It looks like this:

def start_requests(self):
    .....
    yield Request(url.strip(), callbackA)

callbackA does nothing right now.

I also implemented process_value for the SgmlLinkExtractor, but it is never called.
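
For context, process_value is a callable that receives each value the link extractor pulls out of a tag attribute and returns either a value to use or None to drop the link. A minimal sketch of a word-matching filter is shown here, using only the standard library; the keyword "product" and the example URLs are assumptions made for illustration, since the question does not say which word should match.

```python
import re

# Hypothetical keyword: the question does not name the word to match.
KEYWORD = "product"

# \b marks a word boundary, so "products" or "product_id" do not count as a match.
WORD_RE = re.compile(r"\b%s\b" % re.escape(KEYWORD), re.IGNORECASE)


def process_value(value):
    """Return the value unchanged if it contains KEYWORD as a whole word, else None."""
    if WORD_RE.search(value):
        return value
    return None
```

A function like this would be handed to the extractor as SgmlLinkExtractor(process_value=process_value). It only runs when the rule that owns the extractor is actually applied to a response.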

This is the rule I'm using:

rules = [Rule(SgmlLinkExtractor(allow=()), callback=callbackB, follow=True),]

Again, callbackB is never called.

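If your callbacks are declared in your spider, they do not have global scope, so a bare name such as callbackB does not refer to the spider's method. Reference them through the spider instead. Because self is not defined in the class body, where rules is normally declared, pass the method name as a string and Scrapy will look it up on the spider instance (self.callbackB only works where self exists, for example inside __init__):

rules = [
  Rule(SgmlLinkExtractor(), callback='callbackB', follow=True),
]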
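
The scoping issue can be reproduced without Scrapy. In the sketch below, Rule and resolve_callback are hypothetical stand-ins written for this example, not Scrapy's own classes or internals; they only illustrate why a string name can be resolved on the instance while self cannot be used in a class body.

```python
class Rule:
    """Stand-in for a crawl rule; NOT Scrapy's class, just enough to show the lookup."""

    def __init__(self, callback=None, follow=False):
        self.callback = callback
        self.follow = follow


def resolve_callback(spider, callback):
    """Turn a method name into a bound method; pass callables through untouched."""
    if isinstance(callback, str):
        return getattr(spider, callback, None)
    return callback


class MiniSpider:
    # The class body runs once, before any instance exists, so `self` is not
    # available on this line. A string name defers the lookup until later.
    rules = [Rule(callback="callbackB", follow=True)]

    def __init__(self):
        # Resolve names against the instance, where bound methods are available.
        self.callbacks = [resolve_callback(self, r.callback) for r in self.rules]

    def callbackB(self, response):
        return "callbackB got %s" % response


spider = MiniSpider()
result = spider.callbacks[0]("page-1")

# Using `self` in a class body fails as soon as the class is defined.
try:
    class BrokenSpider:
        rules = [Rule(callback=self.callbackB, follow=True)]
except NameError as exc:
    error = str(exc)
```

Here result holds the string returned by the bound callbackB method, and error holds the NameError message raised while defining BrokenSpider.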

