I'm using Scrapy's SgmlLinkExtractor to extract and follow specific URLs.
I override the start_requests function to crawl dynamically generated URLs.
start_requests(self): ..... yield Request(url.strip(), callbackA)
callbackA does nothing right now.
I also implemented process_value for the SgmlLinkExtractor, but it is never called.
rules = [Rule(SgmlLinkExtractor(allow=()), callback=callbackB, follow=True),]
Again, callbackB is never called.
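If your callbacks are declared in your spider, they do not have global scope, so a bare name such as callbackB does not refer to the bound method; they have to be referenced as members of the class. Note that self is not defined while the class body is being evaluated, so when rules is a class attribute, pass the method name as a string and Scrapy will look it up on the spider:
rules = [
    Rule(SgmlLinkExtractor(), callback='callbackB', follow=True),
]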
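To illustrate the scoping point in plain Python: a method declared in a class is not a global name, and self does not exist while the class body runs, so one option is to give the callback by name and resolve it on the instance later. The sketch below is only an illustration under stated assumptions. The toy Rule class, DemoSpider, resolve and build_broken_class are hypothetical stand-ins and do not import or reproduce Scrapy's own code.

```python
class Rule:
    """Hypothetical stand-in for a crawl rule; this is not Scrapy's Rule."""

    def __init__(self, callback=None):
        self.callback = callback


class DemoSpider:
    # `self` does not exist while the class body runs, so the callback
    # is given by name here and resolved on the instance later.
    rules = [Rule(callback="callbackB")]

    def callbackB(self, response):
        return "callbackB handled " + response

    def resolve(self, rule):
        # Accept either a callable or the name of a method on this object.
        if callable(rule.callback):
            return rule.callback
        return getattr(self, rule.callback)


def build_broken_class():
    """Return the error raised when a class body refers to `self`."""
    try:
        class Broken:
            rules = [Rule(callback=self.callbackB)]  # noqa: F821
    except NameError as exc:
        return str(exc)
    return None


spider = DemoSpider()
handler = spider.resolve(DemoSpider.rules[0])  # bound method of `spider`
print(handler("page"))
print(build_broken_class())
```

The string form keeps rules declarable at class level. If the rules were built inside a method instead, self.callbackB would be available there, so which form fits depends on where the rules are defined.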