How do you let Scrapy yield all items?
I just started learning Scrapy recently. How do you let Scrapy yield all items?

For example, say I want to scrape a book site. The home page lists the book titles, the second layer is the chapters, and the third layer is the articles.
import scrapy
from scrapy.http import Request

class BookSpider(scrapy.spider.Spider):
    name = 'book'
    allowed_domains = ['book.com']
    start_urls = ['http://www.book.com']

    def __init__(self):
        self.items = []

    def parse(self, response):
        links = response.xpath('//chapter').extract()
        for l in links:
            yield Request(l, callback=self.parse_chapter)
        print(self.items)

    def parse_chapter(self, response):
        links = response.xpath('//article').extract()
        for l in links:
            yield Request(l, callback=self.parse_article)
        return

    def parse_article(self, response):
        item = BookItem()
        item['article'] = response.url
        self.items.append(item)
        return
But the result is just an empty list. Why is `self.items` not being built?
You need to return or yield an item (or a list of items) from any of the callbacks. Scrapy's engine runs asynchronously: `parse` returns before the scheduled article requests have been processed, which is why `print(self.items)` shows an empty list. Instead of appending to an instance attribute, yield the item and let the engine collect it:
def parse_article(self, response):
    item = BookItem()
    item['article'] = response.url
    return item
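Since a full crawl can't run here, the pattern can be sketched without Scrapy at all. This is a toy stand-in for the engine's collection loop (the URLs are made up for illustration): it drives a callback as a generator and gathers whatever it yields, which is exactly why yielded items reach the output while `self.items` never does.

```python
def parse_article(response_url):
    # Stand-in for BookSpider.parse_article: yield the item
    # instead of appending it to self.items.
    item = {"article": response_url}
    yield item

# Minimal stand-in for the engine's collection loop: iterate
# over each callback's generator and gather the yielded items.
collected = []
for url in ["http://www.book.com/ch1/art1", "http://www.book.com/ch1/art2"]:
    for item in parse_article(url):
        collected.append(item)

print(collected)
```

Once every callback yields its items like this, running `scrapy crawl book -o items.json` will write all collected items to a file, with no need for a shared list on the spider.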