How do you let scrapy yield all items?

I just learned Scrapy recently. How do you let Scrapy yield all items?

For example, say I want to extract a book. The home page holds the book title, layer two is the chapters, and layer three is the articles.

class BookSpider(scrapy.spider.Spider):
   name = 'book'
   allowed_domains = ['book.com']
   start_urls = ['http://www.book.com']

   def __init__(self):
      self.items = []

   def parse(self, response):
      links = response.xpath('//chapter').extract()

      for l in links:
         yield Request(l, callback=self.parse_chapter)

      print self.items

   def parse_chapter(self, response):
      links = response.xpath('//article').extract()

      for l in links:
         yield Request(l, callback=self.parse_article)
      return

   def parse_article(self, response):
      item = BookItem()
      item['article'] = response.url
      self.items.append(item)
      return

But the result is just an empty list. Why does self.items never get filled?

You need to return or yield an item, or a list of items, from the callbacks instead of collecting them on the spider yourself:

def parse_article(self, response):
    item = BookItem()
    item['article'] = response.url
    return item
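
To see why the print in parse shows an empty list, it may help to model how callbacks are driven. The sketch below is a toy, standard-library-only simulation, not Scrapy's actual engine: the Request class, the SITE dictionary, and the crawl function are all made up for illustration, and callbacks receive a URL string rather than a real response. The point it illustrates is that parse only hands requests back to the engine; the chapter and article callbacks run later, after parse has already finished, so anything printed at the end of parse cannot see their results. Items that are yielded, on the other hand, reach the engine no matter how deep the callback chain is.

```python
from collections import deque


class Request:
    """Toy stand-in for scrapy.Request: a URL plus the callback to run on it."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


# Hypothetical site map (assumption for this sketch): page -> links found on it.
SITE = {
    "book": ["chapter-1", "chapter-2"],
    "chapter-1": ["article-1a", "article-1b"],
    "chapter-2": ["article-2a"],
}


class BookSpider:
    def parse(self, url):
        # Layer one: hand one request per chapter back to the engine.
        for link in SITE[url]:
            yield Request(link, callback=self.parse_chapter)

    def parse_chapter(self, url):
        # Layer two: hand one request per article back to the engine.
        for link in SITE[url]:
            yield Request(link, callback=self.parse_article)

    def parse_article(self, url):
        # Layer three: yield the item instead of appending it to a list.
        yield {"article": url}


def crawl(spider, start_url):
    """Toy engine: run callbacks, follow yielded requests, collect yielded items."""
    items = []
    queue = deque([Request(start_url, callback=spider.parse)])
    while queue:
        request = queue.popleft()
        for result in request.callback(request.url):
            if isinstance(result, Request):
                queue.append(result)   # scheduled now, callback runs later
            else:
                items.append(result)   # anything else is treated as an item
    return items


items = crawl(BookSpider(), "book")
print(items)
```

In real Scrapy the yielded items go to the item pipelines and feed exports, so running something like scrapy crawl book -o items.json collects them for you. Requests are also processed concurrently there, so the order of items is not guaranteed the way it is in this toy queue.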
