Scrapy execution flow

Question

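I am trying to understand how Scrapy executes, but the generators used along the way confuse me. I know very little about generators, and I cannot visualize how the pieces relate here.

The code below is from the Scrapy documentation.

Questions

1) How does yield work here?

2) I see two for loops in the parse function. The first loop yields a request whose callback is parse_author, but parse_author is only called after loop 1 (which runs twice) and loop 2 (which runs once) have finished. Please explain the execution flow here.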

import scrapy
from datetime import datetime


class AuthorSpider(scrapy.Spider):
    name = 'prox-reveal'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # follow links to author pages
        for href in response.css('.author + a::attr(href)'):
            print('1---------->{}'.format(datetime.now().strftime('%Y%m%d_%H%M%S-%f')))
            yield response.follow(href, self.parse_author)

        # follow pagination links
        for href in response.css('li.next a::attr(href)'):
            print('2---------->{}'.format(datetime.now().strftime('%Y%m%d_%H%M%S-%f')))
            yield response.follow(href, self.parse)

    def parse_author(self, response):
        print('3---------->{}'.format(datetime.now().strftime('%Y%m%d_%H%M%S-%f')))

        def extract_with_css(query):
            # return the first match for the CSS query, stripped of whitespace
            return response.css(query).extract_first().strip()

        yield {
            'name': extract_with_css('h3.author-title::text'),
            'birthdate': extract_with_css('.author-born-date::text'),
            'bio': extract_with_css('.author-description::text'),
        }

Thanks

Answer 1

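A simplified overview of the relationship between a request and its callback:

1) A Request object is created and passed to Scrapy's engine for further processing:
```
 yield response.follow(href, self.parse_author)
```
2) The requested web page is downloaded and a Response object is created.
3) The request's callback (parse_author()) is called with the created response.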
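
To make the three steps above concrete, here is a minimal sketch in plain Python that does not use Scrapy at all. `FakeRequest`, the link list, and the "download" string are hypothetical stand-ins invented for illustration. The sketch only shows that calling a generator function runs none of its body, that each `next()` runs it up to the next `yield`, and that the callback runs later, when whoever holds the request decides to call it.

```python
from dataclasses import dataclass
from typing import Callable, Iterator

@dataclass
class FakeRequest:
    """Hypothetical stand-in for a Scrapy request: a URL plus a callback."""
    url: str
    callback: Callable

log = []

def parse_author(response: str):
    # Stand-in for the real callback; it is a generator that yields one item.
    log.append(f"callback for {response}")
    yield {"page": response}

def parse(links) -> Iterator[FakeRequest]:
    for href in links:
        log.append(f"yield request for {href}")
        yield FakeRequest(href, parse_author)  # execution pauses here

gen = parse(["/author/a", "/author/b"])
assert log == []        # calling parse() has not run any of its body yet

first = next(gen)       # step 1: runs up to the first yield, then pauses
second = next(gen)      # resumes after the first yield, pauses at the second

all_items = []
for request in (first, second):
    response = f"response from {request.url}"   # step 2: pretend download
    # step 3: call the callback with the response and collect what it yields
    all_items.extend(request.callback(response))

for line in log:
    print(line)
```

Both "yield request" lines are logged before either "callback" line, because the callback is not invoked by the `yield` itself; it is only attached to the request for later use.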

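Now comes the part that I think is giving you trouble.

Scrapy is an asynchronous framework: it can do other things while waiting for I/O operations (such as downloading a web page) to complete.

So your loops continue, other requests are created and processed, and each callback is called once its data is available.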
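
The ordering seen in the question (the `1` and `2` prints appearing before any `3`) can be imitated with a small simulation. This is a loose analogy only: it uses `asyncio` from the standard library, whereas Scrapy is built on Twisted and its real scheduling is more involved. `fake_download`, `handle`, and `toy_engine` are hypothetical names invented for this sketch.

```python
import asyncio

events = []

async def fake_download(url):
    # Stands in for network I/O; control returns to the event loop while waiting.
    await asyncio.sleep(0.01)
    return f"response:{url}"

def parse_author(response):
    events.append(("callback", response))

def parse(links):
    for href in links:
        events.append(("yield", href))
        yield href, parse_author

async def handle(url, callback):
    response = await fake_download(url)
    callback(response)  # invoked only once the "download" has finished

async def toy_engine(links):
    # Each yielded request is scheduled as a task. Nothing is awaited inside
    # this loop, so the generator is exhausted before any download completes.
    tasks = [asyncio.ensure_future(handle(url, cb)) for url, cb in parse(links)]
    await asyncio.gather(*tasks)

asyncio.run(toy_engine(["/a", "/b", "/c"]))

for kind, value in events:
    print(kind, value)
```

In this toy model every "yield" event is recorded before the first "callback" event, which mirrors the output order the question describes.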

Answer 1: score 0, accepted, 2018-04-15 09:11:41
