Scrapy execution flow

Question

I am trying to understand Scrapy's execution flow, but I am getting confused by the generators used in between. I know a little about generators, but I am not able to visualize or correlate how they work here.

Below is the code from the Scrapy documentation:

Questions:

1) How does yield work here?

2) I see two for loops in the parse function. The first loop passes parse_author as the callback in its yield, but parse_author only gets called after loop 1 (executing twice) and loop 2 (executing once) have run. Can someone please explain how the execution flow happens here?

```
import scrapy
from datetime import datetime


# The snippet omitted the class line; name, start_urls and the methods
# must live on a scrapy.Spider subclass (class name assumed here)
class AuthorSpider(scrapy.Spider):
    name = 'prox-reveal'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # follow links to author pages
        for href in response.css('.author + a::attr(href)'):
            print('1---------->{}'.format(datetime.now().strftime('%Y%m%d_%H%M%S-%f')))
            yield response.follow(href, self.parse_author)

        # follow pagination links
        for href in response.css('li.next a::attr(href)'):
            print('2---------->{}'.format(datetime.now().strftime('%Y%m%d_%H%M%S-%f')))
            yield response.follow(href, self.parse)

    def parse_author(self, response):
        print('3---------->{}'.format(datetime.now().strftime('%Y%m%d_%H%M%S-%f')))

        def extract_with_css(query):
            # default='' avoids calling .strip() on None when nothing matches
            return response.css(query).extract_first(default='').strip()

        yield {
            'name': extract_with_css('h3.author-title::text'),
            'birthdate': extract_with_css('.author-born-date::text'),
            'bio': extract_with_css('.author-description::text'),
        }
```

Thanks!

Answer 1

A simplified overview of the relation between a request and its callback:

1) A Request object is created and passed to Scrapy's engine for further processing:
```
yield response.follow(href, self.parse_author)
```
2) The requested webpage is downloaded and a Response object is created.
3) The request's callback (parse_author()) is called with the created response.
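The three steps above can be mimicked in plain Python to show that a callback is just a function stored on the request and called later with the response. This is a toy model, not Scrapy's API: ToyRequest, ToyResponse, toy_download and run_one are hypothetical names, the URL is a placeholder, and nothing is actually downloaded.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ToyResponse:
    # Hypothetical stand-in for a Scrapy Response
    url: str
    body: str


@dataclass
class ToyRequest:
    # Hypothetical stand-in for a Scrapy Request: a URL plus the callback to run later
    url: str
    callback: Callable[[ToyResponse], Iterable]


def toy_download(request: ToyRequest) -> ToyResponse:
    # Step 2: pretend to download the page (no network access)
    return ToyResponse(url=request.url, body=f"<html>{request.url}</html>")


def run_one(request: ToyRequest) -> list:
    # Step 3: call the request's callback with the response;
    # list() drains everything the callback generator yields
    response = toy_download(request)
    return list(request.callback(response))


def parse_author(response: ToyResponse):
    yield {"url": response.url, "length": len(response.body)}


# Step 1: creating the request downloads nothing and calls nothing yet
req = ToyRequest("http://example.com/author/placeholder/", parse_author)
items = run_one(req)
print(items)
```

The request object only carries the callback around; the engine decides when to call it.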

Now comes the part I believe is causing you trouble.

Scrapy is an asynchronous framework: it can do other work while waiting for I/O operations (such as downloading a webpage) to complete.
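Scrapy itself is built on Twisted, but the same idea can be shown with Python's standard asyncio module: while one download waits on I/O, the event loop runs the others. In this sketch fake_download is a hypothetical coroutine that sleeps instead of fetching a page, and the URLs and delays are made up.

```python
import asyncio

events = []


async def fake_download(url: str, delay: float) -> str:
    # Simulated I/O: while this coroutine sleeps, the event loop runs the others
    events.append(f"start {url}")
    await asyncio.sleep(delay)
    events.append(f"done {url}")
    return f"body of {url}"


async def main() -> list:
    # Issue three "downloads" at once; the slow one does not block the fast ones.
    # gather() returns results in argument order, whatever order they finish in.
    return await asyncio.gather(
        fake_download("/page-a", 0.3),
        fake_download("/page-b", 0.1),
        fake_download("/page-c", 0.2),
    )


bodies = asyncio.run(main())
print(events)
```

The downloads start in the order they were issued but finish in order of their delays, which is why a callback can run long after the code that created its request has moved on.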

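So your loop continues: each yield hands one request to Scrapy, and parse() resumes right after that yield without waiting for the page to download. Other requests are created and processed in the meantime, and each callback is called once the response for it is available.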
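To see why the prints come out as 1, 1, 2 and only then 3, here is a toy, single-threaded model in plain Python (no Scrapy). Everything in it is hypothetical: the pages are a small dict PAGES, a request is a (url, callback) tuple, and crawl() stands in for the engine using a FIFO queue. Scrapy's real scheduler and its concurrent downloads do not guarantee this exact callback order; the point is only that a yielded request is queued, not executed, so parse() keeps running until its generator is exhausted.

```python
from collections import deque

log = []

# Hypothetical site: page URL -> (author links, next-page links)
PAGES = {
    "/": (["/author/a", "/author/b"], ["/page/2/"]),
    "/page/2/": ([], []),
}


def parse(url):
    # Same shape as the question's parse(): two loops that yield requests
    authors, next_pages = PAGES[url]
    for href in authors:
        log.append("1")
        yield (href, parse_author)   # a "request": URL plus callback
    for href in next_pages:
        log.append("2")
        yield (href, parse)


def parse_author(url):
    log.append("3")
    yield {"author": url}            # a scraped item


def crawl(start_url):
    # Toy engine with a FIFO queue: drain each callback's generator,
    # queue the requests it yields, and call their callbacks later
    queue = deque([(start_url, parse)])
    items = []
    while queue:
        url, callback = queue.popleft()  # pretend the download for url finished
        for result in callback(url):     # iterating runs the callback's body
            if isinstance(result, dict):
                items.append(result)
            else:
                queue.append(result)
    return items


gen = parse("/")
print(log)        # [] : calling parse() only creates a generator object
next(gen)         # runs parse() up to its first yield, so log becomes ["1"]
log.clear()

items = crawl("/")
print(log)
```

Running crawl() leaves the log as 1, 1, 2, 3, 3: both loops of parse() finish before any parse_author() call, because the author requests only sit in the queue while the generator is being drained.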

Accepted answer, 2018-04-15 09:11:41