Can't run the Scrapy tutorial spider successfully

I started learning Scrapy with the official tutorial, but I can't get it to run successfully. My code is essentially the same as the official example.

import scrapy
class QuotesSpider(scrapy.Spider):
    name = 'Quotes';

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
        ]
        for url in urls:
            yield scrapy.Request(url=url,callback = self.parse);

    def parse(self, response):
        page = response.url.split('/')[-2];
        print('--------------------------------->>>>');
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }

When I execute it in CMD with the command (scrapy crawl Quotes), the result looks like this:

2020-12-20 10:00:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2020-12-20 10:00:26 [scrapy.core.scraper] ERROR: Spider error processing <GET http://quotes.toscrape.com/page/1/> (referer: None)
Traceback (most recent call last):
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration: <200 http://quotes.toscrape.com/page/1/>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
    result = f(*args, **kw)
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\core\spidermw.py", line 58, in process_spider_input
    return scrape_func(response, request, spider)
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\core\scraper.py", line 149, in call_spider
    warn_on_generator_with_return_value(spider, callback)
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\utils\misc.py", line 245, in warn_on_generator_with_return_value
    if is_generator_with_return_value(callable):
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\utils\misc.py", line 230, in is_generator_with_return_value
    tree = ast.parse(dedent(inspect.getsource(callable)))
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    def parse(self, response):
    ^
IndentationError: unexpected indent
2020-12-20 10:00:26 [scrapy.core.engine] INFO: Closing spider (finished)
2020-12-20 10:00:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

I have checked it many times, but I still do not know how to fix it!

There is an IndentationError. You need to fix the code indentation; after that it works fine.
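
For reference, here is a sketch of the same spider re-indented with four spaces everywhere and with the semicolons removed (they are legal but redundant in Python); note that a tabs-vs-spaces mix in your actual file would not be visible in the pasted snippet:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'Quotes'

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Extract the text, author and tags from each quote block on the page
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }

(The unused page variable and the debug print were dropped, since they are not needed for the tutorial example.)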

You might find a solution for your issue here:

Scrapy installed, but won't run from the command line

It is not about the yield. I think either all the semicolons, or maybe the trailing comma after getall()

'tags': quote.css('div.tags a.tag::text').getall(),

might cause the interpreter to expect something else.

Remove the semicolons and the trailing comma; does it still not work?

The error output shows the indentation error at:

def parse
^

This tells you that something before that point caused it, so my guess is that it is the first semicolon.
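
One plausible cause (an assumption; the pasted code cannot show whitespace differences) is that the file mixes tabs and spaces. The traceback shows Scrapy running ast.parse(dedent(inspect.getsource(callable))) on your parse callback; with mixed indentation, dedent cannot strip a common prefix, so the def line stays indented and parsing fails at line 1, exactly as in your error output. A minimal reproduction of that failure:

import ast
from textwrap import dedent

# Hypothetical stand-in for what inspect.getsource(parse) could return if the
# file mixes tabs and spaces: the def line indented with a tab, the body with spaces.
source = "\tdef parse(self, response):\n        pass\n"

# dedent() finds no common whitespace prefix (tab vs. spaces), so the def line
# stays indented and ast.parse() fails the same way as in the traceback above.
try:
    ast.parse(dedent(source))
except IndentationError as exc:
    print(exc)  # unexpected indent (<unknown>, line 1)

If that is what is happening, re-indenting the whole file with spaces only (or tabs only) should make the error go away.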
