
Scrapy: how to pass links

I can't pass the links on to the next callback: when I run the spider, I get no data. Please help with the code.

I'm a beginner with Scrapy.

import scrapy
from movie.items import AfishaCinema

class AfishaCinemaSpider(scrapy.Spider):
    name = 'afisha-cinema'
    allowed_domains = ['kinopoisk.ru']
    start_urls = ['https://www.kinopoisk.ru/premiere/ru/']

    def parse(self, response):
        links = response.css('div.textBlock>span.name_big>a').xpath(
            '@href').extract()
        for link in links:
            yield scrapy.Request(link, callback=self.parse_moov,
                                 dont_filter=True)

    def parse_moov(self, response):
        item = AfishaCinema()
        item['name'] = response.css('h1.moviename-big::text').extract()

The reason you are not getting any data is that you don't yield anything from your parse_moov method. As per the documentation, parse (like any other request callback) must return an iterable of Request objects and/or dicts or Item objects. So add

yield item

at the end of your parse_moov method.
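To see why the missing yield matters, here is a deliberately simplified, hypothetical model of how a crawler consumes a callback's return value. It is not Scrapy's real engine code; the names collect_output, parse_moov_without_yield and parse_moov_with_yield are made up for illustration, and a plain dict stands in for the Response object. The point is that a method with no yield or return hands back None, so there is nothing to collect.

```python
# Simplified, hypothetical model of how a crawler consumes callback output.
# This is NOT Scrapy's real engine code; it only illustrates why a callback
# that never yields (or returns) anything produces no items.

def collect_output(callback, response):
    """Run a callback and gather whatever it produces."""
    result = callback(response)
    if result is None:
        # A plain function with no return/yield returns None: nothing to collect.
        return []
    if isinstance(result, dict):
        # A single dict/item is treated as one output element.
        return [result]
    return list(result)


def parse_moov_without_yield(response):
    # Builds the item but never hands it back, like the original code.
    item = {}
    item['name'] = response['title']


def parse_moov_with_yield(response):
    # Same logic, but `yield item` turns the method into a generator.
    item = {}
    item['name'] = response['title']
    yield item


# A dict stands in for a Scrapy Response object here (assumption for the demo).
fake_response = {'title': 'Example movie'}

print(collect_output(parse_moov_without_yield, fake_response))  # []
print(collect_output(parse_moov_with_yield, fake_response))     # [{'name': 'Example movie'}]
```

Adding the single `yield item` line is what turns parse_moov into a generator whose output can be collected.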

Also, to be able to run your code, I had to modify

yield scrapy.Request(link, callback=self.parse_moov, dont_filter=True)

to

yield scrapy.Request(response.urljoin(link), callback=self.parse_moov, dont_filter=True)

in the parse method; otherwise I was getting this error:

ValueError: Missing scheme in request url: /film/monstry-na-kanikulakh-3-more-zovyot-2018-950968/

(That's because the Request constructor needs an absolute URL, while the page contains relative URLs.)
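The relative-URL problem can be reproduced with the standard library alone. This sketch uses the spider's start URL as the page URL and the path from the error message as the link. According to the Scrapy documentation, response.urljoin(link) wraps urllib.parse.urljoin, combining the link with the response's base URL (the page's <base> tag if present, otherwise response.url).

```python
from urllib.parse import urljoin, urlparse

# URL of the page being parsed (the spider's start URL from the question).
page_url = 'https://www.kinopoisk.ru/premiere/ru/'

# An href value as it appears in the page: a relative path with no scheme.
link = '/film/monstry-na-kanikulakh-3-more-zovyot-2018-950968/'

# An empty scheme is what triggers "Missing scheme in request url".
print(repr(urlparse(link).scheme))  # ''

# Resolve the relative link against the page URL; a leading "/" replaces
# the whole path of page_url while keeping its scheme and host.
absolute = urljoin(page_url, link)
print(absolute)  # https://www.kinopoisk.ru/film/monstry-na-kanikulakh-3-more-zovyot-2018-950968/

# Links that are already absolute pass through unchanged.
print(urljoin(page_url, 'https://www.kinopoisk.ru/film/1/'))  # https://www.kinopoisk.ru/film/1/
```

Because already-absolute links are left as they are, wrapping every link in response.urljoin() is safe even when a page mixes relative and absolute hrefs.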
