
Unable to fetch data from all pages in Scrapy

I am unable to fetch all pages using the code below. It only returns data up to page 90 and then raises an AttributeError. I am using the next button URL to move to the next page, but after page 90 it fails with the error shown below.

Running this code:

import scrapy
import re

class PaginationSpider(scrapy.Spider):
    name = 'pagination'
    allowed_domains = ['www.farfetch.com']
    start_urls = ['https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx?page=1']

    total_pages_pattern = r'"totalPages":(\d+)'
    current_page_pattern = r"page=(\d+)"

    def parse(self, response):
        
        number_of_pages= int(re.search(self.total_pages_pattern, str(response.body)).group(1))
        current_page = int(re.search(self.current_page_pattern, response.url).group(1))
        
        for brand in response.xpath("//h3[@itemprop='brand']//text()"):

            yield {
                "brand":brand.get()
            }

        if current_page <= number_of_pages:

            next_page = "https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx?page=" + str(current_page+1)
            
            print("Current_page:" + str(current_page))

            yield response.follow(url=next_page, callback=self.parse)

Error:

    current_page = int(re.search(self.current_page_pattern, response.url).group(1))

re.search() returns a match object (re.Match) if the pattern matches the string. If there is no match, it returns None. So when the pattern doesn't match, you are calling .group(1) on None.

That's why you are getting an AttributeError.
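A minimal sketch of that failure, using the question's pattern on two URLs. The second URL, without a page parameter, is hypothetical and only stands in for whatever URL the spider received after page 90.

```python
import re

current_page_pattern = r"page=(\d+)"

# URL with a page parameter: re.search returns a match object.
with_page = re.search(
    current_page_pattern,
    "https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx?page=90",
)
page_number = int(with_page.group(1))

# Hypothetical URL without a page parameter: re.search returns None.
without_page = re.search(
    current_page_pattern,
    "https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx",
)

# Calling .group(1) on None raises AttributeError, as in the traceback.
try:
    without_page.group(1)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
```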

I didn't execute your code, but you can probably solve it by adding an if statement.
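One way that if statement could look, as an untested sketch rather than a verified fix. The helpers `extract_int` and `next_page_url` are hypothetical names introduced here, not part of Scrapy, and the `<` comparison is my own assumption (the question uses `<=`, which would request one page past the last).

```python
import re

TOTAL_PAGES_PATTERN = r'"totalPages":(\d+)'
CURRENT_PAGE_PATTERN = r"page=(\d+)"
BASE_URL = "https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx?page="


def extract_int(pattern, text):
    """Return the first captured group as an int, or None if there is no match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return int(match.group(1))


def next_page_url(url, body_text):
    """Return the URL of the next page, or None when pagination should stop."""
    current_page = extract_int(CURRENT_PAGE_PATTERN, url)
    number_of_pages = extract_int(TOTAL_PAGES_PATTERN, body_text)
    # Stop instead of raising when either value could not be found.
    if current_page is None or number_of_pages is None:
        return None
    # Assumption: '<' instead of the question's '<=' so the last page
    # does not schedule a request for a page that does not exist.
    if current_page < number_of_pages:
        return BASE_URL + str(current_page + 1)
    return None
```

Inside parse, the spider could then call next_page_url(response.url, response.text) and only yield response.follow(...) when the result is not None. Logging response.url in the None case should show which URL the site returned after page 90.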


 