Unable to fetch data from all pages in Scrapy
I am unable to fetch all pages using the code below. It only returns data up to page 90 and then raises an AttributeError. I am using the next button URL to move to the next page, but after page 90 it gives the error shown below.
Running this code:
    import scrapy
    import re

    class PaginationSpider(scrapy.Spider):
        name = 'pagination'
        allowed_domains = ['www.farfetch.com']
        start_urls = ['https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx?page=1']
        total_pages_pattern = r'"totalPages":(\d+)'
        current_page_pattern = r"page=(\d+)"

        def parse(self, response):
            number_of_pages = int(re.search(self.total_pages_pattern, str(response.body)).group(1))
            current_page = int(re.search(self.current_page_pattern, response.url).group(1))
            for brand in response.xpath("//h3[@itemprop='brand']//text()"):
                yield {
                    "brand": brand.get()
                }
            if current_page <= number_of_pages:
                next_page = "https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx?page=" + str(current_page+1)
                print("Current_page:" + str(current_page))
                yield response.follow(url=next_page, callback=self.parse)
Error:
current_page = int(re.search(self.current_page_pattern, response.url).group(1))
re.search() returns a match object if the pattern matches the string. If there is no match, it returns None. So, when the pattern doesn't match, you are calling .group(1) on None. That's why you are getting an AttributeError.
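To illustrate the behaviour described above, here is a minimal sketch using only the standard library. The second URL (one without a page= parameter) is a hypothetical example, not something taken from the actual crawl; it only shows what happens when the pattern fails to match.

```python
import re

current_page_pattern = r"page=(\d+)"

# A URL that contains "page=<n>" produces a match object.
match = re.search(
    current_page_pattern,
    "https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx?page=3",
)
page = int(match.group(1))

# Hypothetical URL without "page=": re.search() returns None here.
no_match = re.search(
    current_page_pattern,
    "https://www.farfetch.com/de/shopping/men/shoes-2/items.aspx",
)

# Calling .group(1) on None raises AttributeError, as in the traceback.
try:
    no_match.group(1)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
```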
I didn't execute your code, but you can probably solve it by adding an if statement that checks the result of re.search() before calling .group(1).
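One way such a check might look is sketched below. It is untested against the real site, and the helper names extract_int and next_page_number are made up for illustration. It also uses < instead of the question's <=, on the assumption that requesting one page past totalPages is not intended. That is a change from the original code, not something confirmed by the error message.

```python
import re

TOTAL_PAGES_PATTERN = r'"totalPages":(\d+)'
CURRENT_PAGE_PATTERN = r"page=(\d+)"


def extract_int(pattern, text):
    """Return the first captured group as an int, or None if nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return int(match.group(1))


def next_page_number(url, body):
    """Return the next page number to request, or None to stop paginating."""
    current_page = extract_int(CURRENT_PAGE_PATTERN, url)
    number_of_pages = extract_int(TOTAL_PAGES_PATTERN, body)
    if current_page is None or number_of_pages is None:
        # Pattern not found: stop instead of raising AttributeError.
        return None
    if current_page < number_of_pages:
        return current_page + 1
    return None
```

Inside parse, the spider could call next_page_number(response.url, response.text) and yield response.follow(...) only when the result is not None.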