Avoid getting a 403 status code using Scrapy (Python)
I am making a request to the website dnb.com and I am getting a 403 status code.
The site works fine in the browser, and even with the Requests library I get a 200 status code; only Scrapy gives me a 403.
I was wondering if someone could help me figure this out.
Thanks in advance for your answers...
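One hedged guess, not confirmed by the post: Scrapy sends its own default User-Agent (which advertises Scrapy), while Requests and browsers send a browser-like one, and some sites block on that header. A minimal standard-library sketch of attaching a browser-like User-Agent to a request object (the header value below is an illustrative assumption, and the request is only built, not sent):

```python
import urllib.request

# Hypothetical browser-like User-Agent string, for illustration only.
UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

req = urllib.request.Request(
    "https://www.dnb.com/business-directory/industry-analysis.agriculture-forestry-sector.html",
    headers={"User-Agent": UA},
)

# urllib stores header names capitalized, e.g. "User-agent".
print(req.get_header("User-agent"))
# → Mozilla/5.0 (Windows NT 10.0; Win64; x64)
```

In Scrapy the equivalent would be passing a `headers=` dict to `scrapy.Request` or setting `USER_AGENT` in the project settings.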
# -*- coding: utf-8 -*-
import scrapy
from http.cookies import SimpleCookie


def cookie_parser():
    cookie_string = 'COOKIES-GOES-HERE'
    cookie = SimpleCookie()
    cookie.load(cookie_string)
    cookies = {}
    for key, morsel in cookie.items():
        cookies[key] = morsel.value
    return cookies


class MainSpider(scrapy.Spider):
    name = 'main'
    allowed_domains = ['https://www.dnb.com']
    # start_urls = ['http://https://www.dnb.com/']

    def start_requests(self):
        link = 'https://www.dnb.com/business-directory/industry-analysis.agriculture-forestry-sector.html'
        yield scrapy.Request(
            url=link,
            callback=self.parse,
            cookies=cookie_parser()
        )

    def parse(self, response):
        print(response.body)
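For reference, the `cookie_parser` helper above turns a raw Cookie header string into a plain `{name: value}` dict. A runnable, parameterized variant (the cookie string is made up; the original hard-codes it as `'COOKIES-GOES-HERE'`):

```python
from http.cookies import SimpleCookie


def cookie_parser(cookie_string):
    """Parse a raw Cookie header string into a {name: value} dict."""
    cookie = SimpleCookie()
    cookie.load(cookie_string)
    return {key: morsel.value for key, morsel in cookie.items()}


# Hypothetical cookie string for illustration only.
print(cookie_parser("sessionid=abc123; csrftoken=xyz789"))
# → {'sessionid': 'abc123', 'csrftoken': 'xyz789'}
```

The resulting dict is what gets passed as `cookies=` to `scrapy.Request` in the spider above.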