Scrapy: cookies missing in the response

Question

I've created a basic Scrapy project and enabled CookiesMiddleware as described in the documentation.

settings.py

COOKIES_ENABLED = True
COOKIES_DEBUG = True

DOWNLOADER_MIDDLEWARES = {
  'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700  
}

spiders/amazon_spider.py

from scrapy import Spider


class AmazonSpider(Spider):
    name = "amazon_spider"
    start_urls = ['https://sellercentral.amazon.com/gp/sc-redirect']

    def parse(self, response):
        # Log every Set-Cookie header on the final (post-redirect) response
        self.logger.info(response.headers.getlist('Set-Cookie'))

However, a request sent with

COOKIES_ENABLED = True

gets the same response as one sent with

COOKIES_ENABLED = False

and in both cases the body contains

Please Enable Cookies to Continue

Using Firefox & Firebug

REQUEST
GET /gp/sc-redirect HTTP/1.1
Host: sellercentral.amazon.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1

RESPONSE
HTTP/1.1 302 Found
Server: Server
Date: Mon, 30 Jan 2017 16:12:51 GMT
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Location: https://sellercentral.amazon.com/ap/signin?...
Vary: Accept-Encoding,User-Agent
Cache-Control: no-cache, no-store, must-revalidate
Expires: 0
Pragma: no-cache
Content-Encoding: gzip
Set-Cookie: session-id-time=1486368000l; path=/; domain=.amazon.com; expires=Mon, 06-Feb-2017 16:12:51 GMT
Set-Cookie: session-id=160-1127516-9252943; path=/; domain=.amazon.com; expires=Mon, 06-Feb-2017 16:12:51 GMT

Using Scrapy

RESPONSE
DEBUG: Crawled (200) <GET https://sellercentral.amazon.com/robots.txt> (referer: None)
DEBUG: Redirecting (302) to <GET https://sellercentral.amazon.com/ap/signin?...> from <GET https://sellercentral.amazon.com/gp/sc-redirect/>
DEBUG: Received cookies from: <302 https://sellercentral.amazon.com/ap/signin?...>
Set-Cookie: signin-sso-state-us=44538bf3-88d0-410b-9aa0-bc8da4b2d090; Domain=.amazon.com; Expires=Sun, 25-Jan-2037 16:09:14 GMT; Path=/ap/; Secure; HttpOnly
Set-Cookie: ap-fid=""; Domain=.amazon.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/ap/; Secure
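To see the gap between the two traces at a glance, the cookie names can be pulled out of each set of Set-Cookie values and compared. This is a minimal sketch using only Python's standard `http.cookies` module; the helper name `cookie_names` is made up for illustration, and the header values are copied from the Firefox and Scrapy output above.

```python
from http.cookies import SimpleCookie


def cookie_names(set_cookie_headers):
    """Return the cookie names found in a list of raw Set-Cookie header values."""
    names = []
    for raw in set_cookie_headers:
        # Scrapy returns header values as bytes; plain str is accepted too.
        text = raw.decode("latin-1") if isinstance(raw, bytes) else raw
        jar = SimpleCookie()
        jar.load(text)
        names.extend(jar.keys())
    return names


# Set-Cookie values seen in Firefox (from the 302 on /gp/sc-redirect).
browser_headers = [
    b"session-id-time=1486368000l; path=/; domain=.amazon.com; "
    b"expires=Mon, 06-Feb-2017 16:12:51 GMT",
    b"session-id=160-1127516-9252943; path=/; domain=.amazon.com; "
    b"expires=Mon, 06-Feb-2017 16:12:51 GMT",
]

# Set-Cookie values logged by Scrapy with COOKIES_DEBUG enabled.
scrapy_headers = [
    b"signin-sso-state-us=44538bf3-88d0-410b-9aa0-bc8da4b2d090; "
    b"Domain=.amazon.com; Expires=Sun, 25-Jan-2037 16:09:14 GMT; "
    b"Path=/ap/; Secure; HttpOnly",
    b'ap-fid=""; Domain=.amazon.com; '
    b"Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/ap/; Secure",
]

# Cookies the browser received that Scrapy never did.
missing = set(cookie_names(browser_headers)) - set(cookie_names(scrapy_headers))
print(sorted(missing))  # ['session-id', 'session-id-time']
```

Inside a spider, the same helper could be fed `response.headers.getlist('Set-Cookie')` directly. Here the session cookies are the ones that never reach Scrapy, which suggests the server treated the two clients differently on the first request.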

Here are the full log and the full final response body.

Why are the Set-Cookie results different, and how should cookies be handled with Scrapy in this particular case?

Answer 1

After adding

USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'

to settings.py, it worked fine and no longer complained about disabled cookies.

The new response is

DEBUG: Crawled (200) <GET https://sellercentral.amazon.com/robots.txt> (referer: None)
Received cookies from: <302 https://sellercentral.amazon.com/gp/sc-redirect/>
Set-Cookie: session-id-time=1486368000l; path=/; domain=.amazon.com; expires=Mon, 06-Feb-2017 23:14:58 GMT
Set-Cookie: session-id=167-3010519-3678460; path=/; domain=.amazon.com; expires=Mon, 06-Feb-2017 23:14:58 GMT
DEBUG: Redirecting (302) to <GET https://sellercentral.amazon.com/ap/signin?...> from <GET https://sellercentral.amazon.com/gp/sc-redirect/>
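A likely reason the User-Agent matters is visible in the Firefox trace from the question: the 302 response carries `Vary: Accept-Encoding,User-Agent`, which signals that the server may return a different response depending on the User-Agent header. The snippet below is only a small sketch of how to check a Vary value for a given header; the function name `varies_on` is hypothetical, not part of Scrapy.

```python
def varies_on(vary_header, header_name):
    """Return True if a Vary header value lists header_name (or the wildcard '*')."""
    fields = [field.strip().lower() for field in vary_header.split(",")]
    return "*" in fields or header_name.lower() in fields


# Vary value copied from the Firefox trace in the question.
vary = "Accept-Encoding,User-Agent"

print(varies_on(vary, "User-Agent"))  # True
print(varies_on(vary, "Cookie"))      # False
```

When a response varies on User-Agent, sending a browser-like value as above is one way to get the same response the browser sees. Whether that is the whole explanation for this site is an assumption, not something the logs prove.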

Answer 1: score 2, accepted, 2017-01-30 23:33:04
