
Scrapy: missing cookies in response

I've created a basic Scrapy project and enabled CookiesMiddleware as described in the documentation.

settings.py

COOKIES_ENABLED = True
COOKIES_DEBUG = True

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
}

spiders/amazon_spider.py

from scrapy import Spider


class AmazonSpider(Spider):
    name = "amazon_spider"
    start_urls = ['https://sellercentral.amazon.com/gp/sc-redirect']

    def parse(self, response):
        # Log every Set-Cookie header present on the final response.
        self.logger.info(response.headers.getlist('Set-Cookie'))

However, a request made with

COOKIES_ENABLED = True

returns the same response as one made with

COOKIES_ENABLED = False

and in both cases the body contains

Please Enable Cookies to Continue

  1. Using Firefox & Firebug

REQUEST
GET /gp/sc-redirect HTTP/1.1
Host: sellercentral.amazon.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
RESPONSE
HTTP/1.1 302 Found
Server: Server
Date: Mon, 30 Jan 2017 16:12:51 GMT
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Location: https://sellercentral.amazon.com/ap/signin?...
Vary: Accept-Encoding,User-Agent
Cache-Control: no-cache, no-store, must-revalidate
Expires: 0
Pragma: no-cache
Content-Encoding: gzip
Set-Cookie: session-id-time=1486368000l; path=/; domain=.amazon.com; expires=Mon, 06-Feb-2017 16:12:51 GMT
Set-Cookie: session-id=160-1127516-9252943; path=/; domain=.amazon.com; expires=Mon, 06-Feb-2017 16:12:51 GMT

  2. Using Scrapy

RESPONSE
DEBUG: Crawled (200) <GET https://sellercentral.amazon.com/robots.txt> (referer: None)
DEBUG: Redirecting (302) to <GET https://sellercentral.amazon.com/ap/signin?...> from <GET https://sellercentral.amazon.com/gp/sc-redirect/>
DEBUG: Received cookies from: <302 https://sellercentral.amazon.com/ap/signin?...>
Set-Cookie: signin-sso-state-us=44538bf3-88d0-410b-9aa0-bc8da4b2d090; Domain=.amazon.com; Expires=Sun, 25-Jan-2037 16:09:14 GMT; Path=/ap/; Secure; HttpOnly
Set-Cookie: ap-fid=""; Domain=.amazon.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/ap/; Secure

Here are the full log and the full final response body.
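To make the difference between the two traces concrete, here is a minimal sketch that compares the cookie names in each set of Set-Cookie headers. It uses only the standard library's http.cookies module; the helper name cookie_names and the two list variables are made up for this illustration, and the header values are copied from the Firefox and Scrapy traces above.

```python
from http.cookies import SimpleCookie


def cookie_names(set_cookie_headers):
    """Return the set of cookie names found in a list of Set-Cookie header values."""
    names = set()
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)  # parses one "name=value; attr=..." header string
        names.update(jar.keys())
    return names


# Header values copied from the Firefox trace.
browser_cookies = [
    "session-id-time=1486368000l; path=/; domain=.amazon.com; "
    "expires=Mon, 06-Feb-2017 16:12:51 GMT",
    "session-id=160-1127516-9252943; path=/; domain=.amazon.com; "
    "expires=Mon, 06-Feb-2017 16:12:51 GMT",
]

# Header values copied from the Scrapy trace.
scrapy_cookies = [
    "signin-sso-state-us=44538bf3-88d0-410b-9aa0-bc8da4b2d090; Domain=.amazon.com; "
    "Expires=Sun, 25-Jan-2037 16:09:14 GMT; Path=/ap/; Secure; HttpOnly",
    'ap-fid=""; Domain=.amazon.com; Expires=Thu, 01-Jan-1970 00:00:10 GMT; '
    "Path=/ap/; Secure",
]

# Cookies the browser received that Scrapy never saw.
missing = cookie_names(browser_cookies) - cookie_names(scrapy_cookies)
print(sorted(missing))
```

The session-id and session-id-time cookies are the ones absent from the Scrapy run, and in the browser trace they are set by the first 302 response (/gp/sc-redirect), not by the sign-in page.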

Why are the Set-Cookie results different, and how should cookies be handled with Scrapy in this particular case?

After adding

USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'

to settings.py, the request worked and the site no longer complained about disabled cookies.

The new log output is

DEBUG: Crawled (200) <GET https://sellercentral.amazon.com/robots.txt> (referer: None)
DEBUG: Received cookies from: <302 https://sellercentral.amazon.com/gp/sc-redirect/>
Set-Cookie: session-id-time=1486368000l; path=/; domain=.amazon.com; expires=Mon, 06-Feb-2017 23:14:58 GMT
Set-Cookie: session-id=167-3010519-3678460; path=/; domain=.amazon.com; expires=Mon, 06-Feb-2017 23:14:58 GMT
DEBUG: Redirecting (302) to <GET https://sellercentral.amazon.com/ap/signin?...> from <GET https://sellercentral.amazon.com/gp/sc-redirect/>
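The same effect can be checked outside Scrapy. The sketch below is only an illustration built on the standard library (urllib.request and http.cookiejar), not part of the Scrapy setup above; the helper names make_opener and session_cookie_names are hypothetical. It builds a cookie-aware opener with an optional User-Agent, so the cookies collected with and without the browser string can be compared.

```python
import http.cookiejar
import urllib.request

# The browser User-Agent string used in settings.py above.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0"


def make_opener(user_agent=None):
    """Build a cookie-aware opener; returns (opener, jar).

    With user_agent=None, urllib's default "Python-urllib/x.y" header is kept.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    if user_agent is not None:
        opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def session_cookie_names(url, user_agent=None):
    """Fetch url (following redirects) and return the names of all cookies received."""
    opener, jar = make_opener(user_agent)
    opener.open(url, timeout=10).close()
    return sorted(cookie.name for cookie in jar)
```

Calling session_cookie_names('https://sellercentral.amazon.com/gp/sc-redirect') once without and once with user_agent=BROWSER_UA should show whether the session cookies depend on the User-Agent header. What the site returns today may differ from the 2017 traces, so treat the result as a diagnostic, not a guarantee.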
