
Scrapy: use the cookies sent in the request, not those set by the response

I need to extract some wages in USD, but I am accessing the page from another country, so the currency shown is the local one (riel) rather than USD. I am therefore sending cookies with the request to select a different currency and country.

In Settings I have:

COOKIES_ENABLED = False
COOKIES_DEBUG = True

In the Spider I use:

from urllib.parse import urljoin

import scrapy
from scrapy.utils.response import open_in_browser


class HtSpider(scrapy.Spider):
    name = 'sells'
    allowed_domains = ['hattrick.org']

    def start_requests(self):
        urls = ['https://www.hattrick.org']
        for url in urls:
            player = 'goto.ashx?path=/Club/Players/Player.aspx?playerId=450940600'
            joint = urljoin(url, player)
            yield scrapy.Request(
                url=joint,
                cookies={'currency': 'USD', 'country': 'US'},
                # meta={'dont_merge_cookies': True},
                dont_filter=True,
                callback=self.price)

    def price(self, response):
        price_xpath = response.xpath('//*[@id="transferHistory"]/table//tr[1]/td[6]/text()').extract_first()
        print(price_xpath)  # it is not in USD but in Riel :(
        open_in_browser(response)  # to check if it is in Riel or in USD

Then, from the cookies debug I obtain:

DEBUG: Sending cookies to: <GET https://www.hattrick.org/en/Club/Players/Player.aspx?playerId=450940600> 
Cookie: currency=USD; country=US; currency=USD; country=US; ASP.NET_SessionId=xxxxx
2021-01-05 16:33:13 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <200 https://www.hattrick.org/en/Club/Players/Player.aspx?playerId=450940600>
Set-Cookie: InitialOrigin=Origin=direct|&DateSet=2021-01-05 10:33:13;

**Print price: 2 280 000 Riel**

How can I make the site honor the cookies I send in the request instead of the ones it sets itself? In short: how do I get the price in USD and not in riel?

First, have you tested with Postman to make sure that it actually works with this cookie?
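If Postman is not at hand, the same check could be sketched with Python's standard library. This is only a rough sketch: the cookie names are the ones from the question, the function names build_request and fetch_html are made up for illustration, and whether the site actually honors these cookies is exactly what the check is meant to find out.

```python
from urllib.request import Request, urlopen


def build_request(url, cookie_header):
    """Build a GET request that carries a hand-written Cookie header."""
    return Request(url, headers={"Cookie": cookie_header})


def fetch_html(request, timeout=10):
    """Send the request and return the decoded response body."""
    with urlopen(request, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Cookie names taken from the question; nothing is sent until fetch_html is called.
request = build_request(
    "https://www.hattrick.org/goto.ashx?path=/Club/Players/Player.aspx?playerId=450940600",
    "currency=USD; country=US",
)
# html = fetch_html(request)  # uncomment to hit the site, then look for the currency
```

If the page still shows riel here, the problem is with the cookies themselves and not with Scrapy.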

If you have COOKIES_ENABLED = False, then Scrapy will not send your cookies to the target server. Since you are sending only one request, cookies set by the server never come into play, so setting COOKIES_ENABLED = True should solve it.

However, if you need to send multiple requests to the server, this might not work, since the Set-Cookie headers from the server might override your cookies.
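To picture the override, here is a toy model using the standard library's http.cookies. It is not Scrapy's cookie middleware, and the Set-Cookie value (currency=KHR) is hypothetical: the debug log in the question only shows an InitialOrigin cookie coming back.

```python
from http.cookies import SimpleCookie

# Toy cookie jar: start with what the spider sent.
jar = SimpleCookie()
jar.load("currency=USD; country=US")

# Hypothetical Set-Cookie from the server; a later value for the same name wins.
jar.load("currency=KHR; Path=/")

# What a cookie-aware client would send on the next request.
next_cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print(next_cookie_header)
```

With a cookie jar in the loop, the second request would carry the server's value and not yours, which is why bypassing the jar can be the safer route.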

To solve this I would set COOKIES_ENABLED = False and then send the request like this:

yield scrapy.Request(
    url=joint,
    headers={
        # the header is named 'Cookie'; pairs are name=value separated by '; '
        'Cookie': 'currency=USD; country=US'
    },
    dont_filter=True,
    callback=self.price)

I'm using headers instead of cookies because, if you have disabled cookies in the settings, the cookies argument is ignored.
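If you would rather keep the cookies as a dict, a small helper can build the header value for you. This is a minimal sketch: cookie_header is a made-up name, and it assumes the names and values need no quoting or escaping.

```python
def cookie_header(cookies):
    """Join a dict of cookies into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Pass this dict as the headers argument of scrapy.Request.
headers = {"Cookie": cookie_header({"currency": "USD", "country": "US"})}
print(headers)
```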


 