Passing cookies with redirects using request in node.js

I am trying to scrape a web site using node.js and request. It was working fine, and then all of a sudden today I started getting errors about exceeding the maximum number of redirects. I promptly pulled up developer tools, hit the page, and saw that it did a couple of redirects but then gave me the response. When running in node.js it obviously did not do that. Here is the page I am hitting to scrape:

https://live-tennis.eu/en/atp-live-ranking

If you hit it in a browser you will see that it does one redirect that adds a querystring parameter, __r, then takes that value, puts it in a set-cookie header, and redirects back to the original URL, at which point the response is returned. However, when I run that in node.js it doesn't stop there; it continues to redirect until it hits the max (I believe the default is 10) and then errors. So I started adding every header from the request I saw in developer tools to my request options, and when I added the cookies all of a sudden it worked. So I googled "how to keep cookies on redirect using request in node.js" and stumbled across some posts that implied I should specify "jar: true" in my options, which would tell request to put cookies in its internal cookie jar and pass them through. I did that and it worked. So I stripped all of my other options back out and went back to what I started with, adding the jar option like this:

    var options = {
        url: 'https://live-tennis.eu/en/atp-live-ranking',
        port: 443,
        proxy: process.env.HTTPS_PROXY,
        jar: true,
        headers: {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',
            'Accept-Language': 'en-us',
            'Content-Language': 'en-us'
        },
        timeout: 0,
        encoding: null,
        rejectUnauthorized: false
    };

and the call is just a plain old request call like this:

        request(options, function (err, resp, body) {
            if (err) reject(err);
            else resolve(body);
        });
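
To make the redirect behavior described above concrete, here is a minimal simulation of it. This is only a sketch: fakeServer is a hypothetical stand-in for the site (I am assuming the cookie is also named __r, and example.test is a placeholder URL), and follow is a toy redirect loop, not what request does internally. It just shows why a client that drops the cookie never gets out of the loop, while one that keeps it finishes after two redirects.

```javascript
// Hypothetical stand-in for the site: it serves the page only when the
// client sends the cookie back; otherwise it answers with a 307 redirect.
function fakeServer(url, cookieHeader) {
    var parsed = new URL(url);
    var token = parsed.searchParams.get('__r');
    if (cookieHeader && cookieHeader.indexOf('__r=') !== -1) {
        return { status: 200, headers: {}, body: 'ranking page' };
    }
    if (token) {
        // Second hop: move the token into a cookie and send the client back.
        parsed.searchParams.delete('__r');
        return {
            status: 307,
            headers: { 'set-cookie': '__r=' + token, location: parsed.toString() }
        };
    }
    // First hop: add the __r querystring parameter.
    parsed.searchParams.set('__r', '1');
    return { status: 307, headers: { location: parsed.toString() } };
}

// Toy redirect follower. With useJar = false every Set-Cookie is ignored.
function follow(url, useJar, maxRedirects) {
    var jar = []; // "name=value" pairs collected from Set-Cookie headers
    var hops = 0;
    while (hops <= maxRedirects) {
        var res = fakeServer(url, useJar ? jar.join('; ') : '');
        if (res.status !== 307) {
            return { ok: true, hops: hops, body: res.body };
        }
        if (useJar && res.headers['set-cookie']) {
            // Keep only the name=value part, dropping any cookie attributes.
            jar.push(res.headers['set-cookie'].split(';')[0]);
        }
        url = res.headers.location;
        hops += 1;
    }
    return { ok: false, hops: hops, error: 'Exceeded maxRedirects' };
}

console.log(follow('https://example.test/page', true, 10));
console.log(follow('https://example.test/page', false, 10));
```

The first call stands in for a request made with a cookie jar, the second for one made without.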

I ran it locally and everything worked, so I published to Azure, but there it still doesn't work. When I look at Application Insights on Azure I can see it still makes a chain of http requests, each returning a 307 redirect, until it hits the max and gives an error. Now for the really odd part. Since I could not get it to work on Azure, I went back to my local version and put it back exactly as it was before, without the "jar: true", and it still works. I even cleared cookies and cache in Chrome just to make sure that didn't have something to do with it. Now I can't get it to fail locally again (which I was honestly just doing so I could paste the error and stack trace), but it will not run correctly on Azure.

The only way I originally got it to work locally was by setting the cookie manually in the request headers (which I did by simply adding 'cookie' to the headers and pasting in the value from dev tools), so the missing cookie had to be the reason. But why does it still work after I have taken that out and gotten rid of the jar: true, and more importantly, why can I not get it to work on Azure at all?

Thanks in advance for any help,
Chris

Thanks to @jfriend00 for pointing me in the right direction. My issue was an inadvertently published environment file that was causing the production deployment to try to use an HTTPS_PROXY setting that I definitely did not want on Azure. That problem is now solved.
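
For anyone who runs into the same thing, a small startup check along these lines might have surfaced the stray setting sooner. This is a rough sketch, not part of my actual fix: findProxyVars and the list of variable names are my own choices for illustration. As far as I know, request also reads these environment variables on its own, so removing the proxy option from the code would not necessarily be enough if the variable is still set.

```javascript
// Proxy-related environment variables to look for (my own list).
var PROXY_VARS = ['HTTPS_PROXY', 'https_proxy', 'HTTP_PROXY', 'http_proxy'];

// Returns the names of the proxy variables that are set to a non-empty
// value in the given environment object.
function findProxyVars(env) {
    return PROXY_VARS.filter(function (name) {
        return typeof env[name] === 'string' && env[name] !== '';
    });
}

// Log at startup so an unexpected proxy setting shows up in the logs.
var found = findProxyVars(process.env);
if (found.length > 0) {
    console.warn('Proxy variables set in this environment: ' + found.join(', '));
}
```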
