
Python Requests: how to retrieve a cookie for a GET request?

I am trying to fetch some details from a website using the requests module, and I have found that the request only works if I set a cookie in the headers. However, I don't understand how to retrieve this cookie programmatically.

If I copy the cookie from Chrome's developer tools and send it as part of the request, it works. After some time the cookie expires and I have to copy and paste it again. Is there a way to retrieve or renew it automatically?

Code:

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'cookie': 'visid_incap_820541=xigWzrvDQcSUJ0mvESKe+BR9KlwAAAAAQUIPAAAAAABRN2d88YW7aPzz88KJGqf2; optimizelyEndUserId=oeu1546288405916r0.23219734574282036; _gcl_au=1.1.1732703525.1546288407; _ga=GA1.2.125112106.1546288407; pCode=L7R 0B4; PageSize=15; AAMC_traderca_0=REGION%7C7; aam_uuid=61828363884759157680150590708572742734; .ASPXANONYMOUS=_6qGJdrX1AEkAAAANDYzYmFjYjMtOTVjMi00MzI0LWIyNTItOTZiNGNhOWUwYTI4YJzGhgZ555Ei_Iv_SWlhHlzaRMQ1; SearchResultOrderBy=PriceDesc; DealerLeadsPreTestKey=True; at_uid=mfAcGz813UijYm%2f9Gc2qqw%3d%3d; InternalSignInComplete=False; InternalSignInCompleteNew=False; cc_audpid=430e20732b28f2c7ba2d5be3182cf0ec; {E7ABF06F-D6A6-4c25-9558-3932D3B8A04D}=optimizelyEndUserId=oeu1546288405916r0.23219734574282036&pCode=L7R+0B4&PageSize=15&AAMC_traderca_0=REGION%257C7&cc_audpid=430e20732b28f2c7ba2d5be3182cf0ec&AMCVS_2650037254CC132F0A4C98A6%40AdobeOrg=1&culture=en-ca&uag=69962FE6D5D8F6D8A13AA09DEAA150E0AF8ACC624C0681A20A7E9500C633BA4F&SortOrder=PriceDesc&AMCV_2650037254CC132F0A4C98A6%40AdobeOrg=1099438348%257CMCIDTS%257C17902%257CMCMID%257C61762905408050898680174600361146729466%257CMCAAMLH-1546893208%257C7%257CMCAAMB-1547408986%257CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%257CMCOPTOUT-1546811386s%257CNONE%257CvVersion%257C2.1.0&srchLocation=%257B%2522Location%2522%3a%257B%2522Address%2522%3anull%2c%2522City%2522%3a%2522Burlington%2522%2c%2522Latitude%2522%3a43.38621%2c%2522Longitude%2522%3a-79.83713%2c%2522Province%2522%3a%2522ON%2522%2c%2522PostalCode%2522%3anull%2c%2522Type%2522%3a%2522%2522%257D%2c%2522UnparsedAddress%2522%3a%2522Burlington%2c%2520ON%2522%257D&searchState=%7b%22isUniqueSearch%22%3afalse%2c%22make%22%3a%22Honda%22%2c%22model%22%3a%22Civic%22%7d&lastsrpurl=%2fcars%2fhonda%2fcivic%2fon%2fburlington%2f&gtm_inmarket_search=true; __utmz=1.1547176815.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); SortOrder=PriceDesc; nlbi_820541_1646237=PMSfBHhxEHYkcGWQCOa5EgAAAAA4tf9PayhQkLrUhubKcWP7; 
AMCVS_2650037254CC132F0A4C98A6%40AdobeOrg=1; ASP.NET_SessionId=bcm0ercgtdwelfb0dgdj21vk; culture=en-ca; nlbi_820541_1646235=9pBCfqSeBlaU/rZuCOa5EgAAAACsm/k0oZhXrHPgQUgIdx2f; __utmc=1; 359_MVT=Production; incap_ses_677_820541=TuyqcgMGP0m7jlSD9TBlCfUdSlwAAAAAxKFz0lmbGa9NKShPYAlkCQ==; incap_ses_1002_820541=3YXVWNg2qgVt4jzJPNLnDaBFS1wAAAAAqTH+dOhbKju7PE7ya/m6JA==; incap_ses_530_820541=g8iydJkbsko61boENvFaB4ybTFwAAAAAAUoqmF8+aQR3K0DMuEkVxw==; _fbp=fb.1.1548524429333.920537732; _gid=GA1.2.216334428.1548524430; AMCV_2650037254CC132F0A4C98A6%40AdobeOrg=1099438348%7CMCIDTS%7C17922%7CMCMID%7C61762905408050898680174600361146729466%7CMCAAMLH-1549129230%7C7%7CMCAAMB-1549129230%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1548531630s%7CNONE%7CvVersion%7C2.1.0; srchLocation=%7B%22Location%22:%7B%22Address%22:null,%22City%22:%22Burlington%22,%22Latitude%22:43.3377685546875,%22Longitude%22:-79.80254364013672,%22Province%22:%22ON%22,%22PostalCode%22:%22L7R%200B4%22,%22Type%22:%22%22%7D,%22UnparsedAddress%22:%22L7R0B4%22%7D; lastsrpurl=/cars/honda/accord/on/burlington/?rcp=15&rcs=0&srt=3&trim=EX-L%2CEX-L%20w-Navi%2CSport%2CTouring&yRng=2014%2C&pRng=%2C17500&prx=25&prv=Ontario&loc=L7R0B4&trans=Automatic&hprc=True&wcp=True&sts=New-Used&nod=4%2B%20Door&inMarket=advancedSearch; searchState={"isUniqueSearch":false,"make":"Honda","model":"Accord"}; uag=DD83109745972DB18984A1EEEA659BA45124E731A1CA35CF766CEB2C78CDA978; PreviouslyViewedPVs=5-42423732%2c5-42065851%2c5-42244841%2c5-42278097%2c5-41992218%2c19-10936588%2c5-42160387%2c5-42241651%2c5-41965246%2c5-42082483%2c5-37819580%2c5-41702376%2c5-42242890%2c5-41373481%2c5-41778296%2c5-41482594%2c19-10932841%2c19-10932842%2c5-42444524%2c5-42424838%2c5-42293767%2c19-10923281%2c19-10930137%2c5-41790203%2c5-41136493%2c19-10902368%2c19-10918059%2c5-42386690%2c5-42192341%2c5-41718075; searchFlag=true; __utma=1.125112106.1546288407.1548216821.1548525035.9; __utmt=1; __utmb=1.4.9.1548525052914; 
_4c_=jVNdb9sgFP0rFa9LY8CA7bx1nbRWWtuHds8RH9eJFddYQJZlUf77LmnSLq20DckycM79PuzIZgkDmTEpasklYyVrxISsYBvJbEdC5%2FLvB5kRJhpFTQVClFWtJLeVsla2LeXQmLY1ZEJ%2BZj9SUCXwU6zaT0iwJ%2FsU1vCOU9GGZk46clrdR%2FhAkUixg%2B3%2FSVqE9uQpl0MF5VwKpj6QFZLdcErMQavXfTrLTXLRlAxp3SvrHY6Oa8RbE06M7RujkRzDsypX9%2FzWQqCSNwqXYU4ooZgpuZGWKUMZdUa9OpA1U5yqUuUU7Hi035F1wC6QZUpjnBXFZrOZ6nXyKWgHYWp1oYsbPzhdXFnrg%2Fv0CE4PxWe06oZF8kPxMCQdOl%2FIueCCl1XJ5w%2F3nNKacZZnrxhORBWYh%2FUO8tCbaTWleE6%2F8HRZ0ryHIRczhtySMXi3tmmetmPmb8BcRLdCoIvf%2FGIB7hb7TO7x4kbHa4%2FxbQL3CH0P4QiY4DfxcLpeBv8MFxXDW4%2F6I3fa4jZACyEcGHiKXcqRzgo%2FXqNqz5DLAzLmbEvc9N7qPtui4LPXsdfbeZ7O%2FwwmQoydHw50Km3VWqksVwJkpcHIxijRGFM1lpn8FL5ezb%2FffskdzI%2BKM6qmqEnF6xoV%2BIL%2FDX4KHTYv3EFa%2BhzxCevpEsbXPXnR7h%2Bydah6YnuNCVoHcZX8SPZHKYlaMSZkSfOrTgnlUytB89rv978B'
}

resp = requests.get(
    "https://www.autotrader.ca/",
    # "https://www.autotrader.ca/a/Honda/Accord+Sedan/Burlington/Ontario/5_42423732_ON20081215113610906/",
    headers=headers)

print(resp.status_code)

If you need the path and the domain for each cookie, which get_dict() does not expose, you can iterate over the cookie jar yourself, for instance:

[
    {'name': c.name, 'value': c.value, 'domain': c.domain, 'path': c.path}
    for c in session.cookies
]

With requests, you can call cookies.get_dict() to get the cookies as a dictionary. If you need the raw Set-Cookie header sent by the server, it is available in the response headers.

import requests

s = requests.Session()
r = s.get('http://www.google.com')
sep = "\n------------------\n"
print(r.headers, end=sep)                # all response headers
print(r.headers['Set-Cookie'], end=sep)  # raw Set-Cookie header
print(r.cookies.get_dict())              # cookies as a name -> value dict

Output

{'Date': 'Sat, 26 Jan 2019 20:59:50 GMT', 'Expires': '-1', 'Cache-Control': 'private, max-age=0', 'Content-Type': 'text/html; charset=ISO-8859-1', 'P3P': 'CP="This is not a P3P policy! See g.co/p3phelp for more info."', 'Content-Encoding': 'gzip', 'Server': 'gws', 'Content-Length': '5360', 'X-XSS-Protection': '1; mode=block', 'X-Frame-Options': 'SAMEORIGIN', 'Set-Cookie': '1P_JAR=2019-01-26-20; expires=Mon, 25-Feb-2019 20:59:50 GMT; path=/; domain=.google.com, NID=156=DqD5DO6OULcovwiJYJF3fFCU6FDUPP9xqCjdIzMVA48TXdk46ZMV-MeJl5Eg_4chXeZHAtKT-WiIEAiRFXSH8SF_riyegpizTr1xQFegMu2dF7rFpCuWnL8IlBhEtp6BYwUHYifWxUzBIQjAnKVbz1_am1j2vW90QsRkNpiDqvw; expires=Sun, 28-Jul-2019 20:59:50 GMT; path=/; domain=.google.com; HttpOnly'}
------------------
1P_JAR=2019-01-26-20; expires=Mon, 25-Feb-2019 20:59:50 GMT; path=/; domain=.google.com, NID=156=DqD5DO6OULcovwiJYJF3fFCU6FDUPP9xqCjdIzMVA48TXdk46ZMV-MeJl5Eg_4chXeZHAtKT-WiIEAiRFXSH8SF_riyegpizTr1xQFegMu2dF7rFpCuWnL8IlBhEtp6BYwUHYifWxUzBIQjAnKVbz1_am1j2vW90QsRkNpiDqvw; expires=Sun, 28-Jul-2019 20:59:50 GMT; path=/; domain=.google.com; HttpOnly
------------------
{'1P_JAR': '2019-01-26-20', 'NID': '156=DqD5DO6OULcovwiJYJF3fFCU6FDUPP9xqCjdIzMVA48TXdk46ZMV-MeJl5Eg_4chXeZHAtKT-WiIEAiRFXSH8SF_riyegpizTr1xQFegMu2dF7rFpCuWnL8IlBhEtp6BYwUHYifWxUzBIQjAnKVbz1_am1j2vW90QsRkNpiDqvw'}
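If the attributes inside the raw Set-Cookie text are needed (expiry, path, domain), the standard library can split them out. The following is a minimal sketch, assuming the text of a single cookie: requests joins multiple Set-Cookie headers with ", ", so the combined string printed above would first have to be separated per cookie. The helper name parse_set_cookie is hypothetical, and the sample string is the first cookie from the output above.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(raw):
    """Parse the Set-Cookie text of one cookie into its value and attributes."""
    jar = SimpleCookie()
    jar.load(raw)
    parsed = {}
    for name, morsel in jar.items():
        parsed[name] = {
            "value": morsel.value,
            "expires": morsel["expires"],
            "path": morsel["path"],
            "domain": morsel["domain"],
        }
    return parsed


# First cookie from the Set-Cookie header shown in the output above.
raw = "1P_JAR=2019-01-26-20; expires=Mon, 25-Feb-2019 20:59:50 GMT; path=/; domain=.google.com"
info = parse_set_cookie(raw)
print(info["1P_JAR"])
```

The expires attribute is what tells you when a copied cookie will stop working.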

You can also take a look at Requests Session objects, which allow you to persist certain parameters, including cookies, across requests.

import requests

s = requests.Session()
r = s.get("https://www.autotrader.ca/")
print(s.cookies.get_dict())  # cookies the session collected from the response

Output

{'359_MVT': 'Beta', 'incap_ses_427_820268': 'CH+XZmtI+XAoSGd5jgPtBYbSTFwAAAAAisPp/ga12qcaus8OQBy+WQ==', 'incap_ses_427_820541': '5w7HIplKciGGR2d5jgPtBYXSTFwAAAAAtK6JqBFZ7yOfMEQRTxsb4w==', 'nlbi_820541_1646237': '6PXif98z32ITUgWNCOa5EgAAAAB0qBmvDiWBSIKScEbsrrei', 'visid_incap_820268': '8Y9/QUrMSN6ig2Eh8yaQBobSTFwAAAAAQUIPAAAAAAC3xP7V2sSXvYIv1o3+boYi', 'visid_incap_820541': 'GTvhgGUCSPiBzrX555BMD4XSTFwAAAAAQUIPAAAAAADrOrmirYySt7jxsjvAx4e6', '___utmvavlufwBX': 'FRU\x01rTwc', '___utmvbvlufwBX': 'JZt\r\n    XUhORalX: Ltz', '___utmvmvlufwBX': 'ISmGYgcVGsA'}
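Since a session stores whatever cookies the server sets, one possible way to "renew" them is to throw the session away and start a fresh one when a request stops succeeding. The sketch below is only an illustration under assumptions: fetch_with_renewal, make_session and max_attempts are hypothetical names, and treating any non-200 status as "cookies expired or rejected" is an assumption that may not hold for this site.

```python
def fetch_with_renewal(url, make_session, max_attempts=3, headers=None):
    """Fetch url, starting a fresh session (and so fresh cookies) after a failure.

    make_session is any zero-argument callable returning an object with a
    requests-style get(url, headers=...) method, e.g. requests.Session.
    """
    session = make_session()
    for attempt in range(1, max_attempts + 1):
        resp = session.get(url, headers=headers)
        if resp.status_code == 200:
            # Return the session too, so the caller can keep reusing its cookies.
            return resp, session
        if attempt < max_attempts:
            # Assumption: failure means the cookies are stale, so start over.
            session = make_session()
    raise RuntimeError(f"no successful response after {max_attempts} attempts")


# Intended usage with requests (needs network access, so not executed here):
#   import requests
#   resp, session = fetch_with_renewal("https://www.autotrader.ca/", requests.Session)
```

Returning the session alongside the response lets later requests reuse the same cookies until they fail again.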

That being said, I don't think Requests is the right tool for the job here. Selenium can be used to scrape these kinds of websites.

For example, printing the headline from the URL that is commented out in your code:

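from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('https://www.autotrader.ca/a/Honda/Accord+Sedan/Burlington/Ontario/5_42423732_ON20081215113610906/')
# find_element(By.CSS_SELECTOR, ...) replaces find_element_by_css_selector,
# which was removed in Selenium 4.
title = driver.find_element(By.CSS_SELECTOR, 'h1').text
print(title)
driver.quit()  # close the browser when done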

Output

2014 Honda Accord EX-L|SERVICE HISTORY ON FILE - Burlington
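If you still want to make the follow-up requests with requests, the cookies the browser obtained can be copied into a session's cookie jar. This is a rough sketch, not something confirmed to work against this site: copy_browser_cookies is a hypothetical helper, and whether the server accepts browser cookies replayed from a different client is an assumption. It expects the list of dicts that Selenium's driver.get_cookies() returns and a jar with a requests-style set() method.

```python
def copy_browser_cookies(browser_cookies, jar):
    """Copy cookies in driver.get_cookies() format into a requests-style cookie jar."""
    for c in browser_cookies:
        jar.set(
            c["name"],
            c["value"],
            domain=c.get("domain", ""),  # empty string, not None, when missing
            path=c.get("path", "/"),
        )
    return jar


# Intended usage (needs selenium, requests and a browser driver, so not executed here):
#   session = requests.Session()
#   copy_browser_cookies(driver.get_cookies(), session.cookies)
#   resp = session.get("https://www.autotrader.ca/")
```

When the cookies expire, loading the page in the browser again and repeating the copy would refresh them without any manual copy and paste.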
