Python requests module not passing params in session
I am attempting to do a bulk download of a series of PDFs from a site that requires login authentication. I am able to log in successfully; however, when I send a GET request for '/transcripts/transcript.pdf?user_id=3007', the request returns the content for '/transcripts/transcript.pdf'.
Does anyone have any idea why the URL parameter is not being sent, or why the request would be rerouted?
I have tried passing the parameter 'user_id' as data, as params, and hardcoded in the URL.
import lxml.html
import requests

# username and password are assumed to be defined earlier in the script.
with requests.Session() as s:
    login = s.get('<domain>/login/canvas')
    # Print the returned HTML (or check something more specific) to confirm this is the login page.
    print(login.text)
    login_html = lxml.html.fromstring(login.text)
    # Collect the hidden form fields (e.g. the CSRF token) so they are posted back with the credentials.
    hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
    form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
    print("form: ", form)
    form['pseudonym_session[unique_id]'] = username
    form['pseudonym_session[password]'] = password
    response = s.post('<domain>/login/canvas', data=form)
    print(response.url, response.status_code)  # gets <domain>?login_success=1 200

    # An authorised request. Query-string parameters for a GET belong in params=, not data=.
    r = s.get('<domain>/transcripts/transcript.pdf', params={'user_id': '3007'})
    # r.url is the URL of the final response, after any redirects have been followed.
    print(r.url)  # gets <domain>/transcripts/transcript.pdf
    print(r.status_code)  # gets 200
    with open('test.pdf', 'wb') as f:
        f.write(r.content)
The GET response comes back from /transcripts/transcript.pdf and not from /transcripts/transcript.pdf?user_id=3007.
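One way to check whether requests itself drops the parameter is to build the request without sending it and look at the prepared URL. This is a minimal sketch; 'https://example.com' is a placeholder standing in for the elided <domain>:

```python
import requests

# 'https://example.com' is a placeholder for the real (elided) domain.
req = requests.Request(
    'GET',
    'https://example.com/transcripts/transcript.pdf',
    params={'user_id': '3007'},
)
prepared = req.prepare()

# The query string is already part of the URL before anything is sent.
print(prepared.url)  # https://example.com/transcripts/transcript.pdf?user_id=3007
```

If the prepared URL contains ?user_id=3007 but the final r.url does not, the parameter was sent and then lost somewhere along a redirect; r.history holds the intermediate responses in order.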
From the looks of it, you are trying to use Canvas. I'm pretty sure that in Canvas you can bulk download all test attachments.
If that's not the case, there are a few things to try:
If not, a plain GET may not be enough; perhaps the site relies on JavaScript, etc.
After looking through the request's '.history', I found a series of 302 redirects. The first was to '/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf'.
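Decoding that redirect target shows where the parameter disappears: the target_uri the site passes to its login page carries only the path, so after authentication it presumably sends you back to the bare PDF URL. A small standard-library sketch using the Location value quoted above:

```python
from urllib.parse import parse_qs, urlsplit

# Location of the first 302, as reported above.
location = '/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf'

# parse_qs percent-decodes the values, so target_uri comes back as a plain path.
query = parse_qs(urlsplit(location).query)
target_uri = query['target_uri'][0]

print(target_uri)         # /transcripts/transcript.pdf
print('?' in target_uri)  # False: the user_id query string was not carried over
```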
In a desperate attempt, I tried:
s.get('/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf%3Fuser_id%3D3007')
This still rerouted me a few times, but it ultimately got me the file I wanted!
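Rather than hand-writing the percent-encoded string, the same URL can be generated, which makes it easy to vary the user ID. This is a minimal sketch: the force_login and target_uri names come from the redirect above, while login_redirect_url and 'https://example.com' are hypothetical placeholders:

```python
from urllib.parse import urlencode


def login_redirect_url(domain: str, pdf_path: str, user_id: str) -> str:
    """Build a /login URL whose target_uri still carries the user_id query string."""
    # The full target, including its own query string.
    target = f'{pdf_path}?{urlencode({"user_id": user_id})}'
    # urlencode percent-encodes '/', '?' and '=' inside the value, so the
    # nested query string survives as part of target_uri.
    query = urlencode({'force_login': '0', 'target_uri': target})
    return f'{domain}/login?{query}'


url = login_redirect_url('https://example.com', '/transcripts/transcript.pdf', '3007')
print(url)
# https://example.com/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf%3Fuser_id%3D3007
```

Passing params={'force_login': '0', 'target_uri': target} to s.get() on the /login URL should produce the same encoding, since requests also uses urlencode for dict params. Either way, this relies on the site continuing to honor target_uri, which is inferred from the redirect chain rather than documented.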
If anyone has a more elegant solution to this, or any resources I can read, I would greatly appreciate it!
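For the bulk download itself, the login-redirect workaround above can be wrapped in a loop. This is only a sketch built on that workaround, not a documented API of the site. It assumes an already logged-in requests.Session. The function name download_transcripts and the output file names are hypothetical. The '%PDF' check is a heuristic to catch login or error pages that come back with status 200:

```python
from pathlib import Path
from urllib.parse import urlencode


def download_transcripts(session, domain, user_ids, out_dir='transcripts'):
    """Fetch one transcript PDF per user ID via the /login target_uri redirect.

    `session` is assumed to be an already logged-in requests.Session (any
    object with a compatible .get() works). Returns the paths written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for user_id in user_ids:
        # Nest the user_id query string inside target_uri; requests encodes it.
        target = '/transcripts/transcript.pdf?' + urlencode({'user_id': user_id})
        r = session.get(f'{domain}/login',
                        params={'force_login': '0', 'target_uri': target})
        r.raise_for_status()
        # A 200 is not proof of success: check that the body looks like a PDF.
        if not r.content.startswith(b'%PDF'):
            print(f'user {user_id}: response from {r.url} is not a PDF, skipping')
            continue
        path = out / f'transcript_{user_id}.pdf'
        path.write_bytes(r.content)
        written.append(path)
    return written
```

Usage would look like download_transcripts(s, '<domain>', ['3007', ...]) inside the existing `with requests.Session() as s:` block, after the login POST.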