Python requests module not passing params in session

I am attempting a bulk download of a series of PDFs from a site that requires login authentication. I can log in successfully, but when I send a GET request for '/transcripts/transcript.pdf?user_id=3007', the request returns the content for '/transcripts/transcript.pdf'.

Does anyone have any idea why the URL parameter is not being sent, or why the request would be rerouted?

I have tried passing the parameter 'user_id' as data, as params, and hardcoded in the URL.

I have removed the actual domain from the strings below for privacy.

import lxml.html
import requests

# username and password are assumed to be defined earlier (not shown in the post).
with requests.Session() as s:
    login = s.get('<domain>/login/canvas')
    # Print the returned HTML to check that this really is the login page.
    print(login.text)
    login_html = lxml.html.fromstring(login.text)
    # Collect the hidden form fields so they are posted back with the credentials.
    hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
    form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
    print("form: ", form)
    form['pseudonym_session[unique_id]'] = username
    form['pseudonym_session[password]'] = password
    response = s.post('<domain>/login/canvas', data=form)
    print(response.url, response.status_code)  # gets <domain>?login_success=1 200

    # An authorised request.
    data = {'user_id': '3007'}
    r = s.get('<domain>/transcripts/transcript.pdf?user_id=3007', data=data)
    print(r.url)  # gets <domain>/transcripts/transcript.pdf
    print(r.status_code)  # gets 200
    with open('test.pdf', 'wb') as f:
        f.write(r.content)

The GET response returns /transcripts/transcript.pdf and not /transcripts/transcript.pdf?user_id=3007.

From the looks of it, you are trying to use Canvas. I'm pretty sure that in Canvas you can bulk download all test attachments.

If that's not the case, there are a few things to try:

  1. After logging in, try typing the URL with user_id into a browser. Does that take you directly to the PDF file, or to a link to one?
  2. If so, look at the URL; it may simply not display the parameters. Some websites do this, so don't worry about it.

If not, GET may not be enough; perhaps the site uses JavaScript, etc.

After looking through the '.history' of the request, I found a series of 302 redirects.
The first was to '/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf'. Note that the target_uri in that redirect no longer carries the user_id query string.
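
For anyone wanting to reproduce that check, here is a minimal sketch of listing a redirect chain. The helper name redirect_chain is hypothetical, and the two stand-in objects only imitate the attributes of a requests response (status_code, url, headers, history) so that the sketch runs without the real site; the actual chain in this case had more hops than the single one shown.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return (status_code, url, Location header) for each hop, ending with the final response."""
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.url, hop.headers.get("Location")) for hop in hops]


# Stand-in objects shaped like requests responses, so the sketch runs offline.
# The URLs mirror the ones in the post; "<domain>" is still a placeholder.
first = SimpleNamespace(
    status_code=302,
    url="<domain>/transcripts/transcript.pdf?user_id=3007",
    headers={"Location": "/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf"},
    history=[],
)
final = SimpleNamespace(
    status_code=200,
    url="<domain>/transcripts/transcript.pdf",
    headers={},
    history=[first],
)

for status, url, location in redirect_chain(final):
    print(status, url, "->", location)
```

With a real session, the same helper could be called on the response returned by s.get(...), since its .history attribute holds the intermediate redirect responses.
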
In a desperate attempt, I tried:
s.get('/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf%3Fuser_id%3D3007')
This still rerouted me a few times, but ultimately got me the file I wanted!
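
The hand-encoded URL above can also be generated instead of typed out, which may help when looping over many user IDs. This is only a sketch: the helper name login_redirect_url is hypothetical, and it assumes the site keeps accepting the force_login and target_uri parameters seen in the redirect.

```python
from urllib.parse import urlencode


def login_redirect_url(target_path, query):
    """Build '/login?...' with the full target (path plus query string) percent-encoded."""
    target = f"{target_path}?{urlencode(query)}"
    return "/login?" + urlencode({"force_login": 0, "target_uri": target})


url = login_redirect_url("/transcripts/transcript.pdf", {"user_id": 3007})
print(url)
# /login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf%3Fuser_id%3D3007
```

The result would then be appended to the domain and passed to s.get(...) inside the logged-in session. Whether the same trick works for every user_id has not been verified here.
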

If anyone has a more elegant solution to this, or any resources that I can read, I would greatly appreciate it!
