
Downloading an Excel report from a website in Python saves a blank file

I have about eight reports that I need to pull from a system every week, which takes quite a bit of time, so I am working on automating the process. I am using requests to log in to the site and download the files. However, when I download a file with my Python script, the file comes back blank. When I use the same link to download from the browser, it is not blank. Below is my code:

import requests

# Login form fields (placeholder credentials)
payload = {
    'txtUsername': 'uid',
    'txtPassword': 'pass'
}

domain = 'https://example.com/login.aspx?ReturnUrl=%2fiweb%2f'
path = 'C:\\Users\\workspace\\data-in\\'

with requests.Session() as s:
    # Log in; the session keeps any cookies the server sets
    p = s.post(domain, data=payload)
    # Request the report with the same session
    r = s.get('https://example.com/forms/MSWordFromSql.aspx?ContentType=excel&object=Organization&FormKey=f326228c-3c49-4531-b80d-d59600485557')
    with open(path + 'report1.xls', 'wb') as f:
        f.write(r.content)

A little about the URL: when I was looking for it, I found that it is wrapped in some JavaScript.

<a href="javascript:void(0);OpenNewWindow('../forms/MSWordFromSql.aspx?ContentType=excel&amp;object=Organization&amp;FormKey=f326228c-3c49-4531-b80d-d59600485557',true);" id="ListToolbarRAWEXCELExportLink" class="TopUIRawExcelExportMenuLink">Export Raw Data to Excel</a>

However, when I look at the location the file was downloaded from, the true URL for the report is this:

https://example.com/forms/MSWordFromSql.aspx?ContentType=excel&object=Organization&FormKey=f326228c-3c49-4531-b80d-d59600485557

This is the URL I am using in my code to download the report. After I run the script, the file is created, named, and saved to the correct directory, but it is empty. As I mentioned at the top of the thread, if I simply copy the URL above into the browser, it downloads the report with no problem.

I was also thinking about using Selenium to get this done, but the issue is that I cannot rename the files while they are being downloaded. I need each file to have a specific name because all of the downloaded reports are then used in another automation script.

As @Lucas mentioned, your Python code likely sends a different request than your browser does, and thus receives a different response.
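One way to see how the response differs is to summarize what the script actually received before writing it to disk. The sketch below is only an illustration: `summarize_response` and `diagnose` are hypothetical helper names, and the hints they produce are guesses about common causes (a redirect back to the login page, an HTML body instead of a spreadsheet), not a confirmed diagnosis for this site. The helpers read only attributes that a `requests.Response` provides (`url`, `status_code`, `history`, `headers`, `content`).

```python
def summarize_response(resp):
    """Collect the fields most useful for comparing two responses."""
    return {
        "url": resp.url,                  # final URL after any redirects
        "status": resp.status_code,
        "redirects": [(h.status_code, h.url) for h in resp.history],
        "content_type": resp.headers.get("Content-Type", ""),
        "content_disposition": resp.headers.get("Content-Disposition", ""),
        "size": len(resp.content),        # body length in bytes
    }


def diagnose(summary):
    """Return human-readable hints; these are heuristics, not proof."""
    hints = []
    if summary["size"] == 0:
        hints.append("empty body")
    if summary["redirects"]:
        hints.append("request was redirected")
    if summary["content_type"].lower().startswith("text/html"):
        hints.append("body is HTML, not a spreadsheet")
    return hints
```

In the script from the question, printing `summarize_response(p)` after the login and `diagnose(summarize_response(r))` after the download would show whether the login succeeded and whether the report request was redirected or answered with something other than the file.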

I'd use the browser dev tools to inspect the request the browser makes to initiate the download. Use "Copy as cURL" and try to reproduce the correct behavior from the command line.

Then reduce the differences between the curl request and the one your Python code makes by removing unnecessary parts from the curl invocation and adding the necessary headers to your Python code. https://curl.trillworks.com/ can help with the latter.
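Once the necessary headers are known, they can be passed along with the download request. The following is a minimal sketch, not a confirmed fix. `download_report` is a hypothetical helper, and the `User-Agent` and `Referer` values in `BROWSER_HEADERS` are placeholders that should be replaced with whatever the "Copy as cURL" output actually contains. The helper accepts any object with a `get` method that behaves like `requests.Session.get`, and it refuses to write an empty body so that a blank file is never saved silently.

```python
from pathlib import Path

# Placeholder values (assumptions): copy the real ones from the browser request.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://example.com/iweb/",
}


def download_report(session, url, dest, headers=None):
    """Fetch url with an already logged-in session and save the body to dest.

    Returns the number of bytes written. Raises ValueError on an empty body.
    """
    resp = session.get(url, headers=headers or {})
    resp.raise_for_status()  # surface 4xx/5xx errors instead of saving them
    if not resp.content:
        raise ValueError(f"empty response body from {resp.url}")
    Path(dest).write_bytes(resp.content)
    return len(resp.content)


# Intended use with the session from the question (not executed here):
#   with requests.Session() as s:
#       s.post(domain, data=payload)
#       download_report(s, report_url, path + 'report1.xls', BROWSER_HEADERS)
```

Because the destination name is chosen by the caller, each report can be saved under the specific file name the downstream automation expects.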


 