
Downloading an Excel report from a website in Python saves a blank file

I have about 8 reports that I need to pull from a system every week. This takes quite a bit of time, so I am working on automating the process. I am using requests to log in to the site and download the files. However, when I download a file with my Python script, it comes back blank. When I use the same link to download from the browser, it's not blank. Below is my code:

import requests

# Login form fields, named after the inputs on login.aspx
payload = {
    'txtUsername': 'uid',
    'txtPassword': 'pass'
}

domain = 'https://example.com/login.aspx?ReturnUrl=%2fiweb%2f'
path = 'C:\\Users\\workspace\\data-in\\'

# A Session keeps the cookies set by the login POST for the later GET
with requests.Session() as s:
    p = s.post(domain, data=payload)
    r = s.get('https://example.com/forms/MSWordFromSql.aspx?ContentType=excel&object=Organization&FormKey=f326228c-3c49-4531-b80d-d59600485557')
    # Write the raw response body to disk
    with open(path + 'report1.xls', 'wb') as f:
        f.write(r.content)

A little about the URL. When I was looking for it, I found that it's wrapped in some JavaScript:

<a href="javascript:void(0);OpenNewWindow('../forms/MSWordFromSql.aspx?ContentType=excel&amp;object=Organization&amp;FormKey=f326228c-3c49-4531-b80d-d59600485557',true);" id="ListToolbarRAWEXCELExportLink" class="TopUIRawExcelExportMenuLink">Export Raw Data to Excel</a>

However, when I look at the path from which the file was downloaded, the true location of the report is this:

https://example.com/forms/MSWordFromSql.aspx?ContentType=excel&object=Organization&FormKey=f326228c-3c49-4531-b80d-d59600485557

This is the URL I am using in my code to download the report. After I run the script, the file is created, named, and saved to the correct directory, but it's empty. As I mentioned at the top of the thread, if I simply copy the URL above into the browser, it downloads the report with no problem.

I was also thinking about using Selenium to get this done, but the issue is that I cannot rename the files while they are being downloaded. I need each file to have a specific name because all of the downloaded reports are then used in another automation script.

As @Lucas mentioned, your Python code likely sends a different request than your browser does, and thus receives a different response.

I'd use the browser dev tools to inspect the request the browser makes to initiate the download. Use "Copy as cURL" and try to reproduce the correct behavior from the command line.
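Before comparing requests, it can help to see what the script actually received. The sketch below is only an illustration: `describe_response` is a hypothetical helper, not part of requests, and the two checks in it (an HTML content type, "login" in the final URL) are assumptions about how a failed login might show up on this site.

```python
def describe_response(status_code, headers, body, final_url=""):
    """Summarize a response so it can be compared with the browser's."""
    content_type = headers.get("Content-Type", "")
    return {
        "status": status_code,
        "content_type": content_type,
        "size": len(body),
        # Assumption: an HTML content type suggests a page, not a report.
        "looks_like_html": "text/html" in content_type.lower(),
        # Assumption: being sent back to the login URL means the
        # session is not authenticated.
        "on_login_page": "login" in final_url.lower(),
    }


# With the question's code this would be called as:
#   print(describe_response(r.status_code, r.headers, r.content, r.url))
```

If the summary shows a zero size, an HTML content type, or the login URL, the login POST probably did not produce the same session the browser has.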

Then reduce the differences between the curl request and the one your Python code makes: remove unnecessary parts from the curl invocation and add the necessary headers to your Python code. https://curl.trillworks.com/ can help with the latter.
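As a rough sketch of that last step, the headers that turn out to matter can be passed to the session's `get` call. The header values below are placeholders, not values known to be required by this site; copy the real ones from the "Copy as cURL" output. `save_report` is a hypothetical helper that also refuses to write an empty file, so a failed download is noticed immediately.

```python
from pathlib import Path

# Placeholder values (assumptions): replace them with the headers the
# browser actually sends, then remove the ones that make no difference.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://example.com/iweb/",
}


def save_report(session, url, dest, extra_headers=None):
    """Fetch url with a logged-in session and write the body to dest.

    session is expected to behave like requests.Session.
    Raises RuntimeError instead of writing an empty file.
    """
    response = session.get(url, headers=extra_headers or {})
    if not response.content:
        raise RuntimeError(
            f"empty body from {url} (status {response.status_code})"
        )
    Path(dest).write_bytes(response.content)
    return len(response.content)


# Inside the question's `with requests.Session() as s:` block:
#   save_report(s, report_url, path + 'report1.xls', BROWSER_HEADERS)
```

Because the destination name is passed in explicitly, each of the weekly reports can still be saved under the specific name the downstream script expects.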
