Python: trouble downloading an Excel file from a dynamic URL with no extension

I couldn't find a way to download an Excel file using the requests module in Python. The URL seems to be dynamic and doesn't have a file extension. My code is below:

import requests

download = requests.get('https://www.djppr.kemenkeu.go.id/page/loadViewer?idViewer=9369&action=download')
with open('file.xlsx', 'wb') as f:
    f.write(download.content)

This code saves only HTML to the file. Can anyone help me find a proper way to download the Excel sheet?

First, check what you actually get in download.content. The server may be sending an HTML page with a message, asking for a login and password, or using JavaScript that redirects to the file.
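One way to do that check without opening the saved file is to look at the first bytes of the response. The sketch below is only an illustration: classify_payload is a hypothetical helper name, and the rule it uses (an .xlsx file is a ZIP archive, so it starts with PK\x03\x04, while an HTML page starts with a doctype or <html> tag) is a heuristic, not a full validation.

```python
def classify_payload(content: bytes, content_type: str = "") -> str:
    """Guess whether a response body is an Excel file, an HTML page, or something else.

    Hypothetical helper for illustration; it only inspects the leading bytes
    and the Content-Type header value passed in by the caller.
    """
    # .xlsx files are ZIP archives, which start with the bytes PK\x03\x04.
    if content.startswith(b"PK\x03\x04"):
        return "xlsx-like (zip)"
    # Legacy .xls files (OLE2 compound documents) start with this signature.
    if content.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls-like (ole2)"
    # Fall back to the header and to the usual opening tags of an HTML page.
    head = content[:512].lstrip().lower()
    if "html" in content_type.lower() or head.startswith((b"<!doctype", b"<html")):
        return "html"
    return "unknown"
```

With the code from the question you could call classify_payload(download.content, download.headers.get('Content-Type', '')). If it returns "html", the server sent a page rather than the spreadsheet.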

You should also check how the URL behaves in a browser, which shows whether it displays an HTML page.

You could even download the file in the browser and then read the file's real URL from the browser. Then check whether that URL appears in the HTML, or whether the HTML contains elements you could use to build it.


Your URL returns an HTML page containing an <iframe> whose src attribute holds a relative URL to the file. So you first have to fetch the HTML, then find the <iframe> and read its relative src, then build the absolute URL, and finally download the file.

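import requests
from bs4 import BeautifulSoup

url = 'https://www.djppr.kemenkeu.go.id/page/loadViewer?idViewer=9369&action=download'

# Step 1: fetch the HTML page that wraps the file
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

# Step 2: read the relative URL from the first <iframe>
src = soup.find('iframe')['src']
print(src)

# Step 3: build the absolute URL
url = 'https://www.djppr.kemenkeu.go.id' + src

# Step 4: download the file itself and save it
r = requests.get(url)
with open('file.xlsx', 'wb') as f:
    f.write(r.content)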
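Building the absolute URL by string concatenation works when src starts with a slash. As a hedged alternative, urllib.parse.urljoin also copes with a src that is already absolute or is relative to the page's path. The sketch below uses only the standard library so it runs without extra packages; IframeSrcParser and first_iframe_url are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collects the src attribute of every <iframe> found in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase and attrs as (name, value) pairs.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def first_iframe_url(html, page_url):
    """Return the absolute URL of the first <iframe> src, or None if there is none."""
    parser = IframeSrcParser()
    parser.feed(html)
    if not parser.sources:
        return None
    # urljoin resolves the src against the URL the page was fetched from.
    return urljoin(page_url, parser.sources[0])
```

In the script above you would pass r.text and the original page URL to first_iframe_url, check that the result is not None, and then request that URL to get the file.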
