
Python: trouble downloading an Excel file from a dynamic URL with no extension

I couldn't find a way to download an Excel file using the requests module in Python. The URL seems to be dynamic and doesn't have any extension. I'm dropping the code below.

import requests

download = requests.get('https://www.djppr.kemenkeu.go.id/page/loadViewer?idViewer=9369&action=download')
with open('file.xlsx', 'wb') as f:
    f.write(download.content)

This code saves only the HTML code to the file. Can anyone help me find a proper way to download the Excel sheet?

First, check what you actually get in download.content: the server may be sending an HTML page with a message, asking for a login and password, or using JavaScript that redirects to the file.
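One way to make that check concrete is to look at the Content-Type header and the first bytes of the body. This is only a rough sketch: the helper name classify_download is made up for illustration, and it relies on the fact that .xlsx files are ZIP archives (they start with PK\x03\x04) while legacy .xls files start with the OLE2 signature.

```python
def classify_download(content_type, body):
    """Guess what a response body really is from its header and first bytes.

    content_type: value of the Content-Type response header ('' if missing)
    body: raw response bytes
    """
    # .xlsx files are ZIP archives, which start with this local file header
    if body.startswith(b'PK\x03\x04'):
        return 'xlsx'
    # Legacy .xls files use the OLE2 compound file signature
    if body.startswith(b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'):
        return 'xls'
    # Otherwise fall back to the header and a peek at the start of the markup
    head = body[:512].lstrip().lower()
    if 'html' in content_type.lower() or head.startswith((b'<!doctype', b'<html')):
        return 'html'
    return 'unknown'
```

With the code from the question it could be called as classify_download(download.headers.get('Content-Type', ''), download.content). A result of 'html' would confirm that the server sent a page and not the spreadsheet.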

You should also check how your URL behaves in a browser. That way you can also see whether it displays some HTML.

You could even download the file in the browser and then get the file's real URL from the browser. Then you can check whether that URL appears in the HTML, or whether the HTML contains elements you could use to build it.


Your URL returns an HTML page containing an <iframe> whose src attribute holds a relative URL to the file. So you have to first get the HTML, then find the <iframe> and read its relative src, then build the absolute URL, and finally download the file.

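import requests
from bs4 import BeautifulSoup

url = 'https://www.djppr.kemenkeu.go.id/page/loadViewer?idViewer=9369&action=download'

# Step 1: fetch the HTML page that wraps the file
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

# Step 2: read the relative URL from the first <iframe>
src = soup.find('iframe')['src']
print(src)

# Step 3: build the absolute URL from the site root and the relative src
url = 'https://www.djppr.kemenkeu.go.id' + src

# Step 4: download the real file and save it as binary
r = requests.get(url)
with open('file.xlsx', 'wb') as f:
    f.write(r.content)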
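If you would rather not depend on BeautifulSoup, or the src may not always start with a slash, the same lookup can be sketched with only the standard library. The names IframeSrcParser and first_iframe_url are made up for illustration. urljoin resolves the src against the page URL, so it handles root-relative, page-relative, and absolute values alike.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs
        if tag == 'iframe':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def first_iframe_url(html, page_url):
    """Return the absolute URL of the first <iframe> src, or None if there is none."""
    parser = IframeSrcParser()
    parser.feed(html)
    if not parser.sources:
        return None
    # Resolve the (possibly relative) src against the URL of the page itself
    return urljoin(page_url, parser.sources[0])
```

You would pass r.text and the original URL to first_iframe_url and then request the returned URL as before. A None result means the page has no <iframe>, which avoids the TypeError that soup.find('iframe')['src'] would raise in that case.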

