Python - Download zip files with requests package but get unknown file format
I am using Python 3.8.12. I tried the following code to download files from URLs with the requests package, but got an 'Unknown file format' message when opening the zip file. I tested several different zip URLs, but every downloaded zip file is 18 KB and none of them can be opened.
import requests
file_url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
# save_path and file_name are assumed to be defined elsewhere in the script
file_download = requests.get(file_url, allow_redirects=True, stream=True)
with open(save_path + file_name, 'wb') as f:
    f.write(file_download.content)
[Image: zip file opening error message]
However, once I changed the URL to
file_url = 'https://www.td.gov.hk/datagovhk_tis/mttd-csv/en/table41a_eng.csv'
the code worked well and the csv file was downloaded without problems.
I tried the requests, urllib, wget, and zipfile/io packages, but none of them worked.
The reason may be that the zip URL points to both the zip file and a web page, while the csv URL points to the csv file only.
I am really new to this field. Could anyone help? Thanks a lot!
import wget
url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
#url = 'https://golang.org/dl/go1.17.3.windows-amd64.zip'
wget.download(url)
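The guess above, that the URL returns a web page rather than the archive, can be checked locally by looking at the bytes that were downloaded. The following is only a rough sketch: describe_payload is a hypothetical helper name, and the HTML test is a simple heuristic based on how pages usually begin.

```python
import io
import zipfile


def describe_payload(data: bytes) -> str:
    """Roughly classify downloaded bytes as 'zip', 'html', or 'unknown'."""
    # zipfile.is_zipfile accepts a file-like object and checks for a zip signature.
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    # Heuristic (an assumption): HTML pages usually start with a doctype or <html> tag.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

Passing it the contents of one of the 18 KB files (for example file_download.content) should indicate whether the saved data is an archive or a page.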
You can send a HEAD request and examine the response headers to get information about the file. The Content-Type header reveals the actual type of the resource.
import requests
file_url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
r = requests.head(file_url)
print(r.headers["Content-Type"])
gives the output
text/html
So the URL you have actually points to an HTML page, not a zip file.
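Building on that check, a download script could refuse to save anything that is not really an archive. This is a minimal sketch under stated assumptions: save_if_zip is a hypothetical helper, not part of requests, and it takes the Content-Type value and the body as plain arguments so that it does not depend on any particular HTTP library.

```python
import io
import zipfile
from pathlib import Path


def save_if_zip(content_type: str, body: bytes, dest: Path) -> bool:
    """Write body to dest only when it is a zip archive.

    Returns True if the file was written, False otherwise.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return False
    # Servers may send a generic type (e.g. application/octet-stream),
    # so the bytes themselves are checked as well.
    if not zipfile.is_zipfile(io.BytesIO(body)):
        return False
    dest.write_bytes(body)
    return True
```

With requests, the arguments could be taken from r.headers.get("Content-Type", "") and r.content after a GET request. A False result means the URL did not return the archive itself, and the real file link still has to be found on the page.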