
Python - Download zip files with requests package but get unknown file format

I am using Python 3.8.12. I tried the following code to download files from URLs with the requests package, but got an "Unknown file format" message when opening the zip file. I tested different zip URLs, but every downloaded zip file is 18 KB and none of them can be opened.

import requests

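file_url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
save_path = './'                      # hypothetical: not defined in the original snippet
file_name = 'D5600091B2022MM11B.zip'  # hypothetical: not defined in the original snippet
file_download = requests.get(file_url, allow_redirects=True, stream=True)
# A with-block closes the file handle once the bytes are written
with open(save_path + file_name, 'wb') as f:
    f.write(file_download.content)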

[Screenshot: zip file opening error message]

[Screenshot: zip file sizes]

However, once I changed the URL to file_url = 'https://www.td.gov.hk/datagovhk_tis/mttd-csv/en/table41a_eng.csv', the code worked and the CSV file downloaded correctly.

I tried the requests, urllib, wget, zipfile and io packages, but none of them worked.

The reason may be that the zip URL leads to both the zip file and a web page, while the CSV URL leads to the CSV file only.
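One way to test that guess is to inspect the saved bytes instead of trusting the .zip extension. The sketch below uses only the standard library; describe_download is a hypothetical helper name, and the HTML check is a rough heuristic (a real page could start with a byte-order mark or a comment and slip past it).

```python
import io
import zipfile


def describe_download(data: bytes) -> str:
    """Guess whether downloaded bytes are a ZIP archive, an HTML page, or something else."""
    # is_zipfile() accepts a file-like object and looks for the ZIP
    # end-of-central-directory record, so the file name plays no part
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    # Heuristic: HTML usually begins (after whitespace) with a doctype or <html> tag
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

For example, reading the saved file back with open(save_path + file_name, 'rb') and passing its contents to describe_download returns "html" if the server sent a web page instead of the archive.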

I am really new to this field; could anyone help? Thanks a lot!

import wget

url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
#url = 'https://golang.org/dl/go1.17.3.windows-amd64.zip'
wget.download(url)

You might examine the headers after sending a HEAD request to get information about the file; the Content-Type header reveals the actual type of the file:

import requests
file_url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
r = requests.head(file_url)
print(r.headers["Content-Type"])

which gives the output

text/html

So the file your URL points to is actually an HTML page.
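Since Content-Type is what gives the page away, the same check can guard the download itself so that an HTML page is never saved under a .zip name. This is a minimal sketch: media_type and save_download are hypothetical helper names, and checking the GET response instead of sending a separate HEAD request is a design choice that saves a round trip and does not depend on how the server handles HEAD.

```python
def media_type(content_type: str) -> str:
    """Return the bare media type, dropping parameters such as '; charset=utf-8'."""
    return content_type.split(";", 1)[0].strip().lower()


def save_download(response, path: str) -> None:
    """Write a downloaded body to disk unless the server sent an HTML page.

    `response` is assumed to behave like requests.Response; only its
    `.headers` mapping and `.content` bytes are used here.
    """
    ctype = media_type(response.headers.get("Content-Type", ""))
    if ctype == "text/html":
        # The URL points at a web page, not at the archive itself
        raise ValueError(f"expected a file download but the server returned {ctype}")
    with open(path, "wb") as f:
        f.write(response.content)
```

With requests, usage might be resp = requests.get(file_url, timeout=30), then resp.raise_for_status() and save_download(resp, 'D5600091B2022MM11B.zip'). For the censtatd.gov.hk URL above this should raise, matching the text/html result of the HEAD request; the direct link to the archive would have to be found on that page first.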

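Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.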

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM