Python script with BeautifulSoup downloading empty zip files
I am trying to bulk-download hydrological measurement data using Python. Unfortunately, every zip file I download turns out to be empty.
Here is my code, based on some YouTube tutorials:
from bs4 import BeautifulSoup
import requests

domain = "https://danepubliczne.imgw.pl/"
URL = 'https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_hydrologiczne/dobowe/2021/'
filetype = '.zip'

def get_soup(url):
    return BeautifulSoup(requests.get(url).text, 'html.parser')

for link in get_soup(URL).find_all('a'):
    zip_link = link.get('href')
    if filetype in zip_link:
        print(zip_link)
        with open(link.text, 'wb') as file:
            response = requests.get(domain + zip_link)
            file.write(response.content)
It looks like the issue is here:
response = requests.get(domain + zip_link)
The domain variable is the base URL of the website, but the links are relative to the subdirectory. If you add
print(domain + zip_link)
you can see that you get something like https://danepubliczne.imgw.pl/codz_2021_01.zip, which lacks the subdirectory path where the file actually lives.
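One way to confirm that a wrong URL is the cause is to check whether the downloaded bytes are a zip archive at all. The sketch below is only an illustration: looks_like_zip is a hypothetical helper name, and the HTML string stands in for whatever error page the server might return.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Return True if the bytes contain a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(data))


# An HTML error page (made-up sample) is not a zip archive.
error_page = b"<html><body>404 Not Found</body></html>"
print(looks_like_zip(error_page))

# A tiny archive built in memory is recognized as one.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.txt", "hello")
print(looks_like_zip(buffer.getvalue()))
```

In the download loop, the same check could be applied to response.content, alongside a look at response.status_code, before anything is written to disk.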
It looks like what you want is:
response = requests.get(URL + zip_link)
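Plain concatenation with URL works here because the hrefs are relative file names and URL ends with a slash. A slightly more general option, sketched below, is urljoin from the standard library, which resolves a link against the page it was found on. The href "codz_2021_01.zip" is taken from the printed output above; "/data/other.zip" is a made-up path used only to show the root-relative case.

```python
from urllib.parse import urljoin

domain = "https://danepubliczne.imgw.pl/"
URL = "https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_hydrologiczne/dobowe/2021/"

# A relative href of the kind found on the listing page
zip_link = "codz_2021_01.zip"

wrong = domain + zip_link        # joins onto the site root
right = urljoin(URL, zip_link)   # resolves against the listing page

print(wrong)
print(right)

# urljoin also handles root-relative hrefs (hypothetical path, for illustration)
print(urljoin(URL, "/data/other.zip"))
```

In the original loop this would mean calling requests.get(urljoin(URL, zip_link)). Calling response.raise_for_status() before writing the file would also make a bad URL fail loudly instead of saving an error page under a .zip name.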