
Using python + requests to download multiple zip files from links

I am trying to download zipped files from the US census bureau ( https://www2.census.gov/geo/tiger/TIGER2019/PLACE/ ). The code I have so far appeared to work, but all of the files downloaded are empty. Can someone help fill in what I'm missing?

from bs4 import BeautifulSoup as bs
import requests
import re

DOMAIN = "https://www2.census.gov/"
URL = "https://www2.census.gov/geo/tiger/TIGER2019/PLACE/"


def get_soup(URL):
    return bs(requests.get(URL).text, 'html.parser')

for link in get_soup(URL).findAll("a", attrs={'href': re.compile(".zip")}):
    file_link = link.get('href')
    print(file_link)

with open(link.text, 'wb') as file:
    response = requests.get(DOMAIN + file_link)
    file.write(response.content)

It looks like you're using the wrong links.

After looking at the website, I can see that the download links have the following form: "https://www2.census.gov/geo/tiger/TIGER2019/PLACE/[file_link]".

So your DOMAIN variable is wrong; the download URL should be built from your URL variable instead.

Also, the "with open" block isn't indented to sit inside the for-loop, so in its current state your code only downloads the last link found.
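
Putting both fixes together, here is a minimal corrected sketch. It follows your original code; the only extra assumption is the tightened regex r"\.zip$", which avoids matching hrefs that merely contain ".zip" somewhere in the middle:

from bs4 import BeautifulSoup as bs
import requests
import re

URL = "https://www2.census.gov/geo/tiger/TIGER2019/PLACE/"

def get_soup(url):
    return bs(requests.get(url).text, 'html.parser')

for link in get_soup(URL).findAll("a", attrs={'href': re.compile(r"\.zip$")}):
    file_link = link.get('href')
    print(file_link)
    # Prepend the listing URL (not the bare domain) and keep the
    # download inside the loop so every file gets written, not just the last.
    response = requests.get(URL + file_link)
    with open(link.text, 'wb') as file:
        file.write(response.content)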
