
Downloading and exporting a zip file from url using BeautifulSoup

I have looked over the responses to previous zip-downloading questions, but I keep running into problems. I used BeautifulSoup to identify a particular zip file I want to download, using the following code:

import re
import urllib.request

import requests
from bs4 import BeautifulSoup

state_fips = '06'
county_fips = '037'
url = 'https://www2.census.gov/geo/tiger/TIGER2020/ROADS/'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')

# build the file-name prefix from the state and county FIPS codes
st_cnty_string = f'tl_2020_{state_fips}{county_fips}'

I then try to read the data and write it to a file, but I keep getting either errors or files that are 0 bytes. I am not sure where the problem or problems are:

link = soup.findAll('a', attrs={'href': re.compile(st_cnty_string)})
data = urllib.request.urlretrieve(url, link.get('href'))
open('test.zip', 'wb').write(data)

I get the following error for this attempt:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: a bytes-like object is required, not 'tuple'

Any help would be much appreciated!

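One problem is that the href values BeautifulSoup returns are relative links, but you need a complete URL to download the zip file. In addition, findAll returns a list of tags, so you have to loop over the results, and urllib.request.urlretrieve returns a (filename, headers) tuple rather than the file's bytes, which is why write() raises the TypeError.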

Try this:

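for link in soup.findAll('a', attrs={'href': re.compile(st_cnty_string)}):
    # url already ends with '/', so append the relative href directly
    link_abs = f'{url}{link.get("href")}'
    with open('test.zip', 'wb') as f:
        # download the archive and write its bytes to disk
        f.write(requests.get(link_abs).content)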
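
If you would rather stay with urllib, as in the question, the following is a minimal sketch of the same idea using only the standard library. The helper names build_zip_url and download_zip are hypothetical, and the file name in the commented example call is an assumed example rather than something taken from the page. It resolves the relative href with urljoin, lets urlretrieve save the file itself, and then checks that the result is a real zip archive, which should catch the 0-byte case.

```python
import urllib.request
import zipfile
from urllib.parse import urljoin


def build_zip_url(base_url, href):
    """Resolve a relative href from the directory listing against base_url."""
    return urljoin(base_url, href)


def download_zip(zip_url, dest_path):
    """Download zip_url to dest_path and confirm the result is a zip archive."""
    # urlretrieve saves the file itself and returns (local_path, headers),
    # so there are no bytes left to pass to f.write().
    local_path, _headers = urllib.request.urlretrieve(zip_url, dest_path)
    # An empty or truncated download fails this check.
    if not zipfile.is_zipfile(local_path):
        raise ValueError(f'{local_path} is not a valid zip archive')
    return local_path


# Example call (needs network access); the file name is an assumed example:
# zip_url = build_zip_url('https://www2.census.gov/geo/tiger/TIGER2020/ROADS/',
#                         'tl_2020_06037_roads.zip')
# download_zip(zip_url, 'test.zip')
```

Using urljoin instead of string formatting avoids a doubled slash when the base URL already ends with '/'.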

