简体   繁体   中英

how to download this zip file using python requests?

I am trying to download a zip file that is stored here:

http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/N45W074.SRTMGL1.hgt.zip

If you paste this into the browser and hit enter, it will download the.zip folder.

If you inspect the browser while this is happening, you will see that there is an internal redirect going on:

1个

2个

And eventually the zip gets downloaded.

I am trying to automate this downloading using the python requests library by doing the following:

import requests
requests.get(url, 
             allow_redirects=True, 
             headers={'User-Agent':'Chrome/107.0.0.0'})

I've tried tons of combinations, using the full header string from the HTML inspection, forcing verify=True, with and without redirects, adding a HTTPBasicAuth user/pass that says is required although the file seems to download fine without any credentials.

Honestly no clue what I'm missing, this is not my expertise. I keep getting this error:

>>> requests.get(url, 
...              allow_redirects=True, 
...              headers={'User-Agent':'Chrome/107.0.0.0'})
Traceback (most recent call last):
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
    raise err
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine 
actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\http\client.py", line 1282, in request        
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\http\client.py", line 1328, in _send_request  
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\http\client.py", line 1277, in endheaders     
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\http\client.py", line 1037, in _send_output   
    self.send(msg)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\http\client.py", line 975, in send
    self.connect()
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\connection.py", line 205, in connect
    conn = self._new_conn()
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x0000016B76A6FE50>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\requests\adapters.py", line 489, in send
    resp = conn.urlopen(
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='e4ftl01.cr.usgs.gov', port=80): Max retries exceeded with url: /MEASURES/SRTMGL1.003/2000.02.11/N45W074.SRTMGL1.hgt.zip (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000016B76A6FE50>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\requests\sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\dere\Miniconda3\envs\blender\lib\site-packages\requests\adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='e4ftl01.cr.usgs.gov', port=80): Max retries exceeded with url: /MEASURES/SRTMGL1.003/2000.02.11/N45W074.SRTMGL1.hgt.zip (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000016B76A6FE50>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

Can someone help me arrive at the code that will result in a successful request? I know how to write the request into a zip afterwards..

Change http to Https: This should work

import requests

# download zip file from url
url = "https://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/N45W074.SRTMGL1.hgt.zip"
r = requests.get(url)
with open("N45W074.SRTMGL1.hgt.zip", "wb") as f:
    f.write(r.content)

Thanks to @cnemri for the tip.

It was a combination of changing http to https and also including this specific cookie header in the request header:

headers = {'Cookie': '_gid=GA1.2.48775707.1669266346; _hjSessionUser_606685=eyJpZCI6IjQ1MzEzM2QzLTI3MWEtNWM0YS04M2YzLWRmMmMzNDk4NjY1ZSIsImNyZWF0ZWQiOjE2NjkyNjYzNDU2MjUsImV4aXN0aW5nIjp0cnVlfQ==; ERS_production_2=b7dfade669180a6d6d250b030ffc3cf2UZZa8%2F471MfcV%2FaNuFtu6Pli%2BpP8jKOIwR4JvQjm%2B6DLPjl679vVf5SCDk7C5TLQsj7qckIev6lmtGb6Mes5RKDHUs%2BBp3EAjKW2%2BMCUpV%2Fnx0z1pdaCQQ%3D%3D; EROS_SSO_production_secure=eyJjcmVhdGVkIjoxNjY5MjY5MjEzLCJ1cGRhdGVkIjoiMjAyMi0xMS0yMyAyMzo1MzoyOCIsImF1dGhUeXBlIjoiRVJTIiwiYXV0aFNlcnZpY2UiOiJFUk9TIiwidmVyc2lvbiI6MS4xLCJzdGF0ZSI6ImVjMDY3YmMzYmRhMzBiZTUyNTkxYTNiZTYwMTMwZWNmNjAwMWU1Y2JlMGMxZmNkYTU4Y2Y4OTY0YjRlNTJkOTEiLCJpZCI6InY5U1Q4YVl3M1hRKCsyIiwic2VjcmV0IjoiPl8lMCZPYiY9MUVkclRLZTt4fHVZLVcyU1VvTiA1SE17KiQqaG12OSxvPl5%2BdyJ9; _ga_0YWDZEJ295=GS1.1.1669269458.1.0.1669269460.0.0.0; _ga=GA1.2.1055277358.1669266346; _ga_71JPYV1CCS=GS1.1.1669306300.2.1.1669306402.0.0.0; DATA=Y3_gg9uD6LmunsfDHneR9wAAARQ'}

then, this worked:

requests.get(url, headers=headers)

I'm not sure if this is the "solution" or just a workaround I've discovered. Still appreciate any input. I think it may just be credentials that are stored?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM