
How to download data from a password-protected website

I'm using the requests library in Python to try to download this file: http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/N55W003.SRTMGL1.hgt.zip. There are about 14,000 such files, which is why I need to automate the process. The other techniques I've found online don't seem to work, I assume because the websites they were designed for use a different authentication method. I don't know much about web development, so I can't work out how this authentication works.

Edit

This is the code:

import json
import requests
from requests.auth import HTTPBasicAuth


# Load the JSON index that lists every SRTM tile.
with open("srtm30m_bounding_boxes.json", 'r') as file:
    x = json.load(file)

filenamelist = []

url = "http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/N55W003.SRTMGL1.hgt.zip"

# Build the full download URL for each of the 14295 tiles.
for i in range(14295):
    filenamelist.append(x['features'][i]['properties']['dataFile'])
    filenamelist[i] = "http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/" + filenamelist[i]

# Attempt to authenticate by sending session cookies with the request.
jar = requests.cookies.RequestsCookieJar()
jar.set('urs_user_already_logged', 'yes')
jar.set('_urs-gui_session','8b972449036e60e3d83a6a819b93124d')
r = requests.get(url, cookies=jar)

And this is the error I get when I run the code:

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

The simplest thing is to provide the username and password in the URL before the host, e.g.:

requests.get('http://{username}:{password}@e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/{subpath}'.format(username=username, password=password, subpath=filenamelist[i]))
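One caveat with credentials embedded in a URL: characters such as `@`, `:` or `/` in the username or password would be misread as URL delimiters unless they are percent-encoded. A minimal sketch of that step follows; `url_with_credentials` is a hypothetical helper name, and it assumes `data_file` is a bare file name rather than a full URL.

```python
from urllib.parse import quote


def url_with_credentials(username, password, data_file):
    """Build a download URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return (
        "http://{user}:{pw}@e4ftl01.cr.usgs.gov"
        "/MEASURES/SRTMGL1.003/2000.02.11/{data_file}"
    ).format(user=user, pw=pw, data_file=data_file)
```

The result can be passed straight to `requests.get(...)`, which decodes the credentials again before sending them.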

You can also supply the username and password as the auth parameter to get:

requests.get('http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/{subpath}'.format(subpath=filenamelist[i]), auth=(username, password))
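For the full set of files, the `auth` approach might be wrapped up roughly as below. This is only a sketch under some assumptions: the helper names (`build_url`, `make_session`, `download`) are made up for illustration, the server is assumed to accept these credentials directly, and each `data_file` is a bare file name. Note that `filenamelist` in the question already holds full URLs, so the unprefixed `dataFile` values are what would be passed here.

```python
import os
import requests

BASE_URL = "http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/"


def build_url(data_file, base_url=BASE_URL):
    """Join the base URL and a bare file name such as 'N55W003.SRTMGL1.hgt.zip'."""
    return base_url + data_file


def make_session(username, password):
    """Return a Session that sends the same credentials with every request."""
    session = requests.Session()
    session.auth = (username, password)
    return session


def download(session, data_file, dest_dir="."):
    """Stream one file to disk and return the local path (hypothetical helper)."""
    dest = os.path.join(dest_dir, data_file)
    with session.get(build_url(data_file), stream=True, timeout=60) as r:
        r.raise_for_status()  # stop on 401/403/404 instead of saving an error page
        with open(dest, "wb") as out:
            for chunk in r.iter_content(chunk_size=1 << 16):
                out.write(chunk)
    return dest
```

A loop such as `for f in x['features']: download(session, f['properties']['dataFile'])` would then fetch each tile in turn. Streaming in chunks avoids holding a whole zip file in memory.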

totalhack is right that https is more secure, and it seems to work on this site. This form of authentication (HTTP Basic) sends the username and password only base64-encoded, which is effectively plaintext, so anyone who can observe the http request would also be able to steal your login. https protects the username and password because it encrypts the entire request.
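To make the point concrete, here is a small sketch of what Basic authentication puts on the wire and how easily it is reversed, plus a helper that switches a URL to https. The function names are hypothetical, and the header construction assumes ASCII credentials.

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic auth."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth_header(value):
    """Recover (username, password) from the header, as an eavesdropper could."""
    token = value.split(" ", 1)[1]
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


def to_https(url):
    """Return the same URL with the scheme changed from http to https."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

With that, a request along the lines of `requests.get(to_https(url), auth=(username, password))` would send the same header, but inside an encrypted connection.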
