Getting HTTP error 400 with urllib2 when trying to download a file
Here is the thing: I'm writing a script that downloads files from different sites. I can't figure out why it throws this error, while if I put the same URL in my browser it lets me download the file. There are also other URLs that work fine. So, here is the code:
import os
from bs4 import BeautifulSoup
import time
import urllib2
f = urllib2.Request(url)
f.add_header('User-Agent', 'Mozilla/5.0 Windows NT 6.3; WOW64; rv:34.0')
request = urllib2.urlopen(f)
data = request.read()
soup = BeautifulSoup(data, 'html.parser')
p_name = soup.find('h2', id="searchResults").contents[0]
if not os.path.exists(p_name):
    os.makedirs(p_name)
for a in soup.find_all('a', href="#register"):
    f = a["data-durl"]
    #Following two lines just prepares file name
    n = len(f.split("/"))
    n_file = f.split("/")[n-1]
    path_file = p_name+"\\"+n_file
    if os.path.isfile(path_file):
        print "Firmware already downloaded. skipping it"
    else:
        print "Downloading "+ path_file
        link = urllib2.urlopen(f)
        datos = link.read()
        #print "[+] Downloading firmware %s" % n_file
        #n_archivo = "Archivo"+str(b)+".zip"
        with open(path_file, "wb") as code:
            code.write(datos)
        time.sleep(2)
This URL just won't work with this script: Non working url. But this one works fine: working url.
Hope you can help me.
EDIT: I added the libraries that I use for this, and the stack trace.
I found the error! The problem was spaces in the name of the file it tries to download. With f.replace(" ","%20") it should work fine :)
You need to convert the spaces in your filename to the URL encoding for a space: %20. To do this, you can add a line between these two lines using str.replace():
print "Downloading "+ path_file
f = f.replace(' ', '%20')
link = urllib2.urlopen(f)
This will download from the URL:
http://www.downloads.netgear.com/files/GDC/ME101/ME101%20Software%20Utility%20Version%202.0.zip
instead of from
http://www.downloads.netgear.com/files/GDC/ME101/ME101 Software Utility Version 2.0.zip
which is invalid because it contains spaces.
This URL still works in your browser because when you enter a URL with spaces, your browser automatically converts them to %20.
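Replacing spaces by hand fixes this particular file, but other characters in a file name could cause the same 400 error. As a rough sketch of a more general approach, you could percent-encode the whole path with the standard library's quote function (urllib.quote in Python 2, urllib.parse.quote in Python 3). The helper name encode_url_path is made up for this example and is not part of the original script; it assumes the URL has a scheme and host like the one above.

```python
try:  # Python 2, as used in the question
    from urllib import quote
    from urlparse import urlsplit, urlunsplit
except ImportError:  # Python 3 equivalents
    from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url):
    """Percent-encode unsafe characters in the path part of a URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # safe="/%" keeps path separators and already-encoded sequences
    # (such as an existing %20) untouched.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


raw = "http://www.downloads.netgear.com/files/GDC/ME101/ME101 Software Utility Version 2.0.zip"
# Prints the same %20-encoded URL shown above.
print(encode_url_path(raw))
```

In the script you would then call link = urllib2.urlopen(encode_url_path(f)) instead of using f.replace. Because "%" is treated as safe, passing an already encoded URL through the helper leaves it unchanged.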