
Python ValueError: unknown url type: space (?)

I am using the urllib2 module in Python 2.7 (in Spyder 3.0) to batch download text files by reading a text file that contains a list of their URLs:

    import sys
    import urllib2

    reload(sys)
    sys.setdefaultencoding('utf-8')
    with open('ocean_not_templated_url.txt', 'r') as text:
        lines = text.readlines()
        for line in lines:
            url = urllib2.urlopen(line.strip('ï \xa0\t\n\r\v'))
            with open(line.strip('\n\r\t ').replace('/', '!').replace(':', '~'), 'wb') as out:
                for d in url:
                    out.write(d)

I've already found and stripped a bunch of weird characters in the URLs. However, the script fails when it is nearly 90% complete, giving the following error:

[image: screenshot of the error]

I thought it was a non-breaking space (denoted by \xa0 in the code), but it still fails. Any ideas?

That's an odd URL!

Specify the communication protocol over the network. Try prefixing the URL with http:// and the domain name if the file exists on the WWW.

Files always reside somewhere: in some server's directory, or locally on your system. So there must be a network path to such files, for example:

http://127.0.0.1/folder1/samuel/file1.txt

The same example, with localhost being an alias for 127.0.0.1 (generally):

http://localhost/folder1/samuel/file1.txt

That might solve the problem. Just think about where your file exists and how it should be addressed...
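A minimal sketch of how you could check each line for a protocol before trying to open it. The helper name has_network_scheme and the list of accepted schemes are my own assumptions, not something from your script, and the import fallback is only there so the snippet runs on both Python 2.7 and Python 3:

```python
try:
    from urlparse import urlparse  # Python 2.7
except ImportError:
    from urllib.parse import urlparse  # Python 3


def has_network_scheme(url, schemes=('http', 'https', 'ftp')):
    """Return True if url starts with one of the expected schemes.

    The helper name and the default scheme list are illustrative assumptions.
    """
    return urlparse(url).scheme in schemes


print(has_network_scheme('http://localhost/folder1/samuel/file1.txt'))  # True
print(has_network_scheme('folder1/samuel/file1.txt'))  # False
```

Any line for which this returns False is a candidate for the error, so printing those lines should show what the offending entry looks like.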


Update:

I experimented quite a bit on this. I think I know why that error is raised! :D

I speculate that the file which stores your URLs actually has a sneaky empty line near the end. I say near the end because you said the script gets through about 90% of the list and then fails. So the Python urllib2 function get_type is unable to process that empty URL and raises the "unknown url type:" error.
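Here is a small sketch that reproduces this without touching the network, assuming the offending line really is blank. The helper describe_open_error is a hypothetical name of mine, and the import fallback is only there so it also runs on Python 3, where the wording of the message differs slightly:

```python
try:
    import urllib2 as url_lib  # Python 2.7, as in the question
except ImportError:
    import urllib.request as url_lib  # Python 3 equivalent


def describe_open_error(url):
    """Return the ValueError message raised for url, or None if none was raised.

    Hypothetical helper for illustration; only call it with strings that
    are not real URLs, otherwise it will actually open a connection.
    """
    try:
        url_lib.urlopen(url)
    except ValueError as exc:
        return str(exc)
    return None


# A blank line in the list file becomes '' once it is stripped.
print(describe_open_error('\n'.strip()))
```

The empty string is rejected before any request is sent, because no URL type can be split off from it.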

I think that's the problem! Remove that empty line from the file ocean_not_templated_url.txt and try it out!
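If you would rather not edit the file by hand, the loop can skip blank lines itself. This is only a sketch under the assumption that the blank line is the culprit; clean_urls and its strip_chars default are names I made up, so substitute whatever characters you are already stripping:

```python
def clean_urls(lines, strip_chars=' \t\n\r\v'):
    """Yield each non-empty URL after stripping the given characters.

    Hypothetical helper; strip_chars is an assumed default.
    """
    for line in lines:
        url = line.strip(strip_chars)
        if url:  # a blank line strips down to '' and is skipped
            yield url


sample = [
    'http://localhost/folder1/samuel/file1.txt\n',
    '\n',
    '   \n',
    'http://127.0.0.1/folder1/samuel/file1.txt\n',
]
print(list(clean_urls(sample)))
```

In your script you would iterate over clean_urls(lines) instead of lines, and pass each yielded URL to urllib2.urlopen.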

Just check and let me know! :P
