Python ValueError: unknown url type: space (?)

Question

I am using the urllib2 module in Python 2.7 (in Spyder 3.0) to batch download text files by reading a text file that contains a list of their URLs:

    import sys
    import urllib2

    reload(sys)
    sys.setdefaultencoding('utf-8')
    with open('ocean_not_templated_url.txt', 'r') as text:
        lines = text.readlines()
        for line in lines:
            url = urllib2.urlopen(line.strip('ïÃ¯Â»Â¿ \xa0\t\n\r\v'))
            with open(line.strip('\n\r\t ').replace('/', '!').replace(':', '~'), 'wb') as out:
                for d in url:
                    out.write(d)

I've already discovered a bunch of weird characters in the URLs, which I now strip. However, the script still fails when nearly 90% complete, raising the ValueError from the title (unknown url type).

I thought it was a non-breaking space (denoted by \xa0 in the code), but it still fails. Any ideas?

Answer 1

That's an odd URL!

Specify the communication protocol over the network. Try prefixing the URL with http:// and the domain name if the file exists on the WWW.

Files always reside somewhere, either in some server's directory or locally on your system. So there must be a network path to such files, for example:

http://127.0.0.1/folder1/samuel/file1.txt

The same example, with localhost being (generally) an alias for 127.0.0.1:

http://localhost/folder1/samuel/file1.txt

That might solve the problem. Just think about where your file exists and how it should be addressed...
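One way to check whether a line carries a protocol before handing it to urlopen is to look at the scheme the standard library parses out of it. This is only a minimal sketch, assuming Python 3's urllib.parse (in Python 2.7 the same function lives in the urlparse module). The helper name has_scheme and the list of accepted schemes are illustrative assumptions, not part of the original script:

```python
from urllib.parse import urlsplit  # Python 2.7: from urlparse import urlsplit

# Assumption: the protocols we are willing to download from.
ALLOWED_SCHEMES = ('http', 'https', 'ftp')

def has_scheme(url):
    """Return True if url starts with one of the allowed protocols."""
    # urlsplit('') and urlsplit('folder1/file1.txt') both give scheme ''.
    return urlsplit(url).scheme.lower() in ALLOWED_SCHEMES
```

With this, has_scheme('http://localhost/folder1/samuel/file1.txt') is True, while a bare path such as 'folder1/samuel/file1.txt' or an empty string gives False.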

Update:

I experimented quite a bit on this. I think I know why that error is raised! :D

I speculate that the file that stores your URLs actually has a sneaky empty line near the end. I can say it's near the end because you said the script gets through about 90% of the list and then fails. So the Python urllib2 function get_type is unable to process that empty URL and raises ValueError with the message unknown url type:

I think that's the problem! Remove that empty line in the file ocean_not_templated_url.txt and try it out!
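Instead of editing the file by hand, the loop could also skip any line that is empty once it has been stripped. This is a minimal sketch, assuming the lines are text (unicode) strings as in Python 3. The helper name iter_urls is hypothetical, and the set of stripped characters (BOM, non-breaking space, ASCII whitespace) is an assumption based on the characters mentioned in the question:

```python
# Assumption: BOM, non-breaking space and ASCII whitespace are the only junk.
JUNK = u'\ufeff\xa0 \t\r\n\v'

def iter_urls(lines):
    """Yield (line_number, url) for every line that is not blank."""
    for number, raw in enumerate(lines, 1):
        url = raw.strip(JUNK)
        if not url:
            # A blank line would reach urlopen as an empty string and
            # trigger "unknown url type"; skip it instead.
            continue
        yield number, url
```

In the original loop, `for line in lines:` would then become `for number, url in iter_urls(lines):`, with url passed to urlopen. If a line still fails, printing repr(url) together with its line number shows any remaining invisible characters.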

Just check and let me know! :P

Answer 1
1 accepted 2017-02-17 19:46:52
