
Get rid of unicode error

I have the following code, which attempts to print the edge lists of graphs. It looks like the edges are cycled, but my intention is to test whether all edges are contained while going through the function for further processing.

def mapper_network(self, _, info):
    info[0] = info[0].encode('utf-8')
    for i in range(len(info[1])):
        info[1][i] = str(info[1][i])
    l_lst = len(info[1])
    packed = [(info[0], l) for l in info[1]] #each pair of nodes (edge)
    weight = [1 /float(l_lst)] #each edge weight
    G = nx.Graph()
    for i in range(len(packed)):
        edge_from = packed[i][0]
        edge_to = packed[i][1]
        #edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
        edge_to = edge_to.encode("utf-8")
        weight = weight
        G.add_edge(edge_from, edge_to, weight=weight)
    #print G.size()  #yes, this works :)
    G_edgelist = []
    G_edgelist = G_edgelist.append(nx.generate_edgelist(G).next())
    print G_edgelist

With this code, I obtain the error:

Traceback (most recent call last):
  File "MRQ7_trevor_2.py", line 160, in <module>
    MRMostUsedWord2.run()
  File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 433, in run
    mr_job.execute()
  File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 442, in execute
    self.run_mapper(self.options.step_num)
  File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 507, in run_mapper
    for out_key, out_value in mapper(key, value) or ():
  File "MRQ7_trevor_2.py", line 91, in mapper_network
    G_edgelist = G_edgelist.append(nx.generate_edgelist(G).next())
  File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/readwrite/edgelist.py", line 114, in generate_edgelist
    yield delimiter.join(map(make_str,e))
  File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/utils/misc.py", line 82, in make_str
    return unicode(str(x), 'unicode-escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 0: \ at end of string

With the modification below:

edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')  

I obtained:

edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
TypeError: must be unicode, not str

How can I get rid of the unicode error? It seems very troublesome, and I would highly appreciate your assistance. Thank you!!

I highly recommend reading this article on unicode. It gives a nice explanation of unicode vs. strings in Python 2.

For your problem specifically, when you call unicodedata.normalize("NFKD", edge_to), edge_to must be a unicode string. However, it is not unicode, since you set it in this line: info[1][i] = str(info[1][i]). Here's a quick test:

import unicodedata

edge_to = u'edge'  # this is unicode
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
print edge_to  # prints 'edge' as expected

edge_to = 'edge'  # this is a byte str, not unicode
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')  # TypeError: must be unicode, not str
print edge_to  # never reached

You can get rid of the problem by converting edge_to to unicode before normalizing it.
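One way to do that conversion, as a minimal sketch: decode byte strings before normalizing, and pass values that are already unicode straight through. The helper name normalize_to_utf8 is hypothetical, and the sketch assumes that any incoming byte strings are UTF-8 encoded, which the question does not confirm.

```python
import unicodedata


def normalize_to_utf8(value, encoding="utf-8"):
    """Return NFKD-normalized UTF-8 bytes for a text or byte string.

    Hypothetical helper; assumes byte input is encoded with `encoding`.
    """
    # On Python 2, `bytes` is an alias for `str`, so this check catches
    # exactly the values that make unicodedata.normalize raise TypeError.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return unicodedata.normalize("NFKD", value).encode("utf-8", "ignore")
```

With this in place, the failing line could become edge_to = normalize_to_utf8(edge_to), and it no longer matters whether edge_to arrived as str or unicode.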

As an aside, the encoding/decoding throughout the whole code chunk is a little confusing. Think through exactly where you want strings to be unicode vs. bytes. You may not need to do so much encoding/decoding/normalization.
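A common way to settle the unicode vs. bytes question is to decode once on the way in, work with unicode everywhere inside the function, and encode once on the way out. The sketch below illustrates that pattern for the (node, neighbours) input from the question, without networkx or mrjob. The names to_text, edge_lines and encode_lines are hypothetical, and UTF-8 input is an assumption.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes to unicode; format anything else as unicode text."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return u"%s" % (value,)


def edge_lines(source, targets):
    """Build 'source target weight' lines, keeping everything as unicode."""
    source = to_text(source)
    targets = [to_text(t) for t in targets]
    if not targets:
        return []
    # Same weighting as in the question: 1 / number of neighbours.
    weight = 1.0 / len(targets)
    return [u"%s %s %s" % (source, t, weight) for t in targets]


def encode_lines(lines, encoding="utf-8"):
    """Encode once, at the output boundary only."""
    return u"\n".join(lines).encode(encoding)
```

Here no value is encoded in the middle of the function, so nothing downstream has to guess whether it received unicode or bytes.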
