
Python: How to handle UnicodeEncodeError?

This is what I am seeing:

Traceback (most recent call last):
  File "/home/user/tools/executeJobs.py", line 86, in <module>
    owner = re.sub('^(AS[0-9]+ )', '', str(element[2]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 13: ordinal not in range(128)

The traceback already shows the line in question. str(array[0]) never failed me before. How can I work around this? A quick and dirty solution is fine.

Update:

Element[2] comes from this binary .dat list: http://github.com/maxmind/geoip-api-php/blob/master/tests/data/ … also available here: http://dev.maxmind.com/geoip/legacy/geolite (the IP/ASN one at the bottom of the table)

\xe7 appears to be the c with cedilla (ç) in latin1 encoding.

So, assuming you have a unicode string, u"\xe7".encode("latin1") should give you the bytestring "\xe7". You could also choose to encode it as "utf8": u"\xe7".encode("utf8") would give you the bytestring "\xc3\xa7". That may or may not fix your issues, but it will definitely give you a different error.
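To make the encoding step concrete, here is a minimal sketch using only the character from the traceback. The variable names are illustrative and not from the original script. It should run unchanged on Python 2.7 and Python 3.

```python
# Sketch: explicit encoding of the character reported in the traceback.
ch = u"\xe7"  # c with cedilla

latin1_bytes = ch.encode("latin1")  # one byte: 0xE7
utf8_bytes = ch.encode("utf8")      # two bytes: 0xC3 0xA7

# Encoding to ASCII is the step that fails, because ASCII stops at code point 127.
try:
    ch.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

On Python 2, str() applied to a unicode object performs that ASCII encode implicitly, which matches the error message in the question.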

For a quick and dirty solution:

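import re

try:
    # Pass element[2] directly: wrapping it in str() is what triggers the implicit ASCII encode.
    owner = re.sub('^(AS[0-9]+ )', '', element[2])
except TypeError as e:
    # element[2] was not a string at all; print the whole record for inspection.
    print "Weird:", element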
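A variation on the snippet above, offered only as a sketch: keep the value as unicode while stripping the prefix, and encode explicitly only when bytes are actually needed. The helper name strip_as_prefix and the sample value are made up for illustration; the real element[2] comes from the GeoIP data.

```python
import re

def strip_as_prefix(value):
    # Hypothetical helper: drop a leading "AS<number> " without calling str(),
    # so no implicit ASCII encoding takes place.
    return re.sub(u'^(AS[0-9]+ )', u'', value)

sample = u"AS12345 Fran\xe7ois Telecom"  # made-up value, not from the real data
owner = strip_as_prefix(sample)

# Encode explicitly, and only at the point where bytes are required.
owner_utf8 = owner.encode("utf8")
```

Whether UTF-8 is the right output encoding depends on where owner ends up (terminal, file, database), which the question does not say.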

I've always used

s.replace(u'\xa0',' ')

In your case, it should look something like

s.replace(u'\xe7','whatever')
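As a rough illustration of the replace approach, the sketch below substitutes the offending character and then encodes to ASCII. The sample string is invented. The second variant, the "replace" error handler of encode(), is not part of the answer above; it is shown as an alternative for when the set of problem characters is not known in advance.

```python
s = u"Fran\xe7ois"  # made-up sample value

# Replace the one known offender, after which ASCII encoding succeeds.
ascii_safe = s.replace(u'\xe7', u'c')
as_bytes = ascii_safe.encode("ascii")

# Alternative: let the codec substitute '?' for every non-ASCII character.
replaced = s.encode("ascii", "replace")
```

Both variants lose information, so they only suit the quick and dirty case.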
