Python, UnicodeEncodeError, converting unicode to ascii
First of all, I am pretty new to Python, so forgive me for all the n00b stuff. The application logic in Python goes like this:
Now the problem is that the SQL query returns unicode strings. The output from the SELECT looks something like this:
(u'Abc', u'Lololo', u'Fjordk\xe6r')
So first I tried to convert each element to a string, but it fails because the third element contains the Danish/Norwegian letter 'æ' ("ae"):
for x in data[0]:
    str_data.append(str(x))
I am getting:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 6: ordinal not in range(128)
I can't pass the unicode strings directly to the INSERT either, because a TypeError occurs:
TypeError: coercing to Unicode: need string or buffer, NoneType found
Any ideas?
In my experience, Python and Unicode are often a problem.
Generally speaking, if you have a Unicode string, you can convert it to a normal (byte) string like this:
normal_string = unicode_string.encode('utf-8')
And you can convert a normal string back to a Unicode string like this:
unicode_string = normal_string.decode('utf-8')
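Applied to the row from the question, a minimal round-trip sketch might look like the following. It is written for Python 3, where str is already Unicode and encode() returns bytes; in the Python 2 of the question the same encode/decode calls work on unicode and str objects. The helper name to_utf8 is hypothetical, and passing None through unchanged is an assumption about how SQL NULL values should be handled (the TypeError in the question does mention a NoneType).

```python
def to_utf8(value):
    """Encode a text value to UTF-8 bytes; leave None (e.g. SQL NULL) untouched.

    Hypothetical helper for illustration only.
    """
    if value is None:
        return None
    return value.encode("utf-8")


# The row from the question (u'...' literals in Python 2).
row = ("Abc", "Lololo", "Fjordk\xe6r")

# Unicode -> bytes (a "normal string" in Python 2 terms).
encoded = [to_utf8(x) for x in row]
print(encoded)  # [b'Abc', b'Lololo', b'Fjordk\xc3\xa6r']

# Bytes -> Unicode restores the original text exactly.
decoded = tuple(b.decode("utf-8") for b in encoded)
assert decoded == row
```

The key point is that the same codec (here UTF-8) must be used in both directions.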
The issue here is that the str function tries to convert unicode using the ascii codepage, and the ascii codepage has no mapping for u'\xe6' (æ, char reference here).
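To make the failure concrete: ASCII only covers code points 0 to 127, while 'æ' is U+00E6 (230), which is exactly what "ordinal not in range(128)" refers to. The sketch below is Python 3, and first_unencodable is a hypothetical helper that reads the start and reason attributes of the UnicodeEncodeError the codec raises.

```python
def first_unencodable(text, encoding):
    """Return (index, reason) for the first character `encoding` cannot map, or None.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError as e:
        return e.start, e.reason
    return None


text = "Fjordk\xe6r"  # u'Fjordk\xe6r' in Python 2

print(ord(text[6]))                       # 230, outside ASCII's 0-127
print(first_unencodable(text, "ascii"))   # (6, 'ordinal not in range(128)')
print(first_unencodable(text, "utf-8"))   # None: UTF-8 can encode every character
```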
Therefore you need to encode it with a codepage that supports the character. Nowadays the most common choice is UTF-8.
>>> x = (u'Abc', u'Lololo', u'Fjordk\xe6r')
>>> print x[2].encode("utf8")
Fjordkær
>>> x[2].encode("utf-8")
'Fjordk\xc3\xa6r'
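If the result really must be plain ASCII (as in the title), encode() accepts a second errors argument that decides what happens to characters ASCII cannot represent. A short Python 3 sketch using the standard error handlers; note that "ignore" and "replace" throw information away, so this only makes sense when lossy output is acceptable.

```python
text = "Fjordk\xe6r"  # u'Fjordk\xe6r' in Python 2

# 'ignore' silently drops characters ASCII cannot represent.
print(text.encode("ascii", "ignore"))             # b'Fjordkr'
# 'replace' substitutes a '?' for each of them.
print(text.encode("ascii", "replace"))            # b'Fjordk?r'
# These two keep the character, written as an escape sequence instead.
print(text.encode("ascii", "xmlcharrefreplace"))  # b'Fjordk&#230;r'
print(text.encode("ascii", "backslashreplace"))   # b'Fjordk\\xe6r'
```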
Alternatively, you can convert it to cp1252 (the Western Latin alphabet), which supports it:
>>> x[2].encode("cp1252")
'Fjordk\xe6r'
But the Eastern European charset cp1250 doesn't support it:
>>> x[2].encode("cp1250")
...
UnicodeEncodeError: 'charmap' codec can't encode character u'\xe6' in position 6: character maps to <undefined>
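The comparison above can be automated by trying each candidate codec and catching UnicodeEncodeError. A minimal Python 3 sketch; encodable_with is a hypothetical helper name, and the list of encodings is just the one discussed in this answer.

```python
def encodable_with(text, encodings):
    """Map each encoding name to whether it can represent every character of text.

    Hypothetical helper for illustration only.
    """
    result = {}
    for enc in encodings:
        try:
            text.encode(enc)
            result[enc] = True
        except UnicodeEncodeError:
            result[enc] = False
    return result


print(encodable_with("Fjordk\xe6r", ["ascii", "utf-8", "cp1252", "cp1250"]))
# {'ascii': False, 'utf-8': True, 'cp1252': True, 'cp1250': False}
```

Since UTF-8 can encode any Unicode character, it is the safe default; single-byte codepages like cp1252 or cp1250 only work when every character falls inside their table.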
Problems with Unicode in Python are very common, and I would suggest the following:
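Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.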