
python, UnicodeEncodeError, converting unicode to ascii

Firstly, I am pretty new to Python, so forgive me for all the n00b stuff. The application logic in Python goes like this:

  1. I send an SQL SELECT to the database and it returns an array of data.
  2. I need to take this data and use it in another SQL INSERT statement.

Now the problem is that the SQL query returns unicode strings. The output from the select is something like this:

(u'Abc', u'Lololo', u'Fjordk\xe6r')

So first I tried to convert each value to a string, but that fails because the third element contains the letter 'æ' (ae):

str_data = []
for x in data[0]:
    str_data.append(str(x))

I am getting: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 6: ordinal not in range(128)

I can't insert the unicode values straight into the insert either, because a TypeError occurs: TypeError: coercing to Unicode: need string or buffer, NoneType found

Any ideas?

In my experience, Python and Unicode are often a problem.

Generally speaking, if you have a Unicode string, you can convert it to a normal (byte) string like this:

normal_string = unicode_string.encode('utf-8')

And convert a normal string to a Unicode string like this:

unicode_string = normal_string.decode('utf-8')
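A minimal round-trip sketch of the two calls above, using the value from the question. The variable names are only illustrative. The u'' and b'' prefixes let it run unchanged on Python 2.7 and on Python 3.3+, where the "normal" string of Python 2 corresponds to bytes:

```python
# Text value from the question (contains U+00E6, the letter ae)
unicode_string = u'Fjordk\xe6r'

# Text -> bytes: one character may become several bytes in UTF-8
normal_string = unicode_string.encode('utf-8')

# Bytes -> text: decoding with the same codec restores the original
round_tripped = normal_string.decode('utf-8')

print(repr(normal_string))
print(round_tripped == unicode_string)
```

Note that the encoded form is one byte longer than the text, because 'æ' takes two bytes in UTF-8.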

The issue here is that the str function tries to convert unicode using the ascii codec, and ascii has no mapping for u'\xe6' (æ).

Therefore you need to convert it to an encoding that supports the character. Nowadays the most common choice is utf-8.

>>> x = (u'Abc', u'Lololo', u'Fjordk\xe6r')
>>> print x[2].encode("utf8")
Fjordkær
>>> x[2].encode("utf-8")
'Fjordk\xc3\xa6r'
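Applied to the whole row from the question, the same call could replace the str() loop. This is only a sketch: encode_row is a hypothetical helper, not part of any library, and leaving non-text values such as None untouched is an assumption about what the row may contain.

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3


def encode_row(row, encoding='utf-8'):
    """Encode every text value in a row; leave other values (e.g. None) as-is."""
    return [v.encode(encoding) if isinstance(v, text_type) else v
            for v in row]


# Shape of the result set assumed from the question
data = [(u'Abc', u'Lololo', u'Fjordk\xe6r')]
str_data = encode_row(data[0])
print(str_data)
```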

On the other hand, you may try to convert it to cp1252, the Western Latin codepage, which supports it:

>>> x[2].encode("cp1252")
'Fjordk\xe6r'

But the Eastern European charset cp1250 doesn't support it:

>>> x[2].encode("cp1250")
...
UnicodeEncodeError: 'charmap' codec can't encode character u'\xe6' in position 6: character maps to <undefined>
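If the target really has to be plain ascii, as the title asks, encode() accepts an error handler as its second argument. The sketch below shows the default strict behaviour next to three standard handlers; which one is acceptable depends on how much information loss you can tolerate, so treat it as an illustration rather than a recommendation.

```python
name = u'Fjordk\xe6r'

# Default handler is 'strict': unsupported characters raise an error
try:
    name.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = name.encode('ascii', 'replace')           # substitutes '?'
ignored = name.encode('ascii', 'ignore')             # silently drops the character
escaped = name.encode('ascii', 'xmlcharrefreplace')  # numeric character reference

print(strict_failed, replaced, ignored, escaped)
```

Only the last handler is reversible; 'replace' and 'ignore' lose the original character.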

Issues with unicode in Python are very common, and I would suggest the following:

  • understand what unicode is
  • understand what utf-8 is (it is an encoding, not unicode itself)
  • understand ascii and other codepages
  • recommended conversion workflow: input (any codepage) -> convert to unicode -> (process) -> output to utf-8
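The workflow in the last bullet can be sketched roughly as follows. The function name transcode, the cp1252 input, and the strip/upper "processing" step are all assumptions made for the example; the point is only that decoding happens once on the way in and encoding once on the way out.

```python
def transcode(raw_bytes, source_encoding):
    """Decode input bytes, process them as text, and emit UTF-8 bytes."""
    text = raw_bytes.decode(source_encoding)  # input -> unicode
    text = text.strip().upper()               # process as text, not bytes
    return text.encode('utf-8')               # unicode -> utf-8 output


# Hypothetical input: the name from the question, stored as cp1252 bytes
raw = b' Fjordk\xe6r\n'
result = transcode(raw, 'cp1252')
print(repr(result))
```

Doing the processing on text matters here: upper() on the decoded value handles 'æ' correctly, which it could not do on the raw cp1252 bytes.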
