Python pyodbc Unicode issue
I have a string variable res, which I derived from a pyodbc cursor as shown at the bottom. The table test has a single row with the data ä, whose Unicode code point is u'\xe4' (U+00E4).
The result I get is:
>>> res,type(res)
('\xe4', <type 'str'>)
Whereas the result I should have got is:
>>> res,type(res)
(u'\xe4', <type 'unicode'>)
I tried adding charset as utf-8 to my pyodbc connect string, as shown below. The result is now correctly a unicode object, but the code point belongs to a different character, ꓃, which could be due to a bug in the pyodbc driver.
conn = pyodbc.connect(DSN='datbase;charset=utf8',ansi=True,autocommit=True)
>>> res,type(res)
(u'\ua4c3', <type 'unicode'>)
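One plausible reading of u'\ua4c3' (an assumption, not confirmed in the question): the driver sent the UTF-8 bytes of ä, C3 A4, but they were decoded as UTF-16-LE, which pairs those two bytes into the single code unit 0xA4C3. This Python 3 sketch reproduces that; the helper name utf8_misread_as_utf16le is hypothetical.

```python
def utf8_misread_as_utf16le(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as UTF-16-LE.

    Only works when the UTF-8 byte count is even (true for a lone 'ä').
    """
    raw = text.encode("utf-8")          # 'ä' -> b'\xc3\xa4'
    return raw.decode("utf-16-le")      # low byte C3, high byte A4 -> U+A4C3


if __name__ == "__main__":
    garbled = utf8_misread_as_utf16le("ä")
    print(hex(ord(garbled)))  # 0xa4c3, the same code point as the reported u'\ua4c3'
```

If this hypothesis holds, the fix is to make the driver and pyodbc agree on a single encoding rather than to post-process the garbled value.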
Actual code:
import pyodbc
pyodbc.pooling=False
conn = pyodbc.connect(DSN='datbase',ansi=True,autocommit=True)
cursor = conn.cursor()
cur = cursor.execute('SELECT col1 from test')
res = cur.fetchall()[0][0]
print(res)
Additional details: Database: Teradata; pyodbc version: 2.7
So how do I now either
1) cast ('\xe4', <type 'str'>) to (u'\xe4', <type 'unicode'>) (is it possible to do this without unintentional side effects?), or
2) resolve the pyodbc/unixODBC issue?
I think this is best handled in Python, instead of fiddling with pyodbc.connect arguments and driver-specific connection string attributes.
'\xe4' is a Latin-1 encoded string representing the Unicode character ä.
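A quick Python 3 check of that claim (in Python 3 the raw data is a bytes object rather than str): ä is U+00E4, which Latin-1 stores as the single byte 0xE4, while UTF-8 needs two bytes, so the lone byte 0xE4 is not valid UTF-8.

```python
latin1_bytes = "ä".encode("latin-1")   # one byte per character in Latin-1
utf8_bytes = "ä".encode("utf-8")       # U+00E4 takes two bytes in UTF-8
print(latin1_bytes, utf8_bytes)        # b'\xe4' b'\xc3\xa4'
print(latin1_bytes.decode("latin-1"))  # ä

# 0xE4 starts a multi-byte UTF-8 sequence, so on its own it cannot be decoded as UTF-8.
try:
    b"\xe4".decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError as exc:
    is_valid_utf8 = False
    print("not valid UTF-8:", exc.reason)
```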
To explicitly decode the pyodbc result in Python 2.7:
>>> res = '\xe4'
>>> res.decode('latin1'), type(res.decode('latin1'))
(u'\xe4', <type 'unicode'>)
>>> print res.decode('latin1')
ä
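On the side-effects question: Latin-1 maps every byte 0x00 to 0xFF onto U+0000 to U+00FF, so decoding never raises; if a column actually held UTF-8, you silently get mojibake instead of an error. The Python 3 sketch below uses a hypothetical decode_row helper, assumes byte-string columns arrive as bytes objects, and leaves every other value untouched.

```python
from typing import Any, List, Sequence


def decode_row(row: Sequence[Any], encoding: str = "latin-1") -> List[Any]:
    """Decode bytes columns with the given codec; pass ints, None, str through unchanged."""
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in row]


row = (b"\xe4", 42, None)
print(decode_row(row))  # ['ä', 42, None]

# Caveat: Latin-1 accepts any byte sequence, so UTF-8 data is mis-decoded without an error.
print(b"\xc3\xa4".decode("latin-1"))  # Ã¤
```

Latin-1 is only the right choice if you know the database really stores Latin-1; otherwise decode with the actual column encoding.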
Python 3.x does this for you (the str type holds Unicode characters):
>>> res = '\xe4'
>>> res, type(res)
('ä', <class 'str'>)
>>> print(res)
ä
For Python 3, try this:
After conn = pyodbc.connect(DSN='datbase',ansi=True,autocommit=True), place this:
conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf8')
conn.setencoding(encoding='utf8')
or
conn.setdecoding(pyodbc.SQL_CHAR, encoding='iso-8859-1')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='iso-8859-1')
conn.setencoding(encoding='iso-8859-1')
etc.
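If you have a sample of the raw bytes and know what text they should represent, a small helper can narrow down which encoding to pass to setdecoding. matching_codecs is a hypothetical name, and the candidate list is only an example; this is a diagnostic sketch, not part of pyodbc.

```python
from typing import Iterable, List


def matching_codecs(raw: bytes, expected: str, candidates: Iterable[str]) -> List[str]:
    """Return the candidate codec names that decode raw to exactly expected."""
    matches = []
    for codec in candidates:
        try:
            if raw.decode(codec) == expected:
                matches.append(codec)
        except (UnicodeDecodeError, LookupError):
            continue  # bytes invalid in this codec, or unknown codec name
    return matches


print(matching_codecs(b"\xe4", "ä", ["utf-8", "latin-1", "cp1252", "utf-16-le"]))
# ['latin-1', 'cp1252']
```

A single character rarely pins down the encoding uniquely (Latin-1 and cp1252 agree on 0xE4), so test with a longer sample if you can.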
Python 2:
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
cnxn.setencoding(str, encoding='utf-8')
cnxn.setencoding(unicode, encoding='utf-8')
etc.
# 'encode-foo-bar' is a placeholder: substitute the actual codec name
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='encode-foo-bar')
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='encode-foo-bar')
cnxn.setencoding(str, encoding='encode-foo-bar')
cnxn.setencoding(unicode, encoding='encode-foo-bar')