
pymssql: Setting charset option in DB connection causes connection to fail

I've written a wrapper around pymssql to connect to the DBs where I work. I've run into unicode decode/encode errors, and I'm trying to stem them at the source.

When I specify charset='latin1' or charset='iso-8859-1', the connection fails with the following error:

  File "pymssql.pyx", line 549, in pymssql.connect (pymssql.c:7672)
    raise OperationalError(e[0])
pymssql.OperationalError: (20017, 'DB-Lib error message 20017, severity 9:\nUnexpected EOF from the server\nDB-Lib error message 20002, severity 9:\nAdaptive Server connection failed\n')
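For context, the connect call in my wrapper looks roughly like this (server name, credentials, and database name are placeholders):

import pymssql

# Placeholder connection details; the charset argument is the part in question.
conn = pymssql.connect(
    server='myserver',
    user='myuser',
    password='mypassword',
    database='mydb',
    charset='latin1',  # also tried 'iso-8859-1'; both fail with the error above
)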

The DB encoding looks to be 'latin1':

SELECT SERVERPROPERTY('Collation')

returns

SQL_Latin1_General_CP1_CI_AS

which, I assume, is the same as Python's 'latin1'.

Am I doing this correctly? Did I choose the wrong codec (i.e., latin1 or iso-8859-1)?

My system uses the "SQL_Latin1_General_CP1_CI_AS" collation as well, and I found that even when connecting with "LATIN1", characters in CHAR/VARCHAR columns were still returned incorrectly encoded.

According to the Microsoft documentation on SQL Server Code Page Architecture, the code page to use is Windows-1252.
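The distinction matters: Windows-1252 (cp1252) and ISO-8859-1/latin-1 disagree in the 0x80-0x9F range, which is exactly where characters such as curly quotes live. A quick illustration in plain Python, with no database involved:

# cp1252 maps 0x93/0x94 to curly double quotes; latin-1 treats them as C1 control characters.
raw = b'\x93hello\x94'
print(raw.decode('cp1252'))         # prints: “hello”  (curly quotes decoded correctly)
print(repr(raw.decode('latin-1')))  # prints: '\x93hello\x94'  (control characters, i.e. mojibake)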

Using charset='WINDOWS-1252' in pymssql.connect gives the correct result for me.
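A minimal sketch of what worked for me (connection details, table, and column names are placeholders):

import pymssql

# Placeholder connection details; the charset name is the part that matters.
conn = pymssql.connect(
    server='myserver',
    user='myuser',
    password='mypassword',
    database='mydb',
    charset='WINDOWS-1252',
)
cursor = conn.cursor()
cursor.execute('SELECT name FROM some_table')
for row in cursor:
    print(row[0])  # CHAR/VARCHAR values now come back decoded correctly
conn.close()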

It appears that pymssql is quite picky about what you enter for the charset.

Consider entering charset="ISO-8859-1".

Use uppercase names such as "ISO-8859-1" or "LATIN1".

pymssql uses the GNU iconv naming conventions (https://www.gnu.org/software/libiconv/). From that page:

For historical reasons, international text is often encoded using a language or country dependent character encoding. With the advent of the internet and the frequent exchange of text across countries - even the viewing of a web page from a foreign country is a "text exchange" in this context -, conversions between these encodings have become important. They have also become a problem, because many characters which are present in one encoding are absent in many other encodings. To solve this mess, the Unicode encoding has been created. It is a super-encoding of all others and is therefore the default encoding for new text formats like XML.

Still, many computers still operate in locale with a traditional (limited) character encoding. Some programs, like mailers and web browsers, must be able to convert between a given text encoding and the user's encoding. Other programs internally store strings in Unicode, to facilitate internal processing, and need to convert between internal string representation (Unicode) and external string representation (a traditional encoding) when they are doing I/O. GNU libiconv is a conversion library for both kinds of applications.
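Putting that together, a connect call using one of the uppercase spellings suggested above might look like this (connection details are placeholders; whether lowercase spellings also work may depend on the iconv build):

import pymssql

# Placeholder connection details; the charset value follows iconv naming.
conn = pymssql.connect(
    server='myserver',
    user='myuser',
    password='mypassword',
    database='mydb',
    charset='ISO-8859-1',
)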
