
pymssql: Setting charset option in DB connection causes connection to fail

I've written a wrapper around pymssql to connect to the DBs where I work. I've run into Unicode decode/encode errors, and I'm trying to stem them at the source.

When I specify charset='latin1' or charset='iso-8859-1', the connection fails with the following error:

  File "pymssql.pyx", line 549, in pymssql.connect (pymssql.c:7672)
    raise OperationalError(e[0])
pymssql.OperationalError: (20017, 'DB-Lib error message 20017, severity 9:\nUnexpected EOF from the server\nDB-Lib error message 20002, severity 9:\nAdaptive Server connection failed\n')

The DB encoding looks to be 'latin1':

SELECT SERVERPROPERTY('Collation')

returns

SQL_Latin1_General_CP1_CI_AS

which, I assume, is the same as Python's 'latin1'.

Am I doing this correctly? Did I choose the wrong codec (i.e., latin1 or iso-8859-1)?

My system uses the "SQL_Latin1_General_CP1_CI_AS" collation setting as well, and I found that even when connecting with "LATIN1", characters in CHAR/VARCHAR columns are still returned incorrectly encoded.

According to Microsoft's documentation on SQL Server Code Page Architecture, the code page to use is Windows-1252.
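The difference between Windows-1252 and Latin-1 only shows up for bytes in the 0x80-0x9F range, which may be why 'latin1' looks right for most text. Here is a minimal sketch using Python's built-in codecs; the sample bytes are made up for illustration and do not come from the original database:

```python
# Hypothetical sample: the bytes a CP1252 server might send for the
# euro sign, curly double quotes, and e-acute.
raw = bytes([0x80, 0x93, 0x94, 0xE9])

# Windows-1252 maps 0x80-0x9F to printable characters.
as_cp1252 = raw.decode("cp1252")

# ISO-8859-1 (latin-1) maps the same bytes to C1 control characters.
as_latin1 = raw.decode("latin-1")

print(repr(as_cp1252))
print(repr(as_latin1))
```

Only the last byte (0xE9) decodes to the same character under both codecs.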

Using charset='WINDOWS-1252' in pymssql.connect gives the correct result for me.
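A minimal sketch of what that connection might look like. The helper names (connect_kwargs, open_connection) and the server, user, password, and database values are placeholders, not part of the original answer; whether 'WINDOWS-1252' is accepted may depend on the FreeTDS/iconv build behind pymssql:

```python
def connect_kwargs(server, user, password, database, charset="WINDOWS-1252"):
    """Collect keyword arguments for pymssql.connect (hypothetical helper)."""
    return {
        "server": server,
        "user": user,
        "password": password,
        "database": database,
        # Charset name reported to work with SQL_Latin1_General_CP1_CI_AS.
        "charset": charset,
    }


def open_connection(**kwargs):
    """Open the connection; pymssql is imported lazily so this module
    can be loaded on machines where the driver is not installed."""
    import pymssql
    return pymssql.connect(**kwargs)


# Placeholder credentials; replace with real values before connecting.
params = connect_kwargs("dbhost.example.com", "my_user", "my_password", "my_db")
# conn = open_connection(**params)
```

Keeping the charset in one place makes it easy to switch to another name, such as "ISO-8859-1", if the first one is rejected.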

pymssql appears to be quite picky about the charset value you enter.

Consider entering charset="ISO-8859-1".

Use uppercase letters such as "ISO-8859-1" or "LATIN1".

pymssql is using the GNU iconv conventions. https://www.gnu.org/software/libiconv/

For historical reasons, international text is often encoded using a language or country dependent character encoding. With the advent of the internet and the frequent exchange of text across countries - even the viewing of a web page from a foreign country is a "text exchange" in this context -, conversions between these encodings have become important. They have also become a problem, because many characters which are present in one encoding are absent in many other encodings. To solve this mess, the Unicode encoding has been created. It is a super-encoding of all others and is therefore the default encoding for new text formats like XML.

Still, many computers still operate in locale with a traditional (limited) character encoding. Some programs, like mailers and web browsers, must be able to convert between a given text encoding and the user's encoding. Other programs internally store strings in Unicode, to facilitate internal processing, and need to convert between internal string representation (Unicode) and external string representation (a traditional encoding) when they are doing I/O. GNU libiconv is a conversion library for both kinds of applications.

