Special characters on insert to postgres db with psql

I am trying to add the special character "†" to a varchar field with psql, without success. From my PHP application it works (the PHP user connects as ISO8859-1).

The settings for the database are:

encoding = LATIN1
collation = fi_FI
character type = fi_FI
client encoding = both UTF8 and LATIN1 tried (on the command line, PGCLIENTENCODING=LATIN1 or PGCLIENTENCODING=UTF8)

Selecting from the table when the client encoding is UTF8 shows:

locationx \u0086

How can I add the value to the database from psql? Neither of the statements below works.

update tablex set field1 = 'locationY' || '†'
update tablex set field1 = 'locationY' || U&'\86'

They give these error messages:

ERROR:  character with byte sequence 0xe2 0x80 0xa0 in encoding "UTF8" has no equivalent in encoding "LATIN1"
ERROR:  invalid Unicode escape value at or near "\86' "

If I view the data entered by my PHP application, the bytes are \x6c6f636174696f6e5986, but when I enter the data with psql, the bytes are \x6c6f636174696f6e59e280a0.

It doesn't really work from either PHP or psql, because the character does not exist in the LATIN1 encoding. You simply cannot store it in the database.

Let me explain what is going on.

  • If your client encoding is LATIN1 and you enter in psql:

     INSERT INTO ... VALUES ('locationY†');

    the statement succeeds, but not as intended. Your terminal is set to UTF-8, so the † you type is actually three bytes, \xE280A0, which are interpreted and stored as three single-byte LATIN1 characters.

  • If your client encoding is UTF8 and you enter the same statement in psql:

    The insert will cause an error. The three bytes that are sent when you type † are correctly interpreted as the dagger character, and PostgreSQL fails when it tries to convert that character to LATIN1:

     ERROR: character with byte sequence 0xe2 0x80 0xa0 in encoding "UTF8" has no equivalent in encoding "LATIN1"
  • With PHP, your client encoding is probably set to LATIN1, and the PHP program actually uses the WINDOWS-1252 encoding.

    Then † is represented by the single byte \x86. PostgreSQL interprets that byte in the LATIN1 encoding, where it means something entirely different, namely the "start of selected area" control character U+0086.

    When your PHP program reads that character back, everything seems to work fine, but the database actually stores a different character than you intend.

    You will notice that as soon as you select the value by any other means, e.g. on your psql console. There the value will be rendered as

    locationY\u0086
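The three cases above can be checked byte by byte. This is only a minimal sketch: it uses Python's `utf-8`, `cp1252` and `latin-1` codecs as stand-ins for PostgreSQL's UTF8, WIN1252 and LATIN1 conversions, and the variable names are made up for illustration.

```python
# Sketch only: Python codecs stand in for PostgreSQL's encoding conversion.
DAGGER = "\u2020"  # the dagger character

# What a UTF-8 terminal sends when you type the dagger: three bytes.
utf8_bytes = DAGGER.encode("utf-8")

# What a WINDOWS-1252 client (the PHP application) sends: one byte.
win1252_bytes = DAGGER.encode("cp1252")

# Read as LATIN1, every byte becomes its own character.
utf8_read_as_latin1 = utf8_bytes.decode("latin-1")
win1252_read_as_latin1 = win1252_bytes.decode("latin-1")

# LATIN1 has no dagger at all, so a correct conversion has to fail.
try:
    DAGGER.encode("latin-1")
    dagger_fits_latin1 = True
except UnicodeEncodeError:
    dagger_fits_latin1 = False

print(utf8_bytes.hex())                  # e280a0
print(win1252_bytes.hex())               # 86
print(len(utf8_read_as_latin1))          # 3
print(hex(ord(win1252_read_as_latin1)))  # 0x86
print(dagger_fits_latin1)                # False
```

The byte values match the ones in the question: `86` is what the PHP application stored, and `e280a0` is what psql sent from a UTF-8 terminal.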

Here is how to get things working:

  • Create a new database with UTF8 encoding.

  • Dump the old database with

    pg_dump -F p -E LATIN1 dbname
  • Manually edit the dump and change the line

    SET client_encoding = 'LATIN1';

    to

    SET client_encoding = 'WIN1252';
  • Load the dump into the new database with psql.

  • Change the client_encoding of your PHP application to WIN1252 and start using the new database.
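The dump-editing step could also be scripted instead of done by hand. The sketch below is an assumption-laden illustration: `retag_dump` is a hypothetical helper, not part of PostgreSQL, and the sample dump is a tiny stand-in rather than real pg_dump output. It shows why relabeling works: the data bytes stay untouched, and only the declared encoding changes, so the byte 0x86 is read as the dagger.

```python
# Sketch only: retag_dump is a hypothetical helper, not part of PostgreSQL.
OLD_LINE = b"SET client_encoding = 'LATIN1';"
NEW_LINE = b"SET client_encoding = 'WIN1252';"


def retag_dump(dump: bytes) -> bytes:
    """Relabel the dump as WIN1252 without touching any data bytes."""
    if OLD_LINE not in dump:
        raise ValueError("no LATIN1 client_encoding line found")
    # Replace only the first occurrence, which is the header setting.
    return dump.replace(OLD_LINE, NEW_LINE, 1)


# Tiny stand-in for a dump; the value ends in the byte 0x86 stored by PHP.
sample = OLD_LINE + b"\nINSERT INTO tablex VALUES ('locationY\x86');\n"
patched = retag_dump(sample)

# Reading the unchanged data bytes as WINDOWS-1252 recovers the dagger,
# which a UTF8 database can store.
row = patched.splitlines()[1].decode("cp1252")
print(row)
```

For a real dump, the file would need to be read and written in binary mode so that no byte is re-encoded along the way.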

