
Encoding error when reading data from one DB and inserting to another with PHP

With PHP, I am trying to read data from Pervasive DB v9.5 and insert it into PostgreSQL 9.3 (encoding: UTF-8) on Windows 2008. I did not choose or code the Pervasive DB (I am just reading data from it). With ODBC I can read data from Pervasive and write it to the console with no problem. However, when I try to insert it into PostgreSQL I get:

Warning: pg_execute(): Query failed: ERROR:  invalid byte sequence for encoding "UTF8": 0x94 in file.php on line ..

So I saw that Postgres did not like the string I gave it.

Then I used

var_dump(iconv_get_encoding('all'));

and saw that my encoding is ISO-8859-1,

so I modified the string with

iconv ( 'ISO-8859-1' , 'UTF-8' , $a)

Now the error is gone. However, the string that reaches Postgres is not correct.

The code I used is below. My test string is aöaçaşaıağaüaÖaÇaŞaİaĞaÜ

$a is the string that comes from Pervasive.

echo $a; 

gives aöaçaşaıağaüaÖaÇaŞaİaĞaÜ

echo iconv('ISO-8859-1', 'UTF-8', $a);

gives a┬öa┬ça┬şa┬ıa┬ğa┬üa┬Öa┬Ça┬Şa┬İa┬Ğa┬Ü

<?php
//var_dump(iconv_get_encoding('all'));

$conn = pg_connect("host=localhost port=5432 dbname=xxx user=xxx password=".$argv[1]);

$result = pg_prepare($conn, "my_query", 'SELECT * FROM func_my_deneme($1)');

$connect_string = "DRIVER={Pervasive ODBC Client Interface}; SERVERNAME=localhost; SERVERDSN=xxx;";
$pervasiveconn = odbc_connect($connect_string, 'xxx', 'xxx');

$pervasive_result = odbc_exec($pervasiveconn ,"SELECT something");

while(odbc_fetch_row($pervasive_result)){
  $a=odbc_result($pervasive_result,1);

  echo $a;

  $result = pg_execute($conn, "my_query", array(iconv ( 'ISO-8859-1' , 'UTF-8' , $a)));
}
?>

You only seem to be looking at one of the two encoding conversions here.

You have:

(pervasive's native encoding) -> (PHP string)

and

(PHP string) -> (PostgreSQL)

Of these, you're only explicitly handling the second. You're assuming that the data Pervasive's ODBC driver returns is in PHP's default encoding, which on your system is iso-8859-1.

Your tests suggest that assumption may be correct, but simply echoing the string isn't a good way to tell, because that introduces another encoding step:

(PHP string) -> (whatever decodes it for viewing)

be that a web browser, a terminal, or something else. If the viewer happens to expect the same encoding that Pervasive is using, it will decode the output correctly, whatever that encoding is.

Try:

echo $a;
echo "aöaçaşaıağaüaÖaÇaŞaİaĞaÜ";

and make sure the viewer shows the same value for both. Make sure you edit your source file with the encoding set to iso-8859-1, not some other encoding, so that the literal bytes of the string you paste are correct.

At that point you should get an error if your editor is set correctly, because not all of those characters exist in iso-8859-1. The first invalid one is ş.

So clearly what's coming from Pervasive can't be iso-8859-1. To really print a latin-1 string, you can echo the escaped bytes. For example, this string:

aöaçaaaüaÖaÇaaaaÜ

in which all characters are legal iso-8859-1, is printed in iso-8859-1 encoding with:

echo "a\xf6a\xe7aaa\xfca\xd6a\xc7aaaa\xdc";

Here, hex escapes specify the non-7-bit characters, so the byte sequence is unambiguously what you think it is, regardless of how your text editor saves the file.

I bet that doesn't print correctly when you view it, because whatever is reading the output isn't decoding it as iso-8859-1.


What you should be doing is looking at the bytes of the string you get from Pervasive to see what it really is, then determining its encoding and converting it to utf-8, which you can then send to PostgreSQL over a client_encoding = utf-8 connection. @deceze suggested bin2hex for this (I don't speak PHP, so didn't know what to suggest). So show the output of:

echo bin2hex($a) . "\n";
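
As a rough sketch of how to read that output: the same text produces different hex depending on the encoding, so the dump tells you which one you have. The sample bytes below are assumptions chosen for illustration. CP850 is included only because 0x94, the byte named in the PostgreSQL error, is ö in that DOS code page; nothing here confirms what Pervasive actually sends. The last lines show why relabelling the bytes as ISO-8859-1 can make the error disappear without fixing the text.

```php
<?php
// The same two characters, "aö", written as explicit bytes in three encodings.
// Hex escapes keep the bytes independent of how this file is saved.
$samples = [
    'ISO-8859-1' => "a\xf6",
    'UTF-8'      => "a\xc3\xb6",
    'CP850'      => "a\x94", // DOS code page; an assumption for illustration
];

foreach ($samples as $name => $bytes) {
    echo str_pad($name, 12), bin2hex($bytes), "\n";
}

// Converting "from ISO-8859-1" accepts any byte, even when the label is wrong:
// 0x94 becomes U+0094 (a control character), which UTF-8 encodes as C2 94.
$relabelled = iconv('ISO-8859-1', 'UTF-8', "a\x94");
echo bin2hex($relabelled), "\n"; // prints 61c294
```

If bin2hex($a) for the test string shows 94 where ö should be, the data is not ISO-8859-1 (where ö is f6) and not UTF-8 (where ö is c3b6).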

Or, even better, determine from the configuration or documentation what the encoding of the data coming from Pervasive is, rather than guessing. Or just force it.

A quick look at the Pervasive documentation showed that the ODBC driver has an encoding parameter that takes the code page ID for the desired encoding. So try:

$connect_string = "DRIVER={Pervasive ODBC Client Interface}; SERVERNAME=localhost; SERVERDSN=xxx; encoding=65001";

(Microsoft, at least, defines 65001 as the code page identifier for UTF-8.)
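
If the driver option is unavailable or has no effect, the conversion can be done explicitly in PHP once the source encoding is known. The following is a minimal sketch under stated assumptions: to_utf8 is a hypothetical helper name, and CP850 is only a placeholder for whatever encoding the byte dump or the Pervasive configuration actually indicates. Which encoding names iconv accepts depends on the platform's iconv library.

```php
<?php
// Hypothetical helper: convert bytes from a known source encoding to UTF-8
// and refuse to continue if the result is not valid UTF-8.
function to_utf8(string $bytes, string $sourceEncoding): string
{
    $utf8 = iconv($sourceEncoding, 'UTF-8', $bytes);
    // preg_match with the /u flag fails on invalid UTF-8,
    // so it doubles as a validity check.
    if ($utf8 === false || preg_match('//u', $utf8) !== 1) {
        throw new RuntimeException(
            "Cannot convert from $sourceEncoding: " . bin2hex($bytes)
        );
    }
    return $utf8;
}

// "aöaçaü" as CP850 bytes (a placeholder, not a confirmed fact about Pervasive).
$fromPervasive = "a\x94a\x87a\x81";
$utf8 = to_utf8($fromPervasive, 'CP850');
echo bin2hex($utf8), "\n"; // prints 61c3b661c3a761c3bc
```

In the loop from the question, the call would become pg_execute($conn, "my_query", array(to_utf8($a, 'CP850'))), with the encoding name replaced by the real one. If encoding=65001 on the ODBC connection works, the data already arrives as UTF-8 and only the validity check remains useful.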
