
PostgreSQL inserts question marks instead of unicode characters

Whenever an insertion occurs through my application, all Unicode characters (be it Japanese, Greek etc.) are replaced by question marks.

SAVEPOINT "DAO"
LOG:  execute <unnamed>: insert into foo values ($1,$2,$3)
DETAIL:  parameters: $1 = '23', $2 = '34bcb5f2-e7ee-40cf-9103-f2d1bf2ac7acd853d7c6-1703-44d2-aa99-6fd1df84da37', $3 = 'Anyone-日本語_l'

As you can see from the above log entry, the database accepts the correct Unicode parameters.

However, after the insertion, the table entry looks like this:

23 | 34bcb5f2-e7ee-40cf-9103-f2d1bf2ac7acd853d7c6-1703-44d2-aa99-6fd1df84da37 | Anyone-???_l

My first guess was that this was a database configuration issue. However, I have confirmed (to the best of my knowledge) that Postgres is indeed accepting UTF-8 by running the following:

SHOW server_encoding;
server_encoding
-----------------
UTF8
(1 row)

SHOW client_encoding;
client_encoding
-----------------
UTF8
(1 row)

I have further confirmed this by manually inserting an entry into the database:

INSERT INTO foo VALUES (25, 'the_id', 'ΑΒΓΔΕΖΗΘ');
INSERT 0 1
25 | the_id | ΑΒΓΔΕΖΗΘ

As the above shows, the database accepted my values and successfully stored the Unicode characters.

At this point, I believe that the problem occurs when these values are pushed from my application to the JDBC connector and into the database. I thought that perhaps the JDBC connector needs to be told that it will be transferring Unicode data. There is indeed a way to do this, by appending the following to the JDBC connector's URL:

jdbc:postgresql://localhost/bar?useUnicode=yes&characterEncoding=UTF-8

Unfortunately, the above did not make any difference.

I have left out the application's code because it is part of a very big project and the relevant pieces are scattered across it. However, I think the code is irrelevant to the problem, since the Postgres log clearly displays the parameters it received.

The query and the Unicode data received by the database are correct, so what is causing this problem?

OS: RHEL 6.6
Postgres version: 9.3.5
JDBC Connector: Tried a couple (8.1, 9.3)
JRE: 1.7

The database is indeed expecting UTF-8:

psql -U postgres -h localhost --list

      Name      |  Owner   | Encoding |   Collate   |    Ctype    | Access privileges
----------------+----------+----------+-------------+-------------+-------------------
 bar            | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

The bytea result of the relevant entry is the following:

SELECT foo_name::bytea FROM foo;

foo_name
--------------------------
\x416e796f6e652d3f3f3f5f6c

Question marks have actually been inserted into the database:

SELECT * FROM foo WHERE foo_name LIKE 'Anyone-?%'
23 | 34bcb5f2-e7ee-40cf-9103-f2d1bf2ac7acd853d7c6-1703-44d2-aa99-6fd1df84da37 | Anyone-???_l
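
To make the bytea check above easier to read, here is a minimal sketch that decodes the hex output back into text. The class and method names (ByteaHexDecoder, decodeByteaHex) are made up for illustration; the only input is the hex string shown in the query result. The three 3f bytes are ASCII question marks, which means the original characters were already lost before the value was stored.

```java
import java.nio.charset.StandardCharsets;

public class ByteaHexDecoder {

    // Hypothetical helper: turns PostgreSQL's hex-format bytea output
    // (for example "\x416e...") into the text those bytes encode as UTF-8.
    static String decodeByteaHex(String hex) {
        // Drop the leading "\x" marker if it is present.
        String digits = hex.startsWith("\\x") ? hex.substring(2) : hex;
        byte[] bytes = new byte[digits.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte.
            bytes[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Value copied from the SELECT foo_name::bytea output above.
        System.out.println(decodeByteaHex("\\x416e796f6e652d3f3f3f5f6c"));
    }
}
```

If the column held real UTF-8 text, each Japanese character would show up as three bytes with the high bit set (values from 80 to ff) rather than a single 3f.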

I have also grabbed the byte sequence of one of my tests, as generated by the JDBC connector before it was fed to the PGStream.

{65, 110, 121, 111, 110, 101, 45, -26, -105, -91, -26, -100, -84, -24, -86, -98, 95, 105}

I have converted this to a UTF-8 String by performing the following (in a stand-alone application):

String result = new String(bytes, StandardCharsets.UTF_8);

The result was the correct one: Anyone-日本語_i
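
For anyone who wants to repeat this check, here is a self-contained sketch of the same decoding step. The class and method names are made up for illustration; the byte values are the ones captured above. It also prints the bytes as unsigned hex so they can be compared directly with a bytea dump from the database.

```java
import java.nio.charset.StandardCharsets;

public class DecodeCapturedBytes {

    // Bytes captured before they reached PGStream. Java bytes are signed,
    // so values above 0x7f show up as negative numbers.
    static final byte[] CAPTURED = {
        65, 110, 121, 111, 110, 101, 45,
        -26, -105, -91, -26, -100, -84, -24, -86, -98,
        95, 105
    };

    // Decode the raw bytes as UTF-8 text.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Render the bytes as unsigned lowercase hex, the way bytea output looks.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf8(CAPTURED));
        System.out.println(toHex(CAPTURED));
    }
}
```

The multi-byte groups e6 97 a5, e6 9c ac and e8 aa 9e in the hex output are the UTF-8 encodings of the three Japanese characters, so at this point in the pipeline the data is still intact.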

After digging deep into the legacy code, I found and fixed the issue.

The database layer worked just fine; the problem occurred when the system attempted to re-insert the same value into the database by using a ByteArrayInputStream.

The ByteArrayInputStream was populated by calling getBytes() on the String that contained the foo_name. Called without arguments, getBytes() encodes with the platform's default charset, so UTF-8 has to be specified explicitly when calling this method.

By changing:

String name = "日本語";
InputStream is = new ByteArrayInputStream(name.getBytes());

to:

String name = "日本語";
InputStream is = new ByteArrayInputStream(name.getBytes(StandardCharsets.UTF_8));

the issue was fixed.
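
To illustrate why the one-argument form matters, here is a minimal sketch. It assumes the JVM's default charset could not represent Japanese; the actual default on the affected system is not stated above, so ISO-8859-1 is used purely as a stand-in. The class and method names are made up for illustration, and the string is written with Unicode escapes so the example does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesDemo {

    // Encode a string with an explicitly chosen charset.
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        // The same three characters as in the fix above: 日本語
        String name = "\u65e5\u672c\u8a9e";

        // What a bare getBytes() would use on this machine.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Assumption: a non-Unicode default such as ISO-8859-1.
        // Characters it cannot represent are replaced with '?' (byte 63).
        System.out.println(Arrays.toString(encode(name, StandardCharsets.ISO_8859_1)));

        // With UTF-8 every character survives as a three-byte sequence.
        System.out.println(Arrays.toString(encode(name, StandardCharsets.UTF_8)));
    }
}
```

With the stand-in charset each character collapses to a single byte 63, which is exactly the 3f seen in the bytea output. On Java 18 and later the default charset is UTF-8 unless overridden, but on older runtimes such as the JRE 1.7 used here it follows the operating system locale, so passing the charset explicitly is the safer habit.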

I had the same problem with UTF-8, using Postgres and GlassFish. Setting the following in persistence.xml fixed it for me. I hope it helps you:

<properties>
  <property name="javax.persistence.jdbc.url"
           value="jdbc:postgresql://(url_Project)?useUnicode=yes"/>
</properties>

(url_Project) is the complete URL of the database.
