I'm trying to save some values in MySQL database by using Hibernate, but most Lithuanian characters won't get saved, including ąĄ čČ ęĘ ėĖ įĮ ųŲ ūŪ
(they are saved as ?
), however, šŠ žŽ
do get saved.
If I do inserts manually, then those values are properly saved, so the problem is most likely in Hibernate configuration.
What I have tried so far:
hibernate.charset=UTF-8
hibernate.character_encoding=UTF-8
hibernate.use_unicode=true
---------
properties.put(PROPERTY_NAME_HIBERNATE_USE_UNICODE,
env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_USE_UNICODE));
properties.put(PROPERTY_NAME_HIBERNATE_CHARSET,
env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARSET));
properties
.put(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING,
env.getRequiredProperty(PROPERTY_NAME_HIBERNATE_CHARACTER_ENCODING));
---------
private void registerCharachterEncodingFilter(ServletContext aContext) {
CharacterEncodingFilter cef = new CharacterEncodingFilter();
cef.setForceEncoding(true);
cef.setEncoding("UTF-8");
aContext.addFilter("charachterEncodingFilter", cef)
.addMappingForUrlPatterns(null, true, "/*");
}
As described here
I tried adding ?useUnicode=true&characterEncoding=utf-8
to db connection url.
As described here
I ensured that my db is set to UTF-8 charset. phpmyadmin > information_schema > schemata
def db_name utf8 utf8_lithuanian_ci NULL
This is how I save into db:
//Controller
buildingService.addBuildings(schema.getBuildings());
List<Building> buildings = buildingService.getBuildings();
System.out.println("-----------");
for (Building b : schema.getBuildings()) {
System.out.println(b.toString());
}
System.out.println("-----------");
for (Building b : buildings) {
System.out.println(b.toString());
}
System.out.println("-----------");
//Service:
@Override
public void addBuildings(List<Building> buildings) {
for (Building b : buildings) {
getCurrentSession().saveOrUpdate(b);
}
}
First set of println contains all Lithuanian characters, while second replaces most with ?
EDIT: Added details
insert into buildings values (11,'ąĄčČęĘ', 'asda');
select short, hex(short) from buildings;
//Šalt. was inserted via hibernate
//letters are properly displayed:
ąĄčČęĘ | C485C484C48DC48CC499C498
MIF Šalt. | 4D494620C5A0616C742E
select address, hex(address) from buildings;
Šaltini? <...> | C5A0616C74696E693F20672E2031412C2056696C6E697573
//should contain "ų"
--------
show create table buildings;
buildings | CREATE TABLE `buildings` (
`id` int(11) NOT NULL,
`short` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
`address` varchar(255) COLLATE utf8_lithuanian_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_lithuanian_ci
EDIT: I did not find a proper solution, so I came up with a workaround. I ended up escaping/unescaping characters, storing them like this: \\uXXXX
.
Let's verify that they were stored correctly... Please do SELECT col, HEX(col) ...
to fetch some cell with Lithuanian characters. A correctly stored ą
will show C485
. The others should show various hex values of C4xx or C5xx. 3F
is ?
.
But, more importantly, 4 characters do show. Š
should be C5A0
if properly stored as utf8. However, I suspect, you will see 8A
, implying that the column in the table is really declared as CHARACTER SET latin1
. (The 4 characters show up in the first column of my charset blog ).
Do SHOW CREATE TABLE
to see how the column is defined. If it says latin1
, then the problem is with the table definition, and you probably ought to start over.
You have to ensure that every component taking part in data entry uses UTF-8 encoding explicitly.
If you enter the values via browser, make sure that the page displaying the results with the following header Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=utf-8
.
The input form is defined as follows
<form action="submit" accept-charset="UTF-8">...</form>
.
If you are creating String
objects from byte array, make sure you explicitly state the Charset
in the constructor.
If your entry happens from a text file, that file has to be UTF-8 encoded.
If it is hardcoded directly in your code, then the source has to be UTF-8 encoded.
The fact that your DB holds correct UTF-8 (two or more bytes for a special letter) is reassuring.
If you get one single ?
for a special letter, it was attempted to do a UTF-8 conversion to some encoding that does not contain those letters. And that seems to be the case . The letters that are converted correctly are in the ISO-8859-1 or Windows-1252 range. The others are not. Now ISO-88591-1 aka Latin-1 is the default HTTP encoding, default in java EE server. You might like to do before writing:
response.setCharacterEncoding("UTF-8");
Now one problem with System.out.println
is that it uses the system default encoding. Logging to a file with a logger is more interesting. Or debugging and inspecting the String and its char array.
That the schema does seemingly work, may be that the schema Strings stem immediately from a Java source, and the editor encoding and javac compiler encoding differ. This can be checked by u-escaping the string literals in java: "\ą"
instead of "ą"
.
Make a unit test that writes and reads from the database.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.