Java convert Windows-1252 to UTF-8, some letters are wrong
I receive data from an external Microsoft SQL Server 2008 database (I make the queries with MyBatis). The data is encoded as "Windows-1252".
I have tried to re-encode it to UTF-8:
String textoFormado = ...value from MyBatis... ;
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");
Almost the whole string is decoded correctly, but some letters with accents are not.
For example:
I receive this: Ã vila
The code above produces: ?vila
I expect: Ávila
Obviously, textoFormado is a variable of type String. This means that the bytes were already decoded; Java then works with the text as 16-bit Unicode (UTF-16) characters, not as the original bytes. What you did is encode your string with Windows-1252 and then read the resulting bytes as UTF-8. That does not work.
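The sketch below illustrates why only some accented letters break. It assumes, as the "Ã" in the question and the asker's eventual fix suggest, that the column really held UTF-8 bytes that were decoded as Windows-1252 before reaching textoFormado; MojibakeDemo and its methods are hypothetical names. On the JDK, the windows-1252 decoder turns the unassigned byte 0x81 into U+FFFD, so the second byte of Á (UTF-8 C3 81) is lost, whereas é (C3 A9) becomes "Ã©" and survives the round trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: UTF-8 text that was (by assumption) mistakenly decoded as Windows-1252,
// followed by the repair attempted in the question.
final class MojibakeDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Assumed upstream bug: UTF-8 bytes decoded with the wrong charset.
    // Bytes that Windows-1252 leaves unassigned (e.g. 0x81) become U+FFFD here.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // The question's repair: re-encode as Windows-1252, then read the bytes as UTF-8.
    // U+FFFD has no Windows-1252 byte, so getBytes writes '?' in its place.
    static String questionRepair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String jose = "Jos\u00E9";   // e-acute is C3 A9 in UTF-8; 0xA9 is assigned in Windows-1252
        String avila = "\u00C1vila"; // A-acute is C3 81 in UTF-8; 0x81 is unassigned in Windows-1252
        System.out.println(questionRepair(misdecode(jose)).equals(jose));       // true
        System.out.println(questionRepair(misdecode(avila)).equals(avila));     // false
        System.out.println(questionRepair(misdecode(avila)).endsWith("?vila")); // true
    }
}
```

Under this assumption, any letter whose second UTF-8 byte is one of the five bytes Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D), such as Á, Í, Ï, Ð and Ý, cannot be recovered from the String; the "?vila" suffix matches what the question reports.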
What you need is to use the correct encoding when reading the bytes:
byte[] sourceBytes = getRawBytes(); // placeholder: wherever the raw bytes come from
String data = new String(sourceBytes, "Windows-1252");
To use this string inside your program, you do not need to do anything; simply use it. If, however, you want to write the data back to a file, for example, you need to encode it again:
byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
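Put together, the two steps above might look like the following sketch. The input bytes are a hypothetical stand-in for getRawBytes(), the temporary file stands in for the real destination, and the class and method names are illustrative only. Charset.forName is used instead of a charset name string so the constructor does not throw the checked UnsupportedEncodingException.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: decode with the real source encoding, then encode as UTF-8
// only at the point where bytes leave the program.
final class Cp1252ToUtf8 {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Step 1: bytes that really are Windows-1252 become a Java String.
    static String decode(byte[] sourceBytes) {
        return new String(sourceBytes, CP1252);
    }

    // Step 2: encode the String as UTF-8 and write it to the destination file.
    static void writeUtf8(String data, Path destination) throws IOException {
        byte[] destinationBytes = data.getBytes(StandardCharsets.UTF_8);
        Files.write(destination, destinationBytes);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: 0xC1 is A-acute in Windows-1252, followed by ASCII "vila"
        byte[] sourceBytes = {(byte) 0xC1, 'v', 'i', 'l', 'a'};
        String data = decode(sourceBytes);
        System.out.println(data.equals("\u00C1vila")); // true

        Path destination = Files.createTempFile("avila", ".txt");
        writeUtf8(data, destination);
        // In UTF-8, A-acute takes two bytes (C3 81), so the file is 6 bytes long
        System.out.println(Files.readAllBytes(destination).length); // 6
        Files.delete(destination);
    }
}
```

Keeping the decode and encode steps at the program's edges means the rest of the code only ever sees a correctly decoded String.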
Thanks to everyone, I solved it.
I have the following project structure:
At first I had this (MyBatis and Spring inject the dependencies and parameters):
public class Pojo {
    private String params;

    // MyBatis passes the column value as an already-decoded String
    public void setParams(String params) {
        this.params = params;
    }
}
The solution:
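import java.io.UnsupportedEncodingException;

public class Pojo {
    private String params;

    // Receive the raw column bytes and decode them explicitly as UTF-8
    public void setParams(byte[] params) {
        try {
            this.params = new String(params, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen in practice: every JVM is required to support UTF-8
            this.params = null;
        }
    }
}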
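As a quick check of the byte[] setter, the sketch below feeds it the UTF-8 bytes of "Ávila" directly, outside MyBatis. It assumes, as the fix implies, that the column's raw bytes are UTF-8; the getParams() getter and the PojoDemo class are hypothetical additions for illustration, and whether MyBatis hands a byte[] to this setter depends on your mapping and driver.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Copy of the Pojo above; getParams() is a hypothetical getter added for inspection.
class Pojo {
    private String params;

    public void setParams(byte[] params) {
        try {
            this.params = new String(params, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            this.params = null;
        }
    }

    public String getParams() {
        return params;
    }
}

final class PojoDemo {
    public static void main(String[] args) {
        // Hypothetical raw column value: A-acute + "vila" as UTF-8 bytes (C3 81 76 69 6C 61),
        // standing in for what MyBatis would pass to the byte[] setter
        byte[] rawColumn = "\u00C1vila".getBytes(StandardCharsets.UTF_8);
        Pojo pojo = new Pojo();
        pojo.setParams(rawColumn);
        System.out.println(pojo.getParams().equals("\u00C1vila")); // true
    }
}
```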
Why not tackle the issue at a lower level, by reading the String in the proper encoding from your database?
Many JDBC drivers accept a connection-string (URL) property for the character encoding, though its name is driver-specific: characterEncoding, for example, is the MySQL Connector/J property. So in your Microsoft SQL Server case you could have, for example, jdbc:sqlserver://localhost:52865;databaseName=myDb;characterEncoding=utf8 (SQL Server URLs separate properties with semicolons, not ?). Check your driver's documentation to confirm which encoding property, if any, it honors.
Each String column should then be read in the specified encoding, without the need to (re-)convert it manually.
See also: