
Java convert Windows-1252 to UTF-8, some letters are wrong

I receive data from an external Microsoft SQL Server 2008 database (I run the queries with MyBatis). The data is encoded as "Windows-1252".

I have tried to re-encode it to UTF-8:

String textoFormado = ...value from MyBatis... ; 
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");

Almost the whole string is decoded correctly, but some accented letters are not.

For example:

  1. I received this: Ã vila
  2. The code above produces: ?vila
  3. I expected: Ávila

Obviously, textoFormado is a variable of type String, which means that the bytes were already decoded. Internally, Java represents strings as UTF-16, so a String no longer carries its source encoding. What you did was encode your string as Windows-1252 and then decode the resulting bytes as UTF-8. That does not work: re-encoding cannot restore information that was lost when the bytes were first decoded with the wrong charset.

What you need is the correct encoding when reading the bytes:
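To see why "almost the whole string" survives while some letters do not, here is a minimal sketch of the round trip. It assumes the upstream mistake was UTF-8 bytes being decoded as Windows-1252; the class and method names (MojibakeDemo, misdecode, reinterpret) are made up for illustration and are not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the assumed upstream mistake: UTF-8 bytes decoded as Windows-1252.
    public static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // The approach from the question: encode as Windows-1252, decode as UTF-8.
    public static String reinterpret(String misdecoded) {
        return new String(misdecoded.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) is C3 A9 in UTF-8; both bytes are defined
        // in Windows-1252, so the round trip can recover it.
        String e = "\u00E9";
        System.out.println(reinterpret(misdecode(e)).equals(e));

        // U+00C1 (A with acute) is C3 81 in UTF-8; byte 0x81 is one of the
        // positions Windows-1252 leaves undefined, so the first decoding
        // may already have discarded it.
        String avila = "\u00C1vila";
        System.out.println(reinterpret(misdecode(avila)).equals(avila));
    }
}
```

The first check should succeed. The second depends on how the runtime treats the undefined byte 0x81; if the decoder substitutes a replacement character, the original byte is gone and no later re-encoding can bring it back, which would match the "?vila" result in the question.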

byte[] sourceBytes = getRawBytes();
String data = new String(sourceBytes , "Windows-1252");
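As a small self-contained sketch of the same idea: getRawBytes() above stands in for wherever the raw bytes come from, so the example below uses a hard-coded byte array instead. The class name DecodeAtSource is hypothetical. Passing a Charset object rather than a charset name also avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;

public class DecodeAtSource {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Decodes raw bytes that are known to be Windows-1252 encoded.
    public static String decode(byte[] sourceBytes) {
        return new String(sourceBytes, WINDOWS_1252);
    }

    public static void main(String[] args) {
        // In Windows-1252, byte 0xC1 is the letter A with acute (U+00C1).
        byte[] raw = {(byte) 0xC1, 'v', 'i', 'l', 'a'};
        String data = decode(raw);
        System.out.println(data.equals("\u00C1vila")); // prints true
    }
}
```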

To use this string inside your program, you do not need to do anything; simply use it. If, however, you want to write the data back to a file, for example, you need to encode it again:

byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
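One possible way to fill in the "write bytes" step, sketched with java.nio.file. The class name WriteUtf8 and the temporary file are assumptions made for the example, not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteUtf8 {
    // Encodes the string as UTF-8 and writes the bytes to the given file.
    public static void write(Path destination, String data) throws IOException {
        byte[] destinationBytes = data.getBytes(StandardCharsets.UTF_8);
        Files.write(destination, destinationBytes);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("avila", ".txt");
        try {
            write(tmp, "\u00C1vila");
            // U+00C1 takes two bytes in UTF-8 (C3 81); the four ASCII letters take one each.
            System.out.println(Files.readAllBytes(tmp).length); // prints 6
        } finally {
            Files.delete(tmp);
        }
    }
}
```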

I solved it, thanks to everyone.

I have the following project structure:

  • MyBatisQueries: a query with a "select" that returns the String
  • Pojo: stores the String (this is where I got the String with conversion problems)
  • The class that uses the query and the Pojo object with the data (this is where I saw the badly decoded text)

At first I had (MyBatis and Spring inject the dependencies and parameters):

public class Pojo {
    private String params;

    // Original version: the value arrives as an already decoded String.
    public void setParams(String params) {
        this.params = params;
    }
}

The solution:解决方案:

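import java.io.UnsupportedEncodingException;

public class Pojo {
    private String params;

    // Receive the raw bytes and decode them explicitly as UTF-8.
    public void setParams(byte[] params) {
        try {
            this.params = new String(params, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: every Java platform is required to support UTF-8.
            this.params = null;
        }
    }
}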

Why not tackle the issue at a lower level, by reading the String in the proper encoding from your database?

Many JDBC connection strings or URIs support a character-encoding property such as characterEncoding.

So in your Microsoft SQL Server case you could have, for example, jdbc:sqlserver://localhost:52865;databaseName=myDb?characterEncoding=utf8. The exact property name and URL syntax depend on the driver (the Microsoft driver, for instance, separates connection properties with semicolons), so check your driver's documentation.

Each String column should then be read in the specified encoding, with no need to convert it manually.

