Java: converting Windows-1252 to UTF-8, some letters are wrong

Question

I receive data from an external Microsoft SQL 2008 database (I make queries with MyBatis). The data is encoded as "Windows-1252".

I have tried to re-encode it to UTF-8:

String textoFormado = ...value from MyBatis... ; 
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");

Almost the whole string is decoded correctly, but some accented letters are not.

For example:

I received this: Ã vila
The code above produces: ?vila
I expected: Ávila

Answer 1

Evidently, textoFormado is a variable of type String. This means the bytes have already been decoded; Java then uses a 16-bit Unicode representation (UTF-16) internally. What you did was encode your string as Windows-1252 and then read the resulting bytes as UTF-8. That does not work.
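To make the failure concrete, here is a minimal sketch of that round trip. It assumes the stored bytes were really UTF-8 and were decoded as Windows-1252 somewhere upstream, which is a guess based on the symptoms in the question. The class and method names (RoundTripDemo, misdecode, attemptRepair) are hypothetical and exist only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the assumed upstream mistake: UTF-8 bytes decoded as Windows-1252.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // The repair attempted in the question: encode as Windows-1252, decode as UTF-8.
    static String attemptRepair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute): UTF-8 bytes C3 A9, both assigned in Windows-1252.
        String survives = "\u00e9";
        // U+00C1 (A with acute) + "vila": UTF-8 starts with C3 81, and 0x81 is
        // unassigned in Windows-1252, so the first decode already loses it.
        String breaks = "\u00c1vila";

        System.out.println(attemptRepair(misdecode(survives)).equals(survives));
        System.out.println(attemptRepair(misdecode(breaks)).equals(breaks));
    }
}
```

This would explain why most accented letters come through and only a few do not: the trick only works when every byte of the UTF-8 sequence happens to be an assigned Windows-1252 code point.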

What you need is the correct encoding when reading the bytes:

byte[] sourceBytes = getRawBytes();
String data = new String(sourceBytes , "Windows-1252");

To use this string inside your program, you do not need to do anything. Simply use it. If, however, you want to write the data back to a file, for example, you need to encode it again:

byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
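Put together, the two steps above might look like the following self-contained sketch. The class and method names (TranscodeSketch, windows1252ToUtf8) are hypothetical, and the sample bytes stand in for whatever getRawBytes() would return.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeSketch {
    // Decode the raw bytes with their real encoding, then encode the String as UTF-8.
    static byte[] windows1252ToUtf8(byte[] sourceBytes) {
        String data = new String(sourceBytes, Charset.forName("windows-1252"));
        return data.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Avila" with an acute accent on the A, as Windows-1252 bytes (0xC1 is that letter).
        byte[] source = {(byte) 0xC1, 'v', 'i', 'l', 'a'};
        byte[] utf8 = windows1252ToUtf8(source);
        // The accented letter takes two bytes in UTF-8, so five input bytes become six.
        System.out.println(utf8.length);
    }
}
```

Passing a Charset object rather than a charset name also avoids the checked UnsupportedEncodingException.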

Answer 2

I solved it, thanks to all.

I have the following project structure:

MyBatisQueries: a query with a "select" that gives me the String
Pojo: stores the String (this is where I got the String with conversion problems)
The class that uses the query and the Pojo object with the data (where the text showed up badly decoded)

At first I had (MyBatis and Spring inject dependencies and params):

public class Pojo {
    private String params;

    // The value arrives here as an already-decoded String
    public void setParams(String params) {
        this.params = params;
    }
}

The solution:

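import java.io.UnsupportedEncodingException;

public class Pojo {
    private String params;

    // Receive the raw bytes and decode them explicitly as UTF-8
    public void setParams(byte[] params) {
        try {
            this.params = new String(params, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            this.params = null;
        }
    }
}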
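A possible variant of the same idea, sketched under the assumption that the column really is mapped to byte[] as in the solution above: using StandardCharsets.UTF_8 removes the need for the try/catch, since that overload throws no checked exception. The class name PojoSketch, the getter, and the null check are additions for illustration and are not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

final class PojoSketch {
    private String params;

    // Decode the raw column bytes as UTF-8; a null column stays null.
    public void setParams(byte[] params) {
        this.params = (params == null) ? null : new String(params, StandardCharsets.UTF_8);
    }

    public String getParams() {
        return params;
    }
}
```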

Answer 3

Why not tackle the issue at a lower level: read the String in the proper encoding from your database.

Many JDBC drivers accept a characterEncoding property in the connection string or URI. The property name and the URL syntax are driver-specific, so check your driver's documentation.

So in your Microsoft SQL Server case you could have, for example, jdbc:sqlserver://localhost:52865;databaseName=myDb?characterEncoding=utf8 .

Then each String column should be read in the specified encoding, without the need to (re-)convert it manually.


Answer 1: score 10, 2014-04-15 11:44:50

Answer 2: score 0, accepted, 2014-04-21 09:53:33

Answer 3: score 0, 2021-07-14 18:02:15
