How to encode a string in UTF-8 from a ResultSet encoded in latin1

I'm writing an application (which uses UTF-8) that needs to read from and write to a second database belonging to an external application (which uses ISO-8859-1).

try {
    // data in latin1
    String s = rs.getString("sAddrNameF");
    System.out.println(s); // shows "Adresse d'exp�dition"
    byte[] data = s.getBytes();
    String value = new String(data, "UTF-8");
    System.out.println("data in UTF8: " + value);
    // The expected result should be "Adresse d'expédition"
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}

This code is not working, and I also still need to do the opposite conversion (writing to the database). If anybody knows an elegant solution for dealing with different encodings in the same application, please let me know. I'd appreciate it.

String s = rs.getString("sAddrNameF");
System.out.println(s); // shows "Adresse d'exp�dition"

This means that the string is either already corrupted in the database, or you're connecting to the database with the wrong encoding (such as passing characterEncoding=utf8 with MySQL).
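
To illustrate the second case, here is a minimal sketch of how the replacement character can appear when ISO-8859-1 bytes are decoded as UTF-8. The class and method names are made up for this demonstration, and the byte array merely stands in for whatever the driver receives from the database.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question.
final class MojibakeDemo {

    // Encodes the text as latin1, then decodes those bytes with the wrong charset (UTF-8).
    static String decodeLatin1AsUtf8(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = decodeLatin1AsUtf8("Adresse d'exp\u00e9dition");
        // The lone byte 0xE9 is not valid UTF-8, so the decoder substitutes U+FFFD.
        System.out.println(broken.indexOf('\uFFFD') >= 0); // true
    }
}
```

Once U+FFFD is in the String, the original byte is gone, which is why calling getBytes() on the broken String and decoding again cannot bring the accent back.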

There's no such thing as converting a String from one encoding to another. Once you have a String, it is always UTF-16.
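
Put differently, an encoding only comes into play at the boundary between bytes and a String. The sketch below, using hypothetical names, shows the only meaningful sense of "converting": decode the bytes with the charset they were written in, then encode the resulting String with the target charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class TranscodeDemo {

    // Re-encodes bytes from one charset to another by going through a String.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String decoded = new String(input, from); // bytes -> UTF-16 String
        return decoded.getBytes(to);              // String -> bytes in the target charset
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};            // e-acute in ISO-8859-1
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(utf8.length);          // 2 (bytes 0xC3 0xA9)
    }
}
```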

If it's just a configuration problem, you don't need to worry: rs.getString() will return proper Strings, and PreparedStatement.setString() will make sure Strings are saved correctly in the database.

What you should know about Unicode

We need to specify the charset as StandardCharsets.UTF_8 when building the String:

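// requires: import java.nio.charset.StandardCharsets; import java.sql.SQLException;
try {
    // data in latin1
    String s = rs.getString("sAddrNameF");
    System.out.println(s); // shows "Adresse d'exp�dition"
    // read the raw bytes instead of the already-decoded String
    byte[] data = rs.getBytes("sAddrNameF");
    // the charset passed here must match the encoding of the bytes the driver returns
    String value = new String(data, StandardCharsets.UTF_8);
    System.out.println("data in UTF8: " + value);
} catch (SQLException e) {
    // new String(byte[], Charset) throws no checked exception; only the JDBC calls can fail
    e.printStackTrace();
}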

String value = new String(data, "ISO-8859-1");

The getBytes function also accepts a Charset, or simply a string naming the desired encoding.

byte[] data = s.getBytes("UTF-8");
// or
byte[] data = s.getBytes(Charset.forName("UTF-8"));
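
As a small sketch of the difference between the two overloads (the class name is hypothetical): the Charset overload, used here with the StandardCharsets constants available since Java 7, throws no checked exception, whereas the charset-name overload declares UnsupportedEncodingException.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class GetBytesDemo {

    // Number of bytes the text needs in UTF-8, via the Charset overload.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Same result through the charset-name overload, which declares a checked exception.
    static int utf8LengthByName(String text) throws UnsupportedEncodingException {
        return text.getBytes("UTF-8").length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "exp\u00e9dition"; // 10 characters, one of them non-ASCII
        System.out.println(utf8Length(s));       // 11
        System.out.println(utf8LengthByName(s)); // 11
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 10
    }
}
```

The no-argument getBytes() used in the question relies on the platform default charset instead, so its result can differ from one machine to another.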
