
Convert iso8859-1 to utf8 in Java

For an iso8859-1 encoded String s, what is the most elegant way to convert it to utf8?

String convertedString = new String(s.getBytes("UTF-8"), "UTF-8"); //is this correct, elegant etc?

NOTE: I know that there are already questions similar to this one, but the ones I've found have ambiguous answers and do not show the whole conversion.

EDIT: a more detailed description of my problem

//message is a String
//msg.setContent is this method http://docs.oracle.com/javaee/6/api/javax/mail/internet/MimeMessage.html#setContent%28java.lang.Object,%20java.lang.String%29

msg.setContent(message, "text/plain"); 
msg.addHeader("Content-Type", "text/plain; charset=\"utf-8\"");

When this is received in a mail client, the header says utf8 but the content (i.e. the message String) is actually iso8859-1 encoded, which leads to characters such as åäö being rendered incorrectly. What I'd like to know is how to make the contents utf8 encoded.

EDIT II (answer): It turns out that the MimeMessage.java class was setting the encoding to iso8859-1. Instead of MimeMessage.setContent there is another method, MimeMessage.setText(String text, String charset), which allowed me to set the encoding to utf8.

You don't convert a string from one encoding to another. A String is a series of chars, and that's it. For what it's worth, it could be a series of carrier pigeons. Pigeons don't have an encoding. Neither do chars.

What you do is convert it to bytes when using a Writer (or read from bytes when using a Reader). It is at this point that the encoding (a Charset) matters.
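To illustrate the point about Writers and Readers, here is a minimal sketch using only the JDK. The class name CharsetBoundaryDemo and its helper methods write and read are made up for this example. It writes the same String through an OutputStreamWriter with two different charsets, then reads the UTF-8 bytes back with the right and the wrong charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class CharsetBoundaryDemo {

    // Encode: chars go into the Writer, bytes come out in the given charset.
    static byte[] write(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    // Decode: bytes go into the Reader, chars come out.
    static String read(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00e5\u00e4\u00f6"; // the characters åäö

        byte[] latin1 = write(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = write(text, StandardCharsets.UTF_8);
        System.out.println(latin1.length); // 3: one byte per character
        System.out.println(utf8.length);   // 6: two bytes per character

        // Decoding with the charset that was used for encoding restores the text.
        System.out.println(read(utf8, StandardCharsets.UTF_8).equals(text));      // true
        // Decoding with a different charset yields garbled characters instead.
        System.out.println(read(utf8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```

The String itself is identical in every call; only the bytes differ, and only because a Charset was chosen at the Writer or Reader.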

No, it's not correct. A String is always UTF-16. You can only encode to, or decode from, a byte array.
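As a rough sketch of what that means in practice: if what you actually hold is a byte array in iso8859-1, you can decode it to a String and encode that String again as utf8. The class name TranscodeDemo and the method latin1ToUtf8 are invented for this example. It also shows that the round trip from the question simply gives back an equal String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class TranscodeDemo {

    // Re-encode ISO-8859-1 bytes as UTF-8 bytes by way of a String.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String decoded = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The characters åäö as ISO-8859-1 bytes.
        byte[] latin1 = {(byte) 0xE5, (byte) 0xE4, (byte) 0xF6};
        byte[] utf8 = latin1ToUtf8(latin1);
        System.out.println(Arrays.toString(utf8));

        // The code from the question encodes and decodes with the same
        // charset, so it only reproduces the String it started with.
        String s = "\u00e5\u00e4\u00f6";
        String roundTrip = new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(s)); // true
    }
}
```

Using the StandardCharsets constants instead of charset names such as "UTF-8" avoids having to handle UnsupportedEncodingException.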


 