Windows-1252 to UTF-8

Below is the code I am trying to use, and the output it gives me is:

RetValue: á, é, í, ó, ú, ü, ñ, ¿ Value: á, é, í, ó, ú, ü, ñ, ¿ ConvertValue: ?, ?, ?, ?, ?, ?, ?, ?

This is not the desired output. I think the output should be something like %C3% for every character here.

import java.nio.charset.Charset;

public class Main {
    public static void main(String[] args) {
        String value = "á, é, í, ó, ú, ü, ñ, ¿";
        String retValue = "";
        String convertValue = "";
        try {
            // Decode the platform-default bytes of value as Windows-1252.
            retValue = new String(value.getBytes(),
                    Charset.forName("Windows-1252"));
            // Decode the Windows-1252 bytes of retValue as UTF-8.
            convertValue = new String(retValue.getBytes("Windows-1252"),
                    Charset.forName("UTF-8"));
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("RetValue: " + retValue + " Value: " + value
                + " ConvertValue: " + convertValue);
    }
}

I understand that you are trying to encode your text from the default encoding to Windows-1252, then to UTF-8.

According to the Javadoc for the String class:

String(byte[] bytes, Charset charset)

Constructs a new String by decoding the specified array of bytes using the specified charset.

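Therefore, what you did was decode the default-encoded bytes as Windows-1252, and then decode the Windows-1252 bytes of that result as UTF-8. Neither step converts anything to UTF-8, and a byte such as 0xE1 (á in Windows-1252) is not valid UTF-8 on its own, so the decoder substitutes a replacement character. That's why the output shows ? for every accented character.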
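To make the encode/decode distinction concrete, here is a minimal sketch. The class name DecodeDemo and its helper methods are made up for illustration and are not part of the question. getBytes(charset) goes from text to bytes (encoding), while new String(bytes, charset) goes from bytes to text (decoding). Decoding Windows-1252 bytes as UTF-8 is what produces the replacement characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
public class DecodeDemo {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Encode text as Windows-1252 bytes, then (wrongly) decode those bytes as UTF-8.
    static String decodeWithWrongCharset(String text) {
        byte[] cp1252Bytes = text.getBytes(WINDOWS_1252);
        return new String(cp1252Bytes, StandardCharsets.UTF_8);
    }

    // Encode and decode with the same charset; the text survives unchanged.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "á, é", written with escapes so the source-file encoding does not matter.
        String value = "\u00e1, \u00e9";
        System.out.println(roundTrip(value, WINDOWS_1252).equals(value));
        System.out.println(roundTrip(value, StandardCharsets.UTF_8).equals(value));
        // Each accented letter comes back as U+FFFD, which many consoles print as '?'.
        System.out.println(decodeWithWrongCharset(value).equals("\ufffd, \ufffd"));
    }
}
```

All three lines should print true: matching charsets round-trip cleanly, and mismatched ones lose the accented characters.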

If your purpose is to encode from Windows-1252 to UTF-8, I would suggest the following approach with CharsetEncoder from the java.nio.charset package:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        String value = "á, é, í, ó, ú, ü, ñ, ¿";
        Charset windows1252 = Charset.forName("Windows-1252");
        try {
            CharsetEncoder encoder2 = windows1252.newEncoder();
            CharsetEncoder encoder3 = StandardCharsets.UTF_8.newEncoder();
            System.out.println("value = " + value);

            // Both charsets can represent the sample text (only checked when run with -ea).
            assert encoder2.canEncode(value);
            assert encoder3.canEncode(value);

            // Encode the text to Windows-1252 bytes, then decode those bytes back.
            ByteBuffer conv1Bytes = encoder2.encode(CharBuffer.wrap(value));
            // Decode the buffer itself: array() may include unused capacity past limit().
            String retValue = windows1252.decode(conv1Bytes).toString();
            System.out.println("retValue = " + retValue);

            // Encode the same text to UTF-8 bytes, then decode to confirm the round trip.
            ByteBuffer convertedBytes = encoder3.encode(CharBuffer.wrap(retValue));
            String convertValue2 = StandardCharsets.UTF_8.decode(convertedBytes).toString();
            System.out.println("convertedValue =" + convertValue2);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I obtained the following output:

value = á, é, í, ó, ú, ü, ñ, ¿
retValue = á, é, í, ó, ú, ü, ñ, ¿
convertedValue =á, é, í, ó, ú, ü, ñ, ¿
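If the input is really a byte array in Windows-1252 (for example, read from a file), the conversion itself is a decode followed by an encode. The sketch below assumes that scenario. Cp1252ToUtf8, toUtf8 and percentEncode are illustrative names, not from the question. It also shows where the %C3% form mentioned in the question likely comes from: it is the percent-encoding of the UTF-8 bytes, which java.net.URLEncoder produces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
public class Cp1252ToUtf8 {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Decode Windows-1252 bytes into a String, then encode that String as UTF-8.
    static byte[] toUtf8(byte[] cp1252Bytes) {
        String text = new String(cp1252Bytes, WINDOWS_1252);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Percent-encode the UTF-8 bytes of the text (form encoding: a space becomes '+').
    static String percentEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] cp1252Bytes = { (byte) 0xE1, (byte) 0xF1 }; // "áñ" in Windows-1252
        StringBuilder hex = new StringBuilder();
        for (byte b : toUtf8(cp1252Bytes)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());          // C3 A1 C3 B1
        System.out.println(percentEncode("\u00e1\u00f1"));  // %C3%A1%C3%B1
    }
}
```

Each accented character takes one byte in Windows-1252 and two bytes in UTF-8, which is why every character in the sample would start with %C3 once percent-encoded.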
