
String Hex Encoding and Decoding

I am converting a String from UTF-8 to CP1047 and then hex-encoding it, which works great. Next, I convert it back by decoding the hex String and displaying it on the console in UTF-8. The problem is that I don't get back the String I passed to the encoding method. Below is the code I wrote:

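import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hex and DecoderException are assumed to come from Apache Commons Codec
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Hex;

public class HexEncodeDecode {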

    public static void main(String[] args) throws UnsupportedEncodingException,
            DecoderException {
        String reqMsg = "ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0    000123450041234";
        char[] hexed = getHex(reqMsg, "UTF-8", "Cp1047");

        System.out.println(hexed);

        System.out.println(getString(hexed));
    }

    public static char[] getHex(String source, String inputCharacterCoding,
            String outputCharacterCoding) throws UnsupportedEncodingException {
        return Hex.encodeHex(new String(source.getBytes(inputCharacterCoding),
                outputCharacterCoding).getBytes(), false);
    }

    public static String getString(char[] source) throws DecoderException,
            UnsupportedEncodingException {
        return new String(Hex.decodeHex(source), Charset.forName("UTF-8"));
    }
}

Output I am getting is:

    C3B1C3AB7CC290C291C295C290C290C290C290C291C295C290C298C290C290C3A41616C290C290C290C290C290C298C290C290C290C290C290C290C290C290C294C290C290C290C290C290C295C290C290C290C290C290C290C29016C291C295C291C29016C291C299C290C290C290C290C290C290C290C290C291C294C290C294C291C296C291C295C291C294C291C298C290C290C290C290C291C2941604C296C299C290C291C296C291C280C290C3A2C290C280C280C280C280C290C290C290C29116C293C294C295C290C290C294C29116C293C294
    ñë|äâ

So I need help getting the input String back.

Expected output would be:

    C3B1C3AB7CC290C291C295C290C290C290C290C291C295C290C298C290C290C3A41616C290C290C290C290C290C298C290C290C290C290C290C290C290C290C294C290C290C290C290C290C295C290C290C290C290C290C290C29016C291C295C291C29016C291C299C290C290C290C290C290C290C290C290C291C294C290C294C291C296C291C295C291C294C291C298C290C290C290C290C291C2941604C296C299C290C291C296C291C280C290C3A2C290C280C280C280C280C290C290C290C29116C293C294C295C290C290C294C29116C293C294
    ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0    000123450041234

Answer:

    new String(source.getBytes(inputCharacterCoding), outputCharacterCoding)
        .getBytes()

This probably does not do what you think it does.
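To make that concrete, here is a minimal JDK-only sketch (the class `WhatItDoes` and its helpers are hypothetical) that runs the expression on "ISO0". The no-argument `getBytes()` uses the platform default charset; it is pinned to UTF-8 here on the assumption that UTF-8 was the default in the question. Instead of encoding to Cp1047, the expression decodes the UTF-8 bytes as if they were Cp1047, and the result matches the start of the hex the question prints.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WhatItDoes {
    // The question's expression, with the final getBytes() pinned to UTF-8
    // (assumed to be the platform default in the question).
    static byte[] questionExpression(String source) {
        byte[] utf8 = source.getBytes(StandardCharsets.UTF_8);        // 'I' -> 0x49, 'S' -> 0x53, ...
        String misread = new String(utf8, Charset.forName("Cp1047")); // decodes (not encodes): 0x49 -> 'ñ'
        return misread.getBytes(StandardCharsets.UTF_8);               // 'ñ' -> C3 B1
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b)); // a Byte is formatted as unsigned two-digit hex
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(questionExpression("ISO0"))); // C3B1C3AB7CC290
    }
}
```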

First things first: a String has no encoding. Repeat after me: a String has no encoding.

A String is simply a sequence of tokens that represent characters. It just happens that Java uses a sequence of chars for this purpose. They could just as well be carrier pigeons.

UTF-8, CP1047 and others are just character codings; two operations can be performed with them:

  • encoding: turn a stream of carrier pigeons (chars) into a stream of bytes;
  • decoding: turn a stream of bytes into a stream of carrier pigeons (chars).
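A small JDK-only sketch of the two operations (the class name is hypothetical; it assumes the JDK includes the IBM EBCDIC charsets, as full OpenJDK builds do via the `jdk.charsets` module): one String, two charsets, two different byte arrays, and each array decodes back correctly only with the charset that produced it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeDecodeDemo {
    public static void main(String[] args) {
        // Assumes the JDK ships the IBM EBCDIC charsets (jdk.charsets module in OpenJDK).
        Charset cp1047 = Charset.forName("Cp1047");
        String text = "ISO0"; // a sequence of chars; no encoding attached

        // Encoding: chars -> bytes. The same String yields different bytes per charset.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ebcdic = text.getBytes(cp1047);
        System.out.println(Arrays.toString(utf8));   // [73, 83, 79, 48]
        System.out.println(Arrays.toString(ebcdic)); // [-55, -30, -42, -16]

        // Decoding: bytes -> chars, using the charset that produced the bytes.
        System.out.println(new String(utf8, StandardCharsets.UTF_8)); // ISO0
        System.out.println(new String(ebcdic, cp1047));               // ISO0
    }
}
```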

Basically, your base assumption is wrong; you cannot associate an encoding with a String. Your real input should be a byte stream (more often than not a byte array) which you know is the result of a particular encoding (in your case, UTF-8), and which you want to re-encode using another charset (in your case, CP1047).
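Re-encoding a byte stream therefore always goes through a String: decode with the charset the bytes are known to be in, then encode with the target charset. A minimal sketch, with a hypothetical helper `utf8ToCp1047` and a literal standing in for bytes read from a file or socket:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Hypothetical helper: re-encode bytes known to be UTF-8 as Cp1047.
    static byte[] utf8ToCp1047(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8); // decode: bytes -> chars
        return text.getBytes(Charset.forName("Cp1047"));             // encode: chars -> bytes
    }

    public static void main(String[] args) {
        // Stand-in for bytes read from disk or the network.
        byte[] input = "ISO0".getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8ToCp1047(input)) {
            System.out.printf("%02X", b);
        }
        System.out.println(); // C9E2D6F0
    }
}
```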

The "secret" behind a real answer here would be the code of your Hex.encodeHex() method, but you don't show it, so this is as good an answer as I can muster.
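The `Hex` class is not shown in the question; `encodeHex(byte[], boolean)` and `DecoderException` suggest Apache Commons Codec's `org.apache.commons.codec.binary.Hex`, but that is an assumption. A hex codec of that kind only maps each byte to two hex digits and back, without involving any charset. A hand-rolled stand-in (the names `HexSketch`, `toHex` and `fromHex` are hypothetical, not the library's API):

```java
import java.util.Arrays;

public class HexSketch {
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    // Each byte becomes exactly two hex digits (high nibble first).
    static char[] toHex(byte[] data) {
        char[] out = new char[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            out[2 * i] = DIGITS[(data[i] >> 4) & 0x0F];
            out[2 * i + 1] = DIGITS[data[i] & 0x0F];
        }
        return out;
    }

    // Inverse: every pair of hex digits becomes one byte.
    static byte[] fromHex(char[] hex) {
        if (hex.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex[2 * i], 16);
            int lo = Character.digit(hex[2 * i + 1], 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC9, (byte) 0xE2, 0x4F, 0x00};
        char[] hex = toHex(bytes);
        System.out.println(hex);                                  // C9E24F00
        System.out.println(Arrays.equals(bytes, fromHex(hex)));  // true
    }
}
```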

Answer:

A quick fix (though a little ugly) would be to change getString() to:

public static String getString(char[] source) throws DecoderException, UnsupportedEncodingException {
    // hex -> bytes read as UTF-8 -> re-encoded as Cp1047 -> decoded again as UTF-8
    return new String(new String(Hex.decodeHex(source), Charset.forName("UTF-8")).getBytes("Cp1047"), "UTF-8");
}

As fge already mentioned, you are switching back and forth between chars and bytes, which are two different pairs of shoes. So this quick solution first hex-decodes and reads the result as UTF-8, then encodes it into a Cp1047 byte array, and finally decodes that array back into a String using the UTF-8 charset.
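The quick fix works because it undoes each step of getHex() in reverse order, and for the bytes involved here, decoding a byte as Cp1047 and re-encoding it yields the same byte. A minimal JDK-only sketch (class and method names hypothetical). The hex step is left out because hex encoding and decoding are lossless, and the default charset of getHex()'s final getBytes() is assumed to be UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuickFixDemo {
    static final Charset CP1047 = Charset.forName("Cp1047");

    // The question's getHex() without the hex step, default charset assumed UTF-8.
    static byte[] questionEncode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), CP1047).getBytes(StandardCharsets.UTF_8);
    }

    // The quick fix: undo each step of questionEncode() in reverse order.
    static String quickFixDecode(byte[] bytes) {
        String misread = new String(bytes, StandardCharsets.UTF_8); // undo the final getBytes()
        byte[] original = misread.getBytes(CP1047);                 // undo the Cp1047 decode
        return new String(original, StandardCharsets.UTF_8);        // undo the initial UTF-8 encode
    }

    public static void main(String[] args) {
        String reqMsg = "ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0    000123450041234";
        System.out.println(quickFixDecode(questionEncode(reqMsg)).equals(reqMsg)); // true
    }
}
```

Compared with encoding straight to Cp1047 in the first place, this keeps the broken getHex() and compensates for it on the way back, which is why it is only a workaround.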

As I already said, this is simply a quick one-liner workaround and not the cleanest solution, since the error has already been made during the hex encoding.

Answer:

reqMsg no longer has an encoding, so it's pointless (and damaging) to try to convert it from UTF-8 to "Cp1047".

If reqMsg will come from an external source in the future, such as disk or network, then you will have to decode it; perhaps this is where the confusion comes from. You would then be doing: UTF-8 -> Unicode (String) -> CP1047 -> HEX. When you write the result to stdout, the hex will most likely be ASCII-encoded.
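A minimal sketch of that pipeline and its reverse, using only the JDK (class and method names are hypothetical; a literal stands in for bytes read from disk or the network). The hex it produces is plain ASCII text:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Pipeline {
    static final Charset CP1047 = Charset.forName("Cp1047");

    // UTF-8 bytes -> String -> Cp1047 bytes -> hex text
    static String utf8ToCp1047Hex(byte[] utf8) {
        byte[] ebcdic = new String(utf8, StandardCharsets.UTF_8).getBytes(CP1047);
        StringBuilder hex = new StringBuilder();
        for (byte b : ebcdic) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    // hex text -> Cp1047 bytes -> String -> UTF-8 bytes
    static byte[] cp1047HexToUtf8(String hex) {
        byte[] ebcdic = new byte[hex.length() / 2];
        for (int i = 0; i < ebcdic.length; i++) {
            ebcdic[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(ebcdic, CP1047).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] fromDisk = "ISO0150 0B0".getBytes(StandardCharsets.UTF_8);
        String hex = utf8ToCp1047Hex(fromDisk);
        System.out.println(hex); // C9E2D6F0F1F5F040F0C2F0
        System.out.println(new String(cp1047HexToUtf8(hex), StandardCharsets.UTF_8)); // ISO0150 0B0
    }
}
```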

The following example creates an ASCII hex string from your original string after converting it to CP1047 (Unicode -> CP1047 -> HEX):

    import org.apache.commons.codec.binary.Hex;

    public class HexEncodeDecode {
        public static void main(String[] args) throws Exception {
            String reqMsg = "ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0    000123450041234";

            // encode to Cp1047, represented as hex
            byte[] reqMsgBytes = reqMsg.getBytes("Cp1047");
            char[] hex = Hex.encodeHex(reqMsgBytes);
            System.out.println(hex);

            // decode: hex -> Cp1047 bytes -> String
            String respMsg = new String(Hex.decodeHex(hex), "Cp1047");
            System.out.println(respMsg);
        }
    }

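Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM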