
Java : Hex encoded bytes get decoded with base64

Hex (Base16) encoded bytes get decoded with Base64 without an exception being thrown. How can I tell that the data was actually encoded with a Base16 encoder?

org.apache.commons.codec.binary.Base64.decodeBase64(bytesencodedwithHex);

When the bytes passed to the method above are hex-encoded data, the method does not throw any exception or otherwise help to identify that they were hex encoded. Even org.apache.commons.codec.binary.Base64.isBase64(bytesencodedwithHex) returns true.

In the example below, the string "Hello" is encoded with hex, and when I decode the result with Base64 it gives nonsense. How can I let my client know that they are using the wrong decoder in this case?

System.out.println(new String(org.bouncycastle.util.encoders.Hex.encode("Hello".getBytes())));

System.out.println(new String(org.bouncycastle.util.encoders.Base64.decode("48656c6c6f".getBytes())));   

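Every hexadecimal string uses only characters from the Base64 alphabet, so a Base64 decoder that does not insist on padding will typically accept it as legitimate input.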

Hex encoding gives you a string that represents the bytes of the original string and consists of the characters 0-9 and A-F (or a-f). Base64 encoding gives you a string that also encodes the original string and consists only of printable characters (which, of course, include 0-9 and A-F).

So each string made of 0-9 and A-F can be read as a hexadecimal string, but also as a Base64 string (one that happens to contain only 0-9 and A-F).
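The overlap can be reproduced with the JDK alone. The following is a minimal sketch, assuming java.util.Base64 in place of the Commons Codec and Bouncy Castle classes from the question; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class HexVersusBase64Demo {
    /** Hex-encodes bytes by hand so the sketch needs nothing beyond the JDK. */
    public static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    /** Returns true if the JDK's basic Base64 decoder accepts the text without throwing. */
    public static boolean base64Accepts(String text) {
        try {
            Base64.getDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String hex = toHex("Hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(hex);                // prints 48656c6c6f
        System.out.println(base64Accepts(hex)); // prints true
    }
}
```

The decoder has no way to know that the ten characters were meant as hex digits; it simply treats them as Base64 digits and returns different bytes.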

You will need a different way to tell the user which encoding was used. For example, send the encoding type together with the string, or send the length of the original string (if decoding yields a different length, the wrong decoder was used).
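Both suggestions can be combined in a small envelope. The sketch below assumes a made-up text format, "<encoding>:<length>:<text>", and hypothetical names (TaggedPayload, encode, decode); it is one possible shape, not an established protocol.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TaggedPayload {
    public static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    public static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Builds "<encoding>:<original length>:<text>"; the format is an assumption of this sketch. */
    public static String encode(String encoding, byte[] data) {
        String text;
        switch (encoding) {
            case "hex":    text = toHex(data); break;
            case "base64": text = Base64.getEncoder().encodeToString(data); break;
            default: throw new IllegalArgumentException("unknown encoding: " + encoding);
        }
        return encoding + ":" + data.length + ":" + text;
    }

    /** Decodes with the declared decoder and rejects the result if its length is not the declared one. */
    public static byte[] decode(String tagged) {
        String[] parts = tagged.split(":", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected <encoding>:<length>:<text>");
        }
        int expectedLength = Integer.parseInt(parts[1]);
        byte[] data;
        switch (parts[0]) {
            case "hex":    data = fromHex(parts[2]); break;
            case "base64": data = Base64.getDecoder().decode(parts[2]); break;
            default: throw new IllegalArgumentException("unknown encoding: " + parts[0]);
        }
        if (data.length != expectedLength) {
            throw new IllegalArgumentException(
                "length mismatch: expected " + expectedLength + " bytes, got " + data.length);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] hello = "Hello".getBytes(StandardCharsets.UTF_8);
        String tagged = encode("hex", hello);
        System.out.println(tagged);                                             // prints hex:5:48656c6c6f
        System.out.println(new String(decode(tagged), StandardCharsets.UTF_8)); // prints Hello
        try {
            decode("base64:5:48656c6c6f"); // hex text mislabelled as Base64
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The length check is what catches a mislabelled payload here: the hex digits of "Hello" decode as Base64 to a different number of bytes than the declared five.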

Some strings are valid in both Base64 and Base16 and carry no clue as to which one was used.

But there are clues:

  • If length() % 2 != 0, it cannot be hex, because hex uses two digits per byte; it must be Base64.
  • Base64 maps 3 bytes to 4 characters, so after removing padding a Base64 string never has length() % 4 == 1: the lone character would carry only 6 bits, less than one byte. Such a length is odd, so the string is not hex either.
  • Hex output is most likely all uppercase or all lowercase.
  • Hex never contains the special "digits" / and +, nor the letters G-Z and g-z.

So:

import java.util.Locale;

boolean probablyHex(String s) {
    if (s.endsWith("=")) { // Base64 padding char (optional).
        return false;
    }
    // Drop whitespace and anything else outside the Base64 alphabets
    // (standard and URL-safe variants).
    s = s.replaceAll("[^-_+/A-Za-z0-9]", "");
    if (s.matches(".*[-_+/G-Zg-z].*")) { // Characters that hex never uses.
        return false;
    }
    int n = s.length();
    if (n % 2 == 1) { // Hex needs two digits per byte.
        return false;
    }
    // Only hex digits and an even length: very unlikely to be Base64,
    // but you might have a bias towards Base64:
    if (!s.equals(s.toUpperCase(Locale.US)) && !s.equals(s.toLowerCase(Locale.US))) {
        // Mixed case in A-Fa-f:
        // for short texts that is incoherent enough to suggest Base64.
        return n > 32;
    }
    return true;
}
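One further clue, which the heuristic above does not use, is the unused tail bits of Base64. When the unpadded length is 2 or 3 modulo 4, a standard encoder leaves the last 4 or 2 bits of the final character at zero, and hex digits often violate that. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and it covers only the standard alphabet, not the URL-safe one.

```java
public class Base64TailCheck {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /**
     * Returns true if the text looks like the output of a standard Base64 encoder:
     * only alphabet characters, a possible length, consistent padding,
     * and zero bits in the unused tail of the last character.
     */
    public static boolean isCanonicalBase64(String text) {
        String s = text;
        int pad = 0;
        while (pad < 2 && s.endsWith("=")) { // strip at most two padding characters
            s = s.substring(0, s.length() - 1);
            pad++;
        }
        for (char c : s.toCharArray()) {
            if (ALPHABET.indexOf(c) < 0) {
                return false;
            }
        }
        int rem = s.length() % 4;
        if (rem == 1) {
            return false; // a lone character carries only 6 bits
        }
        if (rem == 0) {
            return pad == 0; // complete groups take no padding
        }
        if (pad != 0 && pad != 4 - rem) {
            return false; // padding present but of the wrong length
        }
        int last = ALPHABET.indexOf(s.charAt(s.length() - 1));
        int unusedBits = (rem == 2) ? 4 : 2;
        return (last & ((1 << unusedBits) - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isCanonicalBase64("SGVsbG8="));   // prints true
        System.out.println(isCanonicalBase64("48656c6c6f")); // prints false
    }
}
```

A false result for text that contains only hex digits is a strong hint that the wrong decoder is being used. A true result proves nothing: a hex string whose length is a multiple of 4, such as "4865", passes the check as well.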

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 