How to convert a byte array (MD5 hash) into a string (36 chars)?

I've got a byte array that was created using a hash function. I would like to convert this array into a string. So far so good: converting it gives me a hexadecimal string.

Now I would like to use something other than hexadecimal characters: I would like to encode the byte array with these 36 characters: [a-z][0-9].

How would I go about it?

Edit: the reason I would like to do this is that I want a shorter string than the hexadecimal one.

I adapted my arbitrary-length base conversion function from this answer to C#:

// Requires: using System.Linq;
static string BaseConvert(string number, int fromBase, int toBase)
{
    var digits = "0123456789abcdefghijklmnopqrstuvwxyz";
    var length = number.Length;
    var result = string.Empty;

    var nibbles = number.Select(c => digits.IndexOf(c)).ToList();
    int newlen;
    do {
        var value = 0;
        newlen = 0;

        for (var i = 0; i < length; ++i) {
            value = value * fromBase + nibbles[i];
            if (value >= toBase) {
                if (newlen == nibbles.Count) {
                    nibbles.Add(0);
                }
                nibbles[newlen++] = value / toBase;
                value %= toBase;
            }
            else if (newlen > 0) {
                if (newlen == nibbles.Count) {
                    nibbles.Add(0);
                }
                nibbles[newlen++] = 0;
            }
        }
        length = newlen;
        result = digits[value] + result; // prepend the remainder as the next digit
    }
    while (newlen != 0);

    return result;
}

As it's coming from PHP, it might not be very idiomatic C#, and there are no parameter validity checks. However, you can feed it a hex-encoded string and it will work just fine with

var result = BaseConvert(hexEncoded, 16, 36);

It's not exactly what you asked for, but encoding the byte[] into hex is trivial.
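For reference, here is a minimal sketch of that hex step. It assumes the hash comes from MD5 over a UTF-8 string; the helper name ToLowerHex and the sample input are made up for illustration. Lowercase matters because the digits string in BaseConvert only contains lowercase letters.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class HexDemo
{
    // Hypothetical helper: lowercase hex without separators.
    static string ToLowerHex(byte[] bytes)
    {
        // BitConverter.ToString yields uppercase, dash-separated pairs ("5D-41-...").
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }

    static void Main()
    {
        byte[] hash;
        using (var md5 = MD5.Create())
        {
            hash = md5.ComputeHash(Encoding.UTF8.GetBytes("hello"));
        }

        string hexEncoded = ToLowerHex(hash);
        Console.WriteLine(hexEncoded.Length); // 32 characters for a 16-byte MD5 hash
    }
}
```

The resulting hexEncoded string can then be passed to BaseConvert(hexEncoded, 16, 36).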

See it in action.

Earlier tonight I came across a Code Review question revolving around the same algorithm being discussed here. See: https://codereview.stackexchange.com/questions/14084/base-36-encoding-of-a-byte-array/

I provided an improved implementation of one of its earlier answers (both use BigInteger). See: https://codereview.stackexchange.com/a/20014/20654 . The solution takes a byte[] and returns a Base36 string. Both the original and mine include simple benchmark information.

For completeness, the following is the method to decode a byte[] from a string. I'll include the encode function from the link above as well. See the text after this code block for some simple benchmark info for decoding.

// Requires: using System; using System.Numerics;
const int kByteBitCount= 8; // number of bits in a byte
// constants that we use in FromBase36String and ToBase36String
const string kBase36Digits= "0123456789abcdefghijklmnopqrstuvwxyz";
static readonly double kBase36CharsLengthDivisor= Math.Log(kBase36Digits.Length, 2);
static readonly BigInteger kBigInt36= new BigInteger(36);

// assumes the input 'chars' is in big-endian ordering, MSB->LSB
static byte[] FromBase36String(string chars)
{
    var bi= new BigInteger();
    for (int x= 0; x < chars.Length; x++)
    {
        int i= kBase36Digits.IndexOf(chars[x]);
        if (i < 0) return null; // invalid character
        bi *= kBigInt36;
        bi += i;
    }

    return bi.ToByteArray();
}

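// characters returned are in big-endian ordering, MSB->LSB
// note: BigInteger reads 'bytes' as a little-endian signed value; if the last
// byte is > 0x7F the value is negative and Math.Abs below drops the sign,
// so such input does not round-trip through FromBase36String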
static string ToBase36String(byte[] bytes)
{
    // Estimate the result's length so we don't waste time realloc'ing
    int result_length= (int)
        Math.Ceiling(bytes.Length * kByteBitCount / kBase36CharsLengthDivisor);
    // We use a List so we don't have to CopyTo a StringBuilder's characters
    // to a char[], only to then Array.Reverse it later
    var result= new System.Collections.Generic.List<char>(result_length);

    var dividend= new BigInteger(bytes);
    // IsZero's computation is less complex than evaluating "dividend > 0"
    // which invokes BigInteger.CompareTo(BigInteger)
    while (!dividend.IsZero)
    {
        BigInteger remainder;
        dividend= BigInteger.DivRem(dividend, kBigInt36, out remainder);
        int digit_index= Math.Abs((int)remainder);
        result.Add(kBase36Digits[digit_index]);
    }

    // orientate the characters in big-endian ordering
    result.Reverse();
    // ToArray will also trim the excess chars used in length prediction
    return new string(result.ToArray());
}

"A test 1234. Made slightly larger!" encodes to Base36 as "165kkoorqxin775ct82ist5ysteekll7kaqlcnnu6mfe7ag7e63b5"

To decode that Base36 string 1,000,000 times takes 12.6558909 seconds on my machine (I used the same build and machine conditions as provided in my answer on Code Review).

You mentioned that you were dealing with a byte[] for the MD5 hash, rather than a hexadecimal string representation of it, so I think this solution provides the least overhead for you.

If you want a shorter string and can accept [a-zA-Z0-9] plus + and /, then take a look at Convert.ToBase64String.
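As a rough illustration of the Base64 route, the sketch below hashes a sample string (an assumption made for the demo) and round-trips the 16-byte result through Convert.ToBase64String and Convert.FromBase64String.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class Base64Demo
{
    static void Main()
    {
        byte[] hash;
        using (var md5 = MD5.Create())
        {
            hash = md5.ComputeHash(Encoding.UTF8.GetBytes("hello"));
        }

        // 16 bytes become 24 characters, the last two being '=' padding.
        string base64 = Convert.ToBase64String(hash);
        Console.WriteLine(base64.Length); // 24

        // Decoding restores the original bytes.
        byte[] decoded = Convert.FromBase64String(base64);
        Console.WriteLine(decoded.SequenceEqual(hash)); // True
    }
}
```

That is 24 characters against 32 for hex, at the cost of a case-sensitive alphabet with two punctuation characters.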

Using BigInteger (needs the System.Numerics reference)

const string chars = "0123456789abcdefghijklmnopqrstuvwxyz";

// The result is padded with chars[0] to make the string length
// (int)Math.Ceiling(bytes.Length * 8 / Math.Log(chars.Length, 2))
// (so that for any value [0...0]-[255...255] of bytes the resulting
// string will have same length)
public static string ToBaseN(byte[] bytes, string chars, bool littleEndian = true, int len = -1)
{
    if (bytes.Length == 0 || len == 0)
    {
        return String.Empty;
    }

    // BigInteger saves in the last byte the sign. > 7F negative, 
    // <= 7F positive. 
    // If we have a "negative" number, we will prepend a 0 byte.
    byte[] bytes2;

    if (littleEndian)
    {
        if (bytes[bytes.Length - 1] <= 0x7F)
        {
            bytes2 = bytes;
        }
        else
        {
            // Note that Array.Resize doesn't modify the original array,
            // but creates a copy and sets the passed reference to the
            // new array
            bytes2 = bytes;
            Array.Resize(ref bytes2, bytes.Length + 1);
        }
    }
    else
    {
        bytes2 = new byte[bytes[0] > 0x7F ? bytes.Length + 1 : bytes.Length];

        // We copy and reverse the array
        for (int i = bytes.Length - 1, j = 0; i >= 0; i--, j++)
        {
            bytes2[j] = bytes[i];
        }
    }

    BigInteger bi = new BigInteger(bytes2);

    // A little optimization. We will do many divisions based on 
    // chars.Length .
    BigInteger length = chars.Length;

    // We pre-calc the length of the string. We know the bits of 
    // "information" of a byte are 8. Using Log2 we calc the bits of 
    // information of our new base. 
    if (len == -1)
    {
        len = (int)Math.Ceiling(bytes.Length * 8 / Math.Log(chars.Length, 2));
    }

    // We will build our string on a char[]
    var chs = new char[len];
    int chsIndex = 0;

    while (bi > 0)
    {
        BigInteger remainder;
        bi = BigInteger.DivRem(bi, length, out remainder);

        chs[littleEndian ? chsIndex : len - chsIndex - 1] = chars[(int)remainder];
        chsIndex++;

        if (chsIndex == len)
        {
            if (bi > 0)
            {
                throw new OverflowException();
            }
        }
    }

    // We append the zeros that we skipped at the beginning
    if (littleEndian)
    {
        while (chsIndex < len)
        {
            chs[chsIndex] = chars[0];
            chsIndex++;
        }
    }
    else
    {
        while (chsIndex < len)
        {
            chs[len - chsIndex - 1] = chars[0];
            chsIndex++;
        }
    }

    return new string(chs);
}

public static byte[] FromBaseN(string str, string chars, bool littleEndian = true, int len = -1)
{
    if (str.Length == 0 || len == 0)
    {
        return new byte[0];
    }

    // This should be the maximum length of the byte[] array. It's 
    // the opposite of the one used in ToBaseN.
    // Note that it can be passed as a parameter
    if (len == -1)
    {
        len = (int)Math.Ceiling(str.Length * Math.Log(chars.Length, 2) / 8);
    }

    BigInteger bi = BigInteger.Zero;
    BigInteger length2 = chars.Length;
    BigInteger mult = BigInteger.One;

    for (int j = 0; j < str.Length; j++)
    {
        int ix = chars.IndexOf(littleEndian ? str[j] : str[str.Length - j - 1]);

        // We didn't find the character
        if (ix == -1)
        {
            throw new ArgumentOutOfRangeException();
        }

        bi += ix * mult;

        mult *= length2;
    }

    var bytes = bi.ToByteArray();

    int len2 = bytes.Length;

    // BigInteger adds a 0 byte for positive numbers that have the
    // last byte > 0x7F
    if (len2 >= 2 && bytes[len2 - 1] == 0)
    {
        len2--;
    }

    int len3 = Math.Min(len, len2);

    byte[] bytes2;

    if (littleEndian)
    {
        if (len == bytes.Length)
        {
            bytes2 = bytes;
        }
        else
        {
            bytes2 = new byte[len];
            Array.Copy(bytes, bytes2, len3);
        }
    }
    else
    {
        bytes2 = new byte[len];

        for (int i = 0; i < len3; i++)
        {
            bytes2[len - i - 1] = bytes[i];
        }
    }

    for (int i = len3; i < len2; i++)
    {
        if (bytes[i] != 0)
        {
            throw new OverflowException();
        }
    }

    return bytes2;
}

Be aware that they are REALLY slow! REALLY REALLY slow! (2 minutes for 100k). To speed them up you would probably need to rewrite the division/mod operation so that they work directly on a buffer, instead of recreating the scratch pads each time, as BigInteger does. And it would still be SLOW. The problem is that the time needed to encode the first byte is O(n), where n is the length of the byte array (because the whole array needs to be divided by 36). Unless you want to work with blocks of 5 bytes and lose some bits. Each symbol of Base36 carries around 5.169925001 bits, so 8 of these symbols carry 41.35940001 bits, very near 40 bits (5 bytes).
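The block idea can be sketched as follows. This is only an illustration, not part of the methods above: the class name, the big-endian packing within a block, and the handling of a short final block are all assumptions. Each 5-byte block fits in a ulong and is written as exactly 8 Base36 characters, so no BigInteger is needed.

```csharp
using System;
using System.Text;

static class Base36Blocks
{
    const string Digits = "0123456789abcdefghijklmnopqrstuvwxyz";

    // Characters needed for a block of 0..5 bytes: the smallest k with 36^k >= 256^n.
    static readonly int[] CharsForBytes = { 0, 2, 4, 5, 7, 8 };

    public static string Encode(byte[] bytes)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < bytes.Length; offset += 5)
        {
            int n = Math.Min(5, bytes.Length - offset);

            // Pack up to 5 bytes (40 bits) into a ulong, most significant byte first.
            ulong value = 0;
            for (int i = 0; i < n; i++)
            {
                value = (value << 8) | bytes[offset + i];
            }

            // Emit a fixed number of Base36 digits for this block, most significant first.
            int count = CharsForBytes[n];
            var block = new char[count];
            for (int i = count - 1; i >= 0; i--)
            {
                block[i] = Digits[(int)(value % 36)];
                value /= 36;
            }
            sb.Append(block);
        }
        return sb.ToString();
    }
}
```

With this layout a 16-byte MD5 hash takes 3 * 8 + 2 = 26 characters, one more than the 25 produced by treating the whole array as a single number, but each block costs constant time.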

Note that these methods can work both in little-endian mode and in big-endian mode. The endianness of the input and of the output is the same. Both methods accept a len parameter. You can use it to trim excess 0 (zeroes). Note that if you try to make an output too small to contain the input, an OverflowException will be thrown.

System.Text.Encoding enc = System.Text.Encoding.ASCII;
string myString = enc.GetString(myByteArray);

You can play with what encoding you need:

System.Text.ASCIIEncoding,
System.Text.UnicodeEncoding,
System.Text.UTF7Encoding,
System.Text.UTF8Encoding

To match the requirement [a-z][0-9], you can use this:

Byte[] bytes = new Byte[] { 200, 180, 34 };
string result = String.Join("a", bytes.Select(x => x.ToString()).ToArray());

You will have a string representation of the bytes with a char separator. To convert back you will need to split, and convert the string[] to byte[] using the same approach with .Select().
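A small sketch of the round trip, using the same three sample bytes and the "a" separator from above (the class name is made up for the demo):

```csharp
using System;
using System.Linq;

static class SeparatorDemo
{
    static void Main()
    {
        byte[] bytes = { 200, 180, 34 };

        // Encode: decimal value of each byte, joined with 'a' as the separator.
        string encoded = string.Join("a", bytes.Select(b => b.ToString()));
        Console.WriteLine(encoded); // 200a180a34

        // Decode: split on the separator and parse each piece back into a byte.
        byte[] decoded = encoded.Split('a').Select(s => byte.Parse(s)).ToArray();
        Console.WriteLine(decoded.SequenceEqual(bytes)); // True
    }
}
```

Note that this only uses 11 of the 36 allowed characters, so for a 16-byte hash the result is usually longer than the 32-character hex string.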

Usually a power of 2 is used, so that one character maps to a fixed number of bits. An alphabet of 32 characters, for instance, maps each character to 5 bits. The only challenge in that case is how to deserialize variable-length strings.
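To make the power-of-2 case concrete, here is a minimal sketch of a 32-character encoder. The alphabet (lowercase letters plus 2 to 7) and the choice to omit padding are assumptions for the example; any 32 distinct characters would do.

```csharp
using System;
using System.Text;

static class Base32Sketch
{
    // Assumed alphabet: 26 lowercase letters followed by the digits 2-7.
    const string Alphabet = "abcdefghijklmnopqrstuvwxyz234567";

    public static string Encode(byte[] bytes)
    {
        var sb = new StringBuilder((bytes.Length * 8 + 4) / 5);
        int buffer = 0;   // bits read from the input but not yet emitted
        int bitCount = 0; // number of valid bits currently in buffer

        foreach (byte b in bytes)
        {
            buffer = (buffer << 8) | b;
            bitCount += 8;

            // Emit one character for every complete group of 5 bits.
            while (bitCount >= 5)
            {
                bitCount -= 5;
                sb.Append(Alphabet[(buffer >> bitCount) & 31]);
            }

            // Keep only the leftover bits.
            buffer &= (1 << bitCount) - 1;
        }

        // Pad the final partial group with zero bits on the right.
        if (bitCount > 0)
        {
            sb.Append(Alphabet[(buffer << (5 - bitCount)) & 31]);
        }
        return sb.ToString();
    }
}
```

For a 16-byte hash this gives 26 characters, and each character is produced with shifts and masks only.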

For 36 characters you could treat the data as a large number, and then:

  • divide by 36
  • add the remainder as a character to your result
  • repeat until the division results in 0

Easier said than done perhaps.

You can use modulo. This example encodes your byte array to a string of [0-9][a-z]. Change it if you want.

    public string byteToString(byte[] byteArr)
    {
        int i;
        char[] charArr = new char[byteArr.Length];
        for (i = 0; i < byteArr.Length; i++)
        {
            int byt = byteArr[i] % 36; // 36 = number of available characters
            if (byt < 10)
            {
                charArr[i] = (char)(byt + 48); //if % result is a digit
            }
            else
            {
                charArr[i] = (char)(byt + 87); //if % result is a letter
            }
        }
        return new String(charArr);
    }

If you don't want to lose data, so that the string can be decoded, you can use this example:

    public string byteToString(byte[] byteArr)
    {
        int i;
        char[] charArr = new char[byteArr.Length*2];
        for (i = 0; i < byteArr.Length; i++)
        {
            charArr[2 * i] = (char)((int)byteArr[i] / 36+48);
            int byt = byteArr[i] % 36; // 36 = number of available characters
            if (byt < 10)
            {
                charArr[2*i+1] = (char)(byt + 48); //if % result is a digit
            }
            else
            {
                charArr[2*i+1] = (char)(byt + 87); //if % result is a letter
            }
        }
        return new String(charArr);
    }

Now you have a string twice as long, in which the first character of each pair is the quotient of the division by 36 and the second character is the remainder. For example: 200 = 36*5 + 20 => "5k".
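Decoding is not shown above; a possible sketch, assuming the two-characters-per-byte layout just described (the names PairDecoder, StringToByte and DigitValue are hypothetical), is:

```csharp
using System;

static class PairDecoder
{
    // Maps '0'-'9' to 0-9 and 'a'-'z' to 10-35.
    static int DigitValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'z') return c - 'a' + 10;
        throw new FormatException("Unexpected character: " + c);
    }

    public static byte[] StringToByte(string s)
    {
        if (s.Length % 2 != 0)
        {
            throw new FormatException("Expected two characters per byte.");
        }

        var result = new byte[s.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            // First character is the quotient, second is the remainder.
            int value = DigitValue(s[2 * i]) * 36 + DigitValue(s[2 * i + 1]);
            if (value > 255)
            {
                throw new FormatException("Pair does not fit in a byte.");
            }
            result[i] = (byte)value;
        }
        return result;
    }
}
```

For instance, StringToByte("5k") yields a single byte with the value 200, matching the example above.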
