Variable-length encoding of integers

Question

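What is the best way to do variable-length encoding of an unsigned integer value in C#?

"The actual intent is to append a variable-length encoded integer (bytes) to a file header."

For example: "Content-Length" - HTTP header

This can be achieved with some changes to the logic below.

I have already written some code that does this...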

Answer 1

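One method I have used, which makes smaller values use fewer bytes, is to encode 7 bits of data plus 1 bit of overhead per byte.

The encoding works only for non-negative values (zero and up), but it can be modified to handle negative values as well if needed.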
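One common way to handle negative values, though not necessarily what this answer had in mind, is the ZigZag mapping used by formats such as Protocol Buffers. It folds signed values onto unsigned ones so that small magnitudes stay small, and the result can then go through an unsigned 7-bit encoder. A minimal sketch, where the class and method names are made up for illustration:

```csharp
using System;

public static class ZigZag
{
    // Maps signed to unsigned so that small magnitudes stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
    public static uint Encode(int value)
    {
        // (value >> 31) is all ones for negative input, all zeros otherwise.
        return unchecked((uint)((value << 1) ^ (value >> 31)));
    }

    // Inverse of Encode.
    public static int Decode(uint value)
    {
        return unchecked((int)(value >> 1) ^ -(int)(value & 1));
    }
}
```

Note that the mapped value is a uint and can exceed Int32.MaxValue, so it would need an encoder that accepts unsigned input rather than one that rejects values below zero.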

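The encoding works like this:

1. Grab the lowest 7 bits of your value and store them in a byte; this is what you are going to output.
2. Shift the value 7 bits to the right, getting rid of the 7 bits you just grabbed.
3. If the value is non-zero (i.e. after you shifted 7 bits away from it), set the high bit of the byte you are going to output, before you output it.
4. Output the byte.
5. If the value is non-zero (the same check that resulted in setting the high bit), go back and repeat the steps from the start.

To decode:

1. Start at bit position 0.
2. Read one byte from the file.
3. Store whether the high bit is set, and mask it away.
4. OR the rest of the byte into your final value, at the bit position you are at.
5. If the high bit was set, increase the bit position by 7 and repeat the steps, skipping the first one (do not reset the bit position).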

          39    32 31    24 23    16 15     8 7      0
value:            |DDDDDDDD|CCCCCCCC|BBBBBBBB|AAAAAAAA|
encoded: |0000DDDD|xDDDDCCC|xCCCCCBB|xBBBBBBA|xAAAAAAA| (note, stored in reverse order)

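As you can see, because of the overhead of the control bits, the encoded value may occupy one additional byte that is only half used. If you extend this to a 64-bit value, the additional byte is completely used (the 63 data bits of a non-negative value fill exactly nine 7-bit groups), so there is still only one byte of extra overhead.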

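Note: since the encoding stores the value one byte at a time, always in the same order, big-endian or little-endian systems will not change its layout. The least significant byte is always stored first, and so on.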

Ranges and their encoded sizes:

          0 -         127 : 1 byte
        128 -      16,383 : 2 bytes
     16,384 -   2,097,151 : 3 bytes
  2,097,152 - 268,435,455 : 4 bytes
268,435,456 -   max-int32 : 5 bytes
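The boundaries in that table follow from each byte carrying 7 data bits. A small sketch that computes the encoded size for any unsigned 32-bit value (the helper name EncodedSize is an assumption for illustration, not part of the answer's code):

```csharp
using System;

public static class VarintSize
{
    // Number of bytes the 7-bits-per-byte scheme needs for a value.
    public static int EncodedSize(uint value)
    {
        int size = 1;            // zero still takes one byte
        while (value >= 0x80)    // more than 7 bits remain
        {
            value >>= 7;
            size++;
        }
        return size;
    }
}
```

This can be handy for reserving room in a header before the value itself is written.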

Here is a C# implementation of both the encoder and the decoder:

void Main()
{
    using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Create))
    using (BinaryWriter writer = new BinaryWriter(stream))
        writer.EncodeInt32(123456789);

    using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Open))
    using (BinaryReader reader = new BinaryReader(stream))
        reader.DecodeInt32().Dump(); // Dump() is LINQPad's output helper; use Console.WriteLine elsewhere
}

// Define other methods and classes here

public static class Extensions
{
    /// <summary>
    /// Encodes the specified <see cref="Int32"/> value with a variable number of
    /// bytes, and writes the encoded bytes to the specified writer.
    /// </summary>
    /// <param name="writer">
    /// The <see cref="BinaryWriter"/> to write the encoded value to.
    /// </param>
    /// <param name="value">
    /// The <see cref="Int32"/> value to encode and write to the <paramref name="writer"/>.
    /// </param>
    /// <exception cref="ArgumentNullException">
    /// <para><paramref name="writer"/> is <c>null</c>.</para>
    /// </exception>
    /// <exception cref="ArgumentOutOfRangeException">
    /// <para><paramref name="value"/> is less than 0.</para>
    /// </exception>
    /// <remarks>
    /// See <see cref="DecodeInt32"/> for how to decode the value back from
    /// a <see cref="BinaryReader"/>.
    /// </remarks>
    public static void EncodeInt32(this BinaryWriter writer, int value)
    {
        if (writer == null)
            throw new ArgumentNullException("writer");
        if (value < 0)
            throw new ArgumentOutOfRangeException("value", value, "value must be 0 or greater");

        do
        {
            byte lower7bits = (byte)(value & 0x7f);
            value >>= 7;
            if (value > 0)
                lower7bits |= 128;
            writer.Write(lower7bits);
        } while (value > 0);
    }

    /// <summary>
    /// Decodes a <see cref="Int32"/> value from a variable number of
    /// bytes, originally encoded with <see cref="EncodeInt32"/> from the specified reader.
    /// </summary>
    /// <param name="reader">
    /// The <see cref="BinaryReader"/> to read the encoded value from.
    /// </param>
    /// <returns>
    /// The decoded <see cref="Int32"/> value.
    /// </returns>
    /// <exception cref="ArgumentNullException">
    /// <para><paramref name="reader"/> is <c>null</c>.</para>
    /// </exception>
    public static int DecodeInt32(this BinaryReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");

        bool more = true;
        int value = 0;
        int shift = 0;
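        while (more)
        {
            // An Int32 never needs more than 5 encoded bytes; anything longer is corrupt.
            if (shift >= 35)
                throw new FormatException("Encoded value is too long for an Int32.");

            byte lower7bits = reader.ReadByte();
            more = (lower7bits & 128) != 0;
            value |= (lower7bits & 0x7f) << shift;
            shift += 7;
        }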
        return value;
    }
}
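The question asks about unsigned values, while the code above takes an int. Here is a minimal sketch of the same 7-bit scheme for ulong, writing into a byte list rather than a BinaryWriter. The class and method names are invented for illustration, and the 10-byte limit is simply the most a 64-bit value can need:

```csharp
using System;
using System.Collections.Generic;

public static class UnsignedVarint
{
    // Appends the encoding of value to output, least significant group first.
    public static void Write(List<byte> output, ulong value)
    {
        while (value >= 0x80)
        {
            output.Add((byte)((value & 0x7F) | 0x80)); // low 7 bits plus continuation flag
            value >>= 7;
        }
        output.Add((byte)value);                       // final byte, high bit clear
    }

    // Reads one value starting at offset and advances offset past it.
    public static ulong Read(IReadOnlyList<byte> input, ref int offset)
    {
        ulong result = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            byte b = input[offset++];
            result |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new FormatException("Varint is longer than 10 bytes.");
    }
}
```

With this sketch, the value 300 comes out as the two bytes 0xAC 0x02: the low seven bits (0x2C) with the continuation flag set, followed by the remaining bits (0x02).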

Answer 2

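You should first make a histogram of your values. If the distribution is random (i.e. every bin of your histogram has a count close to the others), then you will not be able to encode the values more efficiently than with their binary representation.

If your histogram is unbalanced (i.e. some values occur more often than others), then it may make sense to choose an encoding that uses fewer bits for those values and more bits for the other, less likely values.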
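One rough way to put a number on how unbalanced the histogram is would be its Shannon entropy: the average number of bits per value that an ideal code would need for that distribution. A minimal sketch, assuming the samples fit in memory and are representative (the class and method names are illustrative only):

```csharp
using System;
using System.Collections.Generic;

public static class Histogram
{
    // Shannon entropy, in bits per value, of the observed samples.
    public static double EntropyBits(IEnumerable<int> samples)
    {
        var counts = new Dictionary<int, int>();
        int total = 0;
        foreach (int s in samples)
        {
            counts.TryGetValue(s, out int c); // c is 0 when s has not been seen yet
            counts[s] = c + 1;
            total++;
        }
        if (total == 0)
            return 0.0;

        double entropy = 0.0;
        foreach (int count in counts.Values)
        {
            double p = (double)count / total;
            entropy -= p * Math.Log(p, 2);
        }
        return entropy;
    }
}
```

If the result is close to the width of the fixed-size representation, there is little to gain; if it is well below, a variable-length code may pay off.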

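For example, if the number you need to encode is twice as likely to fit in 15 bits as not, you can use the 16th bit to say so and store/send only 16 bits (if that bit is zero, the upcoming byte completes a 16-bit number that fits in a 32-bit number). If it is 1, the upcoming 25 bits form a 32-bit number. You lose one bit here, but because the long case is unlikely, you end up saving bits over a large set of numbers.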
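A sketch of one possible reading of this flag-bit idea. The layout is an assumption made for illustration and is not the answer's exact bit counts: a 2-byte short form holding 15 data bits with the top bit clear, and a 4-byte long form holding 31 data bits with the top bit set.

```csharp
using System;
using System.Collections.Generic;

public static class TwoTier
{
    // Short form: 2 bytes, top bit 0, 15 data bits.
    // Long form:  4 bytes, top bit 1, 31 data bits.
    public static void Write(List<byte> output, uint value)
    {
        if (value < 0x8000)
        {
            output.Add((byte)(value >> 8));   // top bit is 0 because value < 2^15
            output.Add((byte)(value & 0xFF));
        }
        else
        {
            if (value > 0x7FFFFFFF)
                throw new ArgumentOutOfRangeException(nameof(value), "Needs at most 31 bits.");
            output.Add((byte)(0x80 | (value >> 24)));  // flag bit plus top 7 data bits
            output.Add((byte)((value >> 16) & 0xFF));
            output.Add((byte)((value >> 8) & 0xFF));
            output.Add((byte)(value & 0xFF));
        }
    }

    // Reads one value starting at offset and advances offset past it.
    public static uint Read(IReadOnlyList<byte> input, ref int offset)
    {
        uint first = input[offset++];
        if ((first & 0x80) == 0)
            return (first << 8) | input[offset++];

        uint value = first & 0x7F;
        for (int i = 0; i < 3; i++)
            value = (value << 8) | input[offset++];
        return value;
    }
}
```

Compared with the 7-bit scheme, this spends a single flag bit per value but only has two possible lengths, so it suits distributions with two clear size classes.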

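Obviously, this is a trivial case. Extending it to more than two cases gives the Huffman algorithm, which assigns near-optimal "code words" based on the probability of each number appearing.

There is also the arithmetic coding algorithm, which does this too (and probably others).

In all cases, no solution can store random values more efficiently than what is currently done in computer memory.

You have to weigh how long and how hard such a solution would be to implement against the savings you get in the end to know whether it is worth it. The language itself is not relevant here.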

Answer 3

If small values are more common than large ones, you can use Golomb coding.
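As an illustration, here is a sketch of Rice coding, the special case of Golomb coding where the divisor is a power of two (2^k). The quotient is written in unary and the remainder in k plain bits. Bits are returned as a string of '0' and '1' characters purely for readability; the class name and the quotient limit are assumptions for this sketch.

```csharp
using System;
using System.Text;

public static class RiceCode
{
    // Encodes value as: (value >> k) one-bits, a zero bit, then the low k bits.
    public static string Encode(uint value, int k)
    {
        if (k < 0 || k > 31)
            throw new ArgumentOutOfRangeException(nameof(k));
        uint quotient = value >> k;
        if (quotient > 4096)  // arbitrary guard for this sketch: k is far too small for value
            throw new ArgumentException("k is too small for this value.");

        var bits = new StringBuilder();
        bits.Append('1', (int)quotient);  // unary part
        bits.Append('0');                 // unary terminator
        for (int i = k - 1; i >= 0; i--)
            bits.Append(((value >> i) & 1) == 1 ? '1' : '0');
        return bits.ToString();
    }

    // Decodes a single value produced by Encode with the same k.
    public static uint Decode(string bits, int k)
    {
        int pos = 0;
        uint quotient = 0;
        while (bits[pos++] == '1')
            quotient++;

        uint remainder = 0;
        for (int i = 0; i < k; i++)
            remainder = (remainder << 1) | (bits[pos++] == '1' ? 1u : 0u);
        return (quotient << k) | remainder;
    }
}
```

For example, with k = 2 the value 9 has quotient 2 and remainder 1, so it encodes as 11001. The choice of k decides which range of values gets the shortest codes.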

Answer 4

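I know this question was asked quite a few years ago, but for MIDI developers I thought I would share some code from a personal MIDI project I am working on. The code block is based on a passage from the book Maximum MIDI by Paul Messick (this example is a version tweaked for my own needs, but the concept is all there...).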

    public struct VariableLength
    {
        // Variable Length byte array to int
        public VariableLength(byte[] bytes)
        {
            int index = 0;
            int value = 0;
            byte b;
            do
            {
                value = (value << 7) | ((b = bytes[index]) & 0x7F);
                index++;
            } while ((b & 0x80) != 0);

            Length = index;
            Value = value;
            Bytes = new byte[Length];
            Array.Copy(bytes, 0, Bytes, 0, Length);
        }

        // Variable Length int to byte array
        public VariableLength(int value)
        {
            Value = value;
            byte[] bytes = new byte[4];
            int index = 0;
            int buffer = value & 0x7F;

            while ((value >>= 7) > 0)
            {
                buffer <<= 8;
                buffer |= 0x80;
                buffer += (value & 0x7F);
            }
            while (true)
            {
                bytes[index] = (byte)buffer;
                index++;
                if ((buffer & 0x80) > 0)
                    buffer >>= 8;
                else
                    break;
            }

            Length = index;
            Bytes = new byte[index];
            Array.Copy(bytes, 0, Bytes, 0, Length);
        }

        // Number of bytes used to store the variable length value
        public int Length { get; private set; }
        // Variable Length Value
        public int Value { get; private set; }
        // Bytes representing the integer value
        public byte[] Bytes { get; private set; }
    }

How to use:

public void Example()
{
    // Convert an integer into variable-length bytes
    int varLenVal = 480;
    VariableLength encoded = new VariableLength(varLenVal);
    byte[] bytes = encoded.Bytes;      // { 131, 96 }

    // Convert a variable-length byte array into an integer
    byte[] varLenByte = new byte[2] { 131, 96 };
    VariableLength decoded = new VariableLength(varLenByte);
    int result = decoded.Value;        // 480
}
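Note that the MIDI struct stores the most significant 7-bit group first, which is the opposite of the byte order used in Answer 1, so the two formats are not interchangeable. A small sketch that produces both orders side by side (the class and method names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

public static class VlqOrder
{
    // Least significant 7-bit group first (the order used in Answer 1).
    public static byte[] LowGroupFirst(uint value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // Most significant 7-bit group first (the order the MIDI struct produces).
    public static byte[] HighGroupFirst(uint value)
    {
        byte[] bytes = LowGroupFirst(value);
        Array.Reverse(bytes);
        // After reversing, the continuation flag belongs on every byte except the last.
        for (int i = 0; i < bytes.Length; i++)
        {
            bool isLast = i == bytes.Length - 1;
            bytes[i] = (byte)(isLast ? bytes[i] & 0x7F : bytes[i] | 0x80);
        }
        return bytes;
    }
}
```

For 480, the low-group-first order gives 0xE0 0x03, while the high-group-first order gives 0x83 0x60, which matches the { 131, 96 } array in the usage example.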

Answer 5

The BinaryReader.Read7BitEncodedInt method?

The BinaryWriter.Write7BitEncodedInt method?

Answer 6

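As Grimbly pointed out, BinaryReader.Read7BitEncodedInt and BinaryWriter.Write7BitEncodedInt exist. However, these are internal methods that cannot be called on a BinaryReader or BinaryWriter object.

What you can do, however, is take the internal implementation and copy it from the reader and the writer: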

public static int Read7BitEncodedInt(this BinaryReader br) {
    // Read out an Int32 7 bits at a time.  The high bit 
    // of the byte when on means to continue reading more bytes.
    int count = 0;
    int shift = 0;
    byte b;
    do {
        // Check for a corrupted stream.  Read a max of 5 bytes.
        // In a future version, add a DataFormatException.
        if (shift == 5 * 7)  // 5 bytes max per Int32, shift += 7
            throw new FormatException("Format_Bad7BitInt32");

        // ReadByte handles end of stream cases for us. 
        b = br.ReadByte();
        count |= (b & 0x7F) << shift;
        shift += 7;
    } while ((b & 0x80) != 0); 
    return count;
}   

public static void Write7BitEncodedInt(this BinaryWriter br, int value) {
    // Write out an int 7 bits at a time.  The high bit of the byte,
    // when on, tells reader to continue reading more bytes.
    uint v = (uint)value;   // support negative numbers
    while (v >= 0x80) {
        br.Write((byte)(v | 0x80));
        v >>= 7;
    }
    br.Write((byte)v);
}

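Once you include this code in any class in your project, you can use these methods on any BinaryReader / BinaryWriter object. They have only been modified slightly to make them work outside their original classes (for example, by changing ReadByte() to br.ReadByte()). The comments are from the original source.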
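An alternative to copying the implementation, sketched here under the assumption that the two methods are at least accessible to derived classes (they are declared protected in .NET Framework), is to subclass the reader and writer and expose thin public wrappers. The class and method names below are hypothetical:

```csharp
using System;
using System.IO;

// Hypothetical wrapper classes; the names are illustrative only.
public class VarIntWriter : BinaryWriter
{
    public VarIntWriter(Stream output) : base(output) { }

    // Forwards to the framework's own 7-bit encoder.
    public void WriteVarInt(int value) { Write7BitEncodedInt(value); }
}

public class VarIntReader : BinaryReader
{
    public VarIntReader(Stream input) : base(input) { }

    // Forwards to the framework's own 7-bit decoder.
    public int ReadVarInt() { return Read7BitEncodedInt(); }
}

public static class VarIntDemo
{
    // Writes a value to an in-memory stream and reads it back.
    public static int RoundTrip(int value)
    {
        using (var stream = new MemoryStream())
        {
            var writer = new VarIntWriter(stream);
            writer.WriteVarInt(value);
            writer.Flush();

            stream.Position = 0;
            var reader = new VarIntReader(stream);
            return reader.ReadVarInt();
        }
    }
}
```

This keeps the framework's implementation in use, at the cost of having to construct the derived types instead of plain BinaryReader / BinaryWriter instances.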

6 answers

Answer 1    16    2010-08-25 09:58:27
Answer 2     1    2015-09-11 08:16:46
Answer 3     1    2010-08-25 06:16:19
Answer 4     1    2019-07-20 23:51:30
Answer 5     0    2015-04-30 01:08:02
Answer 6     0    2019-12-22 17:04:15
