
How can I add a header to a gzip-compressed string in Python?

I'm trying to compress a string in Python the same way a specific piece of C# code does, but I'm getting a different result. It seems I have to add a header to the compressed result, but I don't know how to add a header to a compressed string in Python. This is the C# line that I don't know how to write in Python:

memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

This is the whole runnable C# code:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace Rextester
{
    /// <summary>Handles compressing and decompressing API requests and responses.</summary>
    public class Compression
    {
        #region Member Variables
        /// <summary>The compressed message header length.</summary>
        private const int CompressedMessageHeaderLength = 4;
        #endregion

        #region Methods
        /// <summary>Compresses the XML string.</summary>
        /// <param name="data">The XML string to compress.</param>
        public static string CompressData(string data)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                byte[] plainBytes = Encoding.UTF8.GetBytes(data);

                using (GZipStream zipStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
                {
                    zipStream.Write(plainBytes, 0, plainBytes.Length);
                }

                memoryStream.Position = 0;

                byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];

                Buffer.BlockCopy(
                    BitConverter.GetBytes(plainBytes.Length),
                    0,
                    compressedBytes,
                    0,
                    CompressedMessageHeaderLength
                );

                // Copy the gzip bytes in after the 4-byte header, which holds the uncompressed length.
                memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

                string compressedXml = Convert.ToBase64String(compressedBytes);

                return compressedXml;
            }
        }
        
 
        #endregion
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            //Your code goes here
            string data = "Hello World!";
            Console.WriteLine(  Compression.CompressData(data) );
            // result would be DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA

        }
    }
}

And this is the Python code I wrote:

data = 'Hello World!'

import gzip
import base64
print(base64.b64encode(gzip.compress(data.encode('utf-8'))))

# I expect DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA 
# but I get H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=

You can use int.to_bytes to turn the length of the encoded string into a 4-byte header:

import sys

enc = data.encode('utf-8')
zipped = gzip.compress(enc)
# sys.byteorder mirrors BitConverter's native byte order; pass 'little' or 'big' instead to fix it explicitly
print(base64.b64encode(len(enc).to_bytes(4, sys.byteorder) + zipped))

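Note that gzip.compress(enc) produces slightly different bytes than the C# counterpart, so the overall Base64 string will also differ. In the outputs shown in the question, the differences are confined to the gzip header (the timestamp, extra-flags and OS fields); the compressed payload is the same. This should not be an issue, since a gzip decompressor should handle either version correctly.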
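To check that decompression really does not care about those header differences, here is a minimal sketch that decodes the C# string from the question in Python. The function name decode_message is made up for illustration, and reading the prefix as little-endian is an assumption (it matches the sample output).

```python
import base64
import gzip


def decode_message(b64_text: str) -> str:
    """Decode Base64 text laid out as: 4-byte length prefix + gzip stream."""
    raw = base64.b64decode(b64_text)
    # Assumption: the prefix is a little-endian 32-bit integer.
    expected_len = int.from_bytes(raw[:4], "little")
    plain = gzip.decompress(raw[4:])
    if len(plain) != expected_len:
        raise ValueError(f"length prefix says {expected_len}, got {len(plain)}")
    return plain.decode("utf-8")


# The string produced by the C# program in the question.
csharp_output = "DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA"
print(decode_message(csharp_output))  # Hello World!
```

The length check is optional, but it catches a wrong byte order or a truncated message early.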

One thing I'll start with is that the C# code is not well suited to cross-platform use. The byte order of the length header depends on the underlying architecture, because BitConverter.GetBytes returns the bytes in the machine's native order.

But C# most likely means Windows, which most likely means Intel, so little-endian is very likely.

So what you need to do is prepend the length of the original (uncompressed) data to the compressed data, in little-endian order, as exactly 4 bytes.
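As a small illustration of why the byte order matters, the sketch below builds the 4-byte header for the 12-byte "Hello World!" payload both ways and shows how each would appear at the start of the Base64 text. The variable names are only for illustration.

```python
import base64
import struct

length = len("Hello World!".encode("utf-8"))  # 12 bytes

little = length.to_bytes(4, "little")  # b'\x0c\x00\x00\x00'
big = length.to_bytes(4, "big")        # b'\x00\x00\x00\x0c'

# struct.pack("<i", ...) is an equivalent spelling: little-endian signed 32-bit.
assert struct.pack("<i", length) == little

# Base64 encodes 3 bytes as 4 characters, so the first 3 header bytes
# determine the first 4 characters of the final string.
prefix_little = base64.b64encode(little[:3]).decode("ascii")
prefix_big = base64.b64encode(big[:3]).decode("ascii")
print(prefix_little)  # DAAA
print(prefix_big)     # AAAA
```

The expected string in the question starts with DAAA, which is consistent with a little-endian header.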

bdata = data.encode('utf-8')               # UTF-8 bytes, as in the C# code
compressed = gzip.compress(bdata)
header = len(bdata).to_bytes(4, 'little')  # length of the uncompressed bytes

Then, you need to concatenate and convert to base64:

print(base64.b64encode(header + compressed))
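Putting the pieces together, a self-contained version might look like the sketch below. The function name compress_data simply mirrors the C# method and is not part of any library; the little-endian layout is the same assumption as above.

```python
import base64
import gzip
import struct


def compress_data(text: str) -> str:
    """Return Base64 of: little-endian int32 length + gzip(UTF-8 bytes)."""
    plain = text.encode("utf-8")
    # "<i" = little-endian signed 32-bit, the layout BitConverter.GetBytes(int)
    # produces on a little-endian machine.
    header = struct.pack("<i", len(plain))
    return base64.b64encode(header + gzip.compress(plain)).decode("ascii")


encoded = compress_data("Hello World!")
print(encoded[:8])  # DAAAAB+L
```

Only the first characters are compared here because they cover the length header and the gzip magic bytes; the characters after that include gzip header fields such as the timestamp, which can differ from the C# output.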

As someone else mentioned, one difference is the header that the C# version prepends.

Also note that gzip compression can be done in various ways. In C#, for example, you can specify a CompressionLevel of Optimal, Fastest, or NoCompression. See: https://docs.microsoft.com/en-us/dotnet/api/system.io.compression.compressionlevel?view=net-5.0

I'm not familiar enough with Python to say how it handles gzip compression by default (maybe Fastest in C# uses a more or less aggressive algorithm than Python's default).
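On the Python side, gzip.compress takes a compresslevel argument (default 9) and, from Python 3.8, an mtime argument. The sketch below tries three levels; pairing them with the C# names is only a loose analogy and an assumption, since the two libraries need not map levels the same way.

```python
import base64
import gzip

data = "Hello World!".encode("utf-8")

# Loose analogy to the C# levels (an assumption, not an exact mapping):
# 1 ~ Fastest, 0 ~ NoCompression, 9 ~ Optimal.
results = {}
for label, level in (("fastest", 1), ("none", 0), ("best", 9)):
    # mtime=0 (Python 3.8+) zeroes the timestamp so repeated runs match.
    blob = gzip.compress(data, compresslevel=level, mtime=0)
    results[label] = blob
    print(label, len(blob), base64.b64encode(blob).decode("ascii"))
```

With level 0 the input is stored uncompressed, so the original bytes appear verbatim inside the gzip stream.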

You should also ask whether it really matters that the values are different. So long as you can encode and decode, is that enough?

Here is your C# code with the header length set to 0, printing the result for each of the 3 CompressionLevel values. Note that the strings it outputs are 'pretty close' to what you are getting in Python.

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public class Program
{
    public static void Main()
    {
        string data = "Hello World!";
        Console.WriteLine(  Compression.CompressData(data, CompressionLevel.Fastest) );
        Console.WriteLine(  Compression.CompressData(data, CompressionLevel.NoCompression) );
        Console.WriteLine(  Compression.CompressData(data, CompressionLevel.Optimal) );
        // Prints one Base64 string per compression level. With the header removed,
        // each one starts with H4sI, like the Python result.
    }
}

public class Compression
    {
        #region Member Variables
        /// <summary>The compressed message header length.</summary>
        private const int CompressedMessageHeaderLength = 0; // changed to zero
        #endregion

        #region Methods
        /// <summary>Compresses the XML string.</summary>
        /// <param name="data">The XML string to compress.</param>
        public static string CompressData(string data, CompressionLevel compressionLevel)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                byte[] plainBytes = Encoding.UTF8.GetBytes(data);

                using (GZipStream zipStream = new GZipStream(memoryStream, compressionLevel, leaveOpen: true))
                {
                    zipStream.Write(plainBytes, 0, plainBytes.Length);
                }

                memoryStream.Position = 0;

                byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];

                Buffer.BlockCopy(
                    BitConverter.GetBytes(plainBytes.Length),
                    0,
                    compressedBytes,
                    0,
                    CompressedMessageHeaderLength
                );

                // Copy the gzip bytes in after the header (zero-length in this version).
                memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

                string compressedXml = Convert.ToBase64String(compressedBytes);

                return compressedXml;
            }
        }
        
 
        #endregion
    }

Output:

H4sIAAAAAAAEA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
H4sIAAAAAAAEAwEMAPP/SGVsbG8gV29ybGQhoxwpHAwAAAA=
H4sIAAAAAAAAA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=

You can also run it at: https://dotnetfiddle.net/TI8gwM
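To see exactly where these strings differ, the following sketch splits a few of the Base64 values from this thread into gzip header fields and the deflate payload. The labels assume the fiddle output is listed in the same order as the WriteLine calls, and the helper name split_gzip is hypothetical.

```python
import base64
import struct

SAMPLES = {
    # label: (Base64 text, number of leading non-gzip bytes to skip)
    "C# (question)": ("DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA", 4),
    "Python (question)": ("H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=", 0),
    "C# Fastest": ("H4sIAAAAAAAEA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=", 0),
    "C# Optimal": ("H4sIAAAAAAAAA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=", 0),
}


def split_gzip(b64_text: str, skip: int):
    """Return (header fields, deflate payload) of a single-member gzip stream.

    Assumes FLG == 0, i.e. no optional header fields, as in these samples.
    """
    raw = base64.b64decode(b64_text)[skip:]
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<HBBIBB", raw[:10])
    if magic != 0x8B1F or flags != 0:
        raise ValueError("not a plain gzip header")
    fields = {"mtime": mtime, "xfl": xfl, "os": os_id}
    return fields, raw[10:-8]  # the last 8 bytes are CRC32 + ISIZE


parsed = {label: split_gzip(text, skip) for label, (text, skip) in SAMPLES.items()}
for label, (fields, payload) in parsed.items():
    print(f"{label:18} {fields} payload={payload.hex()}")
```

For these four samples the deflate payload comes out identical; only the mtime, xfl and os header fields (and the 4-byte length prefix) differ, which is why any gzip reader can decode all of them.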
