
How can I add a header to a gzip-compressed string in Python?

I'm trying to compress a string in Python the same way a specific piece of C# code does, but I'm getting a different result. It seems I have to add a header to the compressed result, but I don't know how to add a header to a compressed string in Python. This is the C# line that I don't know how to write in Python:

memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

This is the whole runnable C# code:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace Rextester
{
    /// <summary>Handles compressing and decompressing API requests and responses.</summary>
    public class Compression
    {
        #region Member Variables
        /// <summary>The compressed message header length.</summary>
        private const int CompressedMessageHeaderLength = 4;
        #endregion

        #region Methods
        /// <summary>Compresses the XML string.</summary>
        /// <param name="documentToCompress">The XML string to compress.</param>
        public static string CompressData(string data)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                byte[] plainBytes = Encoding.UTF8.GetBytes(data);

                using (GZipStream zipStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
                {
                    zipStream.Write(plainBytes, 0, plainBytes.Length);
                }

                memoryStream.Position = 0;

                byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];

                Buffer.BlockCopy(
                    BitConverter.GetBytes(plainBytes.Length),
                    0,
                    compressedBytes,
                    0,
                    CompressedMessageHeaderLength
                );

                // Read the gzip stream into the buffer after the header (the header holds the uncompressed length).
                memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

                string compressedXml = Convert.ToBase64String(compressedBytes);

                return compressedXml;
            }
        }
        
 
        #endregion
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            //Your code goes here
            string data = "Hello World!";
            Console.WriteLine(  Compression.CompressData(data) );
            // result would be DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA

        }
    }
}

And this is the Python code I wrote:

data = 'Hello World!'

import gzip
import base64
print(base64.b64encode(gzip.compress(data.encode('utf-8'))))

# I expect DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA 
# but I get H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=

You can use to_bytes to convert the length of the encoded string:

import sys  # data, gzip and base64 as in the question

enc = data.encode('utf-8')
zipped = gzip.compress(enc)
# sys.byteorder can be replaced with a fixed value such as 'little'
print(base64.b64encode(len(enc).to_bytes(4, sys.byteorder) + zipped))

Also, it seems that gzip.compress(enc) produces a slightly different result than its C# counterpart (so the overall result will also differ), but this should not be an issue: decompression should handle everything correctly.
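To see where the "slightly different result" comes from, the two Base64 strings from the question can be taken apart. The sketch below is an illustration, not part of the original answer, and the helper name split_gzip is made up here. It relies on the fixed 10-byte gzip header layout from RFC 1952 (magic, method, flags, 4-byte mtime, XFL, OS) and assumes no optional header fields are present, which is the case for both strings.

```python
import base64

CSHARP_B64 = "DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA"
PYTHON_B64 = "H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA="


def split_gzip(blob):
    """Split a gzip stream into (mtime, xfl, os_id, rest) using the fixed 10-byte header."""
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    mtime = int.from_bytes(blob[4:8], "little")
    return mtime, blob[8], blob[9], blob[10:]


csharp_raw = base64.b64decode(CSHARP_B64)
python_raw = base64.b64decode(PYTHON_B64)

# The C# output carries a 4-byte length prefix in front of the gzip stream.
prefix, csharp_gzip = csharp_raw[:4], csharp_raw[4:]

cs_mtime, cs_xfl, cs_os, cs_rest = split_gzip(csharp_gzip)
py_mtime, py_xfl, py_os, py_rest = split_gzip(python_raw)

print("length prefix:", int.from_bytes(prefix, "little"))
print("C#     mtime/xfl/os:", cs_mtime, cs_xfl, cs_os)
print("Python mtime/xfl/os:", py_mtime, py_xfl, py_os)
print("same deflate data and trailer:", cs_rest == py_rest)
```

For these two strings, only the timestamp, XFL and OS bytes of the gzip header differ; the compressed data and the CRC/size trailer are identical. That may not hold for other inputs or library versions.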

One thing I'll start with is that the C# code is not well suited for cross-platform use. The byte order of the length header depends on the underlying architecture, because BitConverter.GetBytes returns bytes in the architecture's native order.

But for C# we probably mean Windows, which probably means Intel, so little-endian is very likely.
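The effect of byte order on a 4-byte length header can be shown with a small sketch (an illustration only; the value 12 is the UTF-8 length of "Hello World!" from the question). The struct format "<i" is a little-endian signed 32-bit integer, which is the layout BitConverter.GetBytes(int) gives on a little-endian machine.

```python
import struct
import sys

length = 12  # len("Hello World!".encode("utf-8"))

little = length.to_bytes(4, "little")
big = length.to_bytes(4, "big")

print(little)         # b'\x0c\x00\x00\x00'
print(big)            # b'\x00\x00\x00\x0c'
print(sys.byteorder)  # native order of the machine running this script

# struct's "<i" packs a little-endian signed 32-bit int.
assert struct.pack("<i", length) == little
```

The expected Base64 output in the question starts with DAAAA, which decodes to the bytes 0c 00 00 00, so it matches the little-endian form.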

So, what you need to do is prepend the length of the original data to the compressed data, in little-endian order, as exactly 4 bytes.

bdata = data.encode('utf-8')
compressed = gzip.compress(bdata)
header = len(bdata).to_bytes(4,'little')

Then, you need to concatenate and convert to base64:

print(base64.b64encode(header + compressed))
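Putting the pieces together, here is a minimal sketch of a Python counterpart to the C# CompressData, with a matching decoder for checking the round trip. The function names compress_data and decompress_data are chosen here for illustration and do not come from the original code. The decoder assumes the 4-byte little-endian header described above.

```python
import base64
import gzip

HEADER_LENGTH = 4  # mirrors CompressedMessageHeaderLength in the C# code


def compress_data(data):
    """Return Base64 of: 4-byte little-endian UTF-8 length + gzip stream."""
    plain = data.encode("utf-8")
    header = len(plain).to_bytes(HEADER_LENGTH, "little")
    return base64.b64encode(header + gzip.compress(plain)).decode("ascii")


def decompress_data(encoded):
    """Reverse compress_data and check the length stored in the header."""
    raw = base64.b64decode(encoded)
    expected_length = int.from_bytes(raw[:HEADER_LENGTH], "little")
    plain = gzip.decompress(raw[HEADER_LENGTH:])
    if len(plain) != expected_length:
        raise ValueError("length header does not match decompressed data")
    return plain.decode("utf-8")


# Round trip through the Python functions.
print(decompress_data(compress_data("Hello World!")))

# The string produced by the C# program in the question decodes as well.
print(decompress_data("DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA"))
```

The Base64 text from compress_data will usually not match the C# text character for character, because the gzip header bytes differ, but both decode to the same string.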

As mentioned by someone else, the header that the C# version adds is one difference.

Also note that gzip compression can be done in various ways. In C#, for example, you can specify a CompressionLevel of Optimal, Fastest, or NoCompression. See: https://docs.microsoft.com/en-us/dotnet/api/system.io.compression.compressionlevel?view=net-5.0

I'm not familiar enough with Python to say how it handles gzip compression by default (maybe Fastest in C# is a more or less aggressive algorithm than Python's default).
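For reference, Python's gzip.compress takes a compresslevel argument from 0 (no compression) to 9 (best compression), with 9 as the default. The sketch below is an illustration; pairing levels 1, 0 and 9 with the C# names Fastest, NoCompression and Optimal is a loose analogy assumed here, not an exact equivalence between the two libraries.

```python
import gzip

data = "Hello World!".encode("utf-8")

# Loose analogy to the C# CompressionLevel names (an assumption, not a
# mapping defined by either library).
levels = {"Fastest": 1, "NoCompression": 0, "Optimal": 9}

results = {}
for name, level in levels.items():
    results[name] = gzip.compress(data, compresslevel=level)
    print(name, len(results[name]), "bytes")

# Every variant decompresses to the same original bytes.
for blob in results.values():
    assert gzip.decompress(blob) == data
```

Whatever level the sender picks, the receiver does not need to know it: gzip.decompress handles all of them.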

Here is your C# code, with the header length set to 0, outputting with the 3 CompressionLevels. Note that it outputs string values that are 'pretty close' to what you are getting in Python.

You should also ask whether it really matters that the values are different. So long as you can encode and decode, is that enough?

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public class Program
{
    public static void Main()
    {
        string data = "Hello World!";
        Console.WriteLine(  Compression.CompressData(data, CompressionLevel.Fastest) );
        Console.WriteLine(  Compression.CompressData(data, CompressionLevel.NoCompression) );
        Console.WriteLine(  Compression.CompressData(data, CompressionLevel.Optimal) );
        // result would be DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA
        // but I get       H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
    }
}

public class Compression
    {
        #region Member Variables
        /// <summary>The compressed message header length.</summary>
        private const int CompressedMessageHeaderLength = 0; // changed to zero
        #endregion

        #region Methods
        /// <summary>Compresses the XML string.</summary>
        /// <param name="documentToCompress">The XML string to compress.</param>
        public static string CompressData(string data, CompressionLevel compressionLevel)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                byte[] plainBytes = Encoding.UTF8.GetBytes(data);

                using (GZipStream zipStream = new GZipStream(memoryStream, compressionLevel, leaveOpen: true))
                {
                    zipStream.Write(plainBytes, 0, plainBytes.Length);
                }

                memoryStream.Position = 0;

                byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];

                Buffer.BlockCopy(
                    BitConverter.GetBytes(plainBytes.Length),
                    0,
                    compressedBytes,
                    0,
                    CompressedMessageHeaderLength
                );

                // Add the header, which is the length of the compressed message.
                memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

                string compressedXml = Convert.ToBase64String(compressedBytes);

                return compressedXml;
            }
        }
        
 
        #endregion
    }

Output:

H4sIAAAAAAAEA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
H4sIAAAAAAAEAwEMAPP/SGVsbG8gV29ybGQhoxwpHAwAAAA=
H4sIAAAAAAAAA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=

And at: https://dotnetfiddle.net/TI8gwM
