
Compression/decompression of a string with C#

I am new to .NET. I am compressing and decompressing a string in C#. I load an XML file, convert it to a string, and then compress and decompress it. The code compiles without errors, but when I decompress and return the string, it contains only half of the XML.

Below is my code. Please point out where I am going wrong.

Code:

using System.Xml.Linq;

class Program
{
    public static string Zip(string value)
    {
        //Transform string into byte[]  
        byte[] byteArray = new byte[value.Length];
        int indexBA = 0;
        foreach (char item in value.ToCharArray())
        {
            byteArray[indexBA++] = (byte)item;
        }

        //Prepare for compress
        System.IO.MemoryStream ms = new System.IO.MemoryStream();
        System.IO.Compression.GZipStream sw = new System.IO.Compression.GZipStream(ms, System.IO.Compression.CompressionMode.Compress);

        //Compress
        sw.Write(byteArray, 0, byteArray.Length);
        //Close, DO NOT FLUSH cause bytes will go missing...
        sw.Close();

        //Transform byte[] zip data to string
        byteArray = ms.ToArray();
        System.Text.StringBuilder sB = new System.Text.StringBuilder(byteArray.Length);
        foreach (byte item in byteArray)
        {
            sB.Append((char)item);
        }
        ms.Close();
        sw.Dispose();
        ms.Dispose();
        return sB.ToString();
    }

    public static string UnZip(string value)
    {
        //Transform string into byte[]
        byte[] byteArray = new byte[value.Length];
        int indexBA = 0;
        foreach (char item in value.ToCharArray())
        {
            byteArray[indexBA++] = (byte)item;
        }

        //Prepare for decompress
        System.IO.MemoryStream ms = new System.IO.MemoryStream(byteArray);
        System.IO.Compression.GZipStream sr = new System.IO.Compression.GZipStream(ms,
            System.IO.Compression.CompressionMode.Decompress);

        //Reset variable to collect uncompressed result
        byteArray = new byte[byteArray.Length];

        //Decompress
        int rByte = sr.Read(byteArray, 0, byteArray.Length);

        //Transform byte[] unzip data to string
        System.Text.StringBuilder sB = new System.Text.StringBuilder(rByte);
        //Use the number of bytes GZipStream actually read instead of iterating
        //over every byte in byteArray
        for (int i = 0; i < rByte; i++)
        {
            sB.Append((char)byteArray[i]);
        }
        sr.Close();
        ms.Close();
        sr.Dispose();
        ms.Dispose();
        return sB.ToString();
    }

    static void Main(string[] args)
    {
        XDocument doc = XDocument.Load(@"D:\RSP.xml");
        string val = doc.ToString(SaveOptions.DisableFormatting);
        val = Zip(val);
        val = UnZip(val);
    }
} 

My XML file is 63 KB.
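A likely explanation for the truncation, offered as a reading of the code rather than a confirmed diagnosis: UnZip allocates its output buffer with the compressed length (new byte[byteArray.Length]), which for 63 KB of repetitive XML is much smaller than the decompressed text, and it calls Read only once, although Stream.Read may return fewer bytes than requested. The (byte)item cast in Zip also discards any character above U+00FF. A minimal sketch that avoids all three, assuming UTF-8 text; the class and method names are hypothetical:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipText
{
    // Encodes with UTF-8 (keeps characters above U+00FF, unlike a (byte)char cast).
    public static byte[] Compress(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(input, 0, input.Length);
            } // disposing writes the final block and the gzip footer
            return output.ToArray();
        }
    }

    // Keeps calling Read until it returns 0, so the decompressed size
    // never has to be guessed and short reads are handled.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            var chunk = new byte[4096];
            int read;
            while ((read = gzip.Read(chunk, 0, chunk.Length)) > 0)
            {
                output.Write(chunk, 0, read);
            }
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

Because the result is a byte[] rather than a string of cast characters, nothing is lost if the compressed data contains arbitrary byte values.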

The code to compress/decompress a string:

using System.IO;
using System.IO.Compression;
using System.Text;

static class Program
{
    // Hand-written copy loop for .NET 2.0/3.5, which lack Stream.CopyTo
    public static void CopyTo(Stream src, Stream dest) {
        byte[] bytes = new byte[4096];

        int cnt;

        while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0) {
            dest.Write(bytes, 0, cnt);
        }
    }

    public static byte[] Zip(string str) {
        var bytes = Encoding.UTF8.GetBytes(str);

        using (var msi = new MemoryStream(bytes))
        using (var mso = new MemoryStream()) {
            using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
                //msi.CopyTo(gs);
                CopyTo(msi, gs);
            }

            return mso.ToArray();
        }
    }

    public static string Unzip(byte[] bytes) {
        using (var msi = new MemoryStream(bytes))
        using (var mso = new MemoryStream()) {
            using (var gs = new GZipStream(msi, CompressionMode.Decompress)) {
                //gs.CopyTo(mso);
                CopyTo(gs, mso);
            }

            return Encoding.UTF8.GetString(mso.ToArray());
        }
    }

    static void Main(string[] args) {
        byte[] r1 = Zip("StringStringStringStringStringStringStringStringStringStringStringStringStringString");
        string r2 = Unzip(r1);
    }
}

Remember that Zip returns a byte[], while Unzip returns a string. If you want a string from Zip, you can Base64-encode it (for example with Convert.ToBase64String(r1)). The result of Zip is VERY binary: it isn't something you can print to the screen or write directly into XML.
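As a sketch of the Base64 suggestion, the helpers below (names are hypothetical) wrap the same GZip approach so that both ends are strings. Base64 uses 4 characters for every 3 bytes, so it adds about 33% to the size of the compressed bytes.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class Base64GZip
{
    // Compress the UTF-8 bytes, then Base64-encode so the result is printable
    // and safe to embed in XML or JSON.
    public static string ZipToBase64(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(output.ToArray());
        }
    }

    // Reverse the steps: Base64-decode, decompress, then decode the UTF-8 text.
    public static string UnzipFromBase64(string base64)
    {
        using (var input = new MemoryStream(Convert.FromBase64String(base64)))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```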

The version suggested is for .NET 2.0; on .NET 4.0 and later, use Stream.CopyTo instead of the hand-written CopyTo helper (see the commented-out calls).

IMPORTANT: The compressed contents cannot be completely written to the output stream until the GZipStream knows it has all of the input (to compress effectively, it needs all of the data). You need to make sure that you Dispose() the GZipStream before inspecting the output stream (e.g., mso.ToArray()). This is done with the using () { } block above. Note that the GZipStream is the innermost block and the contents are accessed outside of it. The same goes for decompressing: Dispose() the GZipStream before attempting to access the data.
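To make the ordering concrete, the sketch below (a hypothetical demo class) captures the output stream once inside the using block and once after it. leaveOpen: true keeps the MemoryStream readable after the GZipStream is disposed. Only the second array is a complete gzip stream; the first is missing at least the 8-byte gzip footer (CRC-32 and input size), which is written on Dispose.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DisposeOrderDemo
{
    // Returns what the output stream holds before and after the GZipStream is disposed.
    public static (byte[] BeforeDispose, byte[] AfterDispose) Capture(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            byte[] before;
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
                // Too early: the compressor may still buffer data and the footer is not written yet.
                before = output.ToArray();
            }
            // Correct: Dispose has flushed the last block and appended the footer.
            byte[] after = output.ToArray();
            return (before, after);
        }
    }
}
```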

Based on this snippet, I use the following code, and it works fine:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace CompressString
{
    internal static class StringCompressor
    {
        /// <summary>
        /// Compresses the string.
        /// </summary>
        /// <param name="text">The text.</param>
        /// <returns></returns>
        public static string CompressString(string text)
        {
            byte[] buffer = Encoding.UTF8.GetBytes(text);
            var memoryStream = new MemoryStream();
            using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
            {
                gZipStream.Write(buffer, 0, buffer.Length);
            }

            memoryStream.Position = 0;

            var compressedData = new byte[memoryStream.Length];
            memoryStream.Read(compressedData, 0, compressedData.Length);

            var gZipBuffer = new byte[compressedData.Length + 4];
            Buffer.BlockCopy(compressedData, 0, gZipBuffer, 4, compressedData.Length);
            Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gZipBuffer, 0, 4);
            return Convert.ToBase64String(gZipBuffer);
        }

        /// <summary>
        /// Decompresses the string.
        /// </summary>
        /// <param name="compressedText">The compressed text.</param>
        /// <returns></returns>
        public static string DecompressString(string compressedText)
        {
            byte[] gZipBuffer = Convert.FromBase64String(compressedText);
            using (var memoryStream = new MemoryStream())
            {
                int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
                memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);

                var buffer = new byte[dataLength];

                memoryStream.Position = 0;
                using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
                {
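                    // A single Read may return fewer bytes than requested (GZipStream does so
                    // on .NET 6 and later), so keep reading until the buffer is full or the stream ends.
                    int totalRead = 0;
                    int read;
                    while (totalRead < buffer.Length &&
                           (read = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead)) > 0)
                    {
                        totalRead += read;
                    }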
                }

                return Encoding.UTF8.GetString(buffer);
            }
        }
    }
}
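The string produced by CompressString is Base64 over a 4-byte uncompressed length followed by the gzip data. BitConverter uses the machine's byte order, so that prefix is only portable between machines of the same endianness (most .NET platforms are little-endian in practice). A sketch of the same layout with an explicit little-endian prefix, assuming .NET Core 2.1 or later for System.Buffers.Binary.BinaryPrimitives; the class and method names are hypothetical:

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class LengthPrefixedGZip
{
    // Layout: [4-byte little-endian UTF-8 length][gzip stream], then Base64.
    public static string Encode(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            var prefix = new byte[4];
            BinaryPrimitives.WriteInt32LittleEndian(prefix, raw.Length);
            output.Write(prefix, 0, prefix.Length);
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(output.ToArray());
        }
    }

    // Reads the stored length without decompressing anything.
    public static int PeekLength(string encoded)
    {
        byte[] bytes = Convert.FromBase64String(encoded);
        return BinaryPrimitives.ReadInt32LittleEndian(bytes);
    }
}
```

PeekLength is what lets a decoder size its buffer up front, as DecompressString does.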

With the advent of .NET 4.0 (and higher) and its Stream.CopyTo() method, I thought I would post an updated approach. (The DeflateStream constructor overload that takes a CompressionLevel, used below, requires .NET 4.5 or later.)

I also think the version below is useful as a clear example of a self-contained class for compressing regular strings into Base64-encoded strings, and vice versa:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StringCompression
{
    /// <summary>
    /// Compresses a string and returns a deflate compressed, Base64 encoded string.
    /// </summary>
    /// <param name="uncompressedString">String to compress</param>
    public static string Compress(string uncompressedString)
    {
        byte[] compressedBytes;

        using (var uncompressedStream = new MemoryStream(Encoding.UTF8.GetBytes(uncompressedString)))
        {
            using (var compressedStream = new MemoryStream())
            { 
                // setting the leaveOpen parameter to true to ensure that compressedStream will not be closed when compressorStream is disposed
                // this allows compressorStream to close and flush its buffers to compressedStream and guarantees that compressedStream.ToArray() can be called afterward
                // although MSDN documentation states that ToArray() can be called on a closed MemoryStream, I don't want to rely on that very odd behavior should it ever change
                using (var compressorStream = new DeflateStream(compressedStream, CompressionLevel.Fastest, true))
                {
                    uncompressedStream.CopyTo(compressorStream);
                }

                // call compressedStream.ToArray() after the enclosing DeflateStream has closed and flushed its buffer to compressedStream
                compressedBytes = compressedStream.ToArray();
            }
        }

        return Convert.ToBase64String(compressedBytes);
    }

    /// <summary>
    /// Decompresses a deflate compressed, Base64 encoded string and returns an uncompressed string.
    /// </summary>
    /// <param name="compressedString">String to decompress.</param>
    public static string Decompress(string compressedString)
    {
        byte[] decompressedBytes;

        var compressedStream = new MemoryStream(Convert.FromBase64String(compressedString));

        using (var decompressorStream = new DeflateStream(compressedStream, CompressionMode.Decompress))
        {
            using (var decompressedStream = new MemoryStream())
            {
                decompressorStream.CopyTo(decompressedStream);

                decompressedBytes = decompressedStream.ToArray();
            }
        }

        return Encoding.UTF8.GetString(decompressedBytes);
    }
}

Here's another approach that uses extension methods to extend the String class with string compression and decompression. You can drop the class below into an existing project and then use it like this:

var uncompressedString = "Hello World!";
var compressedString = uncompressedString.Compress();

and

var decompressedString = compressedString.Decompress();

To wit:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class Extensions
{
    /// <summary>
    /// Compresses a string and returns a deflate compressed, Base64 encoded string.
    /// </summary>
    /// <param name="uncompressedString">String to compress</param>
    public static string Compress(this string uncompressedString)
    {
        byte[] compressedBytes;

        using (var uncompressedStream = new MemoryStream(Encoding.UTF8.GetBytes(uncompressedString)))
        {
            using (var compressedStream = new MemoryStream())
            { 
                // setting the leaveOpen parameter to true to ensure that compressedStream will not be closed when compressorStream is disposed
                // this allows compressorStream to close and flush its buffers to compressedStream and guarantees that compressedStream.ToArray() can be called afterward
                // although MSDN documentation states that ToArray() can be called on a closed MemoryStream, I don't want to rely on that very odd behavior should it ever change
                using (var compressorStream = new DeflateStream(compressedStream, CompressionLevel.Fastest, true))
                {
                    uncompressedStream.CopyTo(compressorStream);
                }

                // call compressedStream.ToArray() after the enclosing DeflateStream has closed and flushed its buffer to compressedStream
                compressedBytes = compressedStream.ToArray();
            }
        }

        return Convert.ToBase64String(compressedBytes);
    }

    /// <summary>
    /// Decompresses a deflate compressed, Base64 encoded string and returns an uncompressed string.
    /// </summary>
    /// <param name="compressedString">String to decompress.</param>
    public static string Decompress(this string compressedString)
    {
        byte[] decompressedBytes;

        var compressedStream = new MemoryStream(Convert.FromBase64String(compressedString));

        using (var decompressorStream = new DeflateStream(compressedStream, CompressionMode.Decompress))
        {
            using (var decompressedStream = new MemoryStream())
            {
                decompressorStream.CopyTo(decompressedStream);

                decompressedBytes = decompressedStream.ToArray();
            }
        }

        return Encoding.UTF8.GetString(decompressedBytes);
    }
}

This is an updated version for .NET 4.5 and newer using async/await and IEnumerables:

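using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;
using System.Threading.Tasks;

public static class CompressionExtensions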
{
    public static async Task<IEnumerable<byte>> Zip(this object obj)
    {
        byte[] bytes = obj.Serialize();

        using (MemoryStream msi = new MemoryStream(bytes))
        using (MemoryStream mso = new MemoryStream())
        {
            using (var gs = new GZipStream(mso, CompressionMode.Compress))
                await msi.CopyToAsync(gs);

            return mso.ToArray().AsEnumerable();
        }
    }

    public static async Task<object> Unzip(this byte[] bytes)
    {
        using (MemoryStream msi = new MemoryStream(bytes))
        using (MemoryStream mso = new MemoryStream())
        {
            using (var gs = new GZipStream(msi, CompressionMode.Decompress))
            {
                // Sync example:
                //gs.CopyTo(mso);

                // Async way (take care of using async keyword on the method definition)
                await gs.CopyToAsync(mso);
            }

            // Await the result; returning the Task itself would hand back a Task boxed as object
            return await mso.ToArray().Deserialize();
        }
    }
}

public static class SerializerExtensions
{
    public static byte[] Serialize<T>(this T objectToWrite)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryFormatter binaryFormatter = new BinaryFormatter();
            binaryFormatter.Serialize(stream, objectToWrite);

            // ToArray returns only the written bytes; GetBuffer would include unused capacity
            return stream.ToArray();
        }
    }

    public static async Task<T> _Deserialize<T>(this byte[] arr)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryFormatter binaryFormatter = new BinaryFormatter();
            await stream.WriteAsync(arr, 0, arr.Length);
            stream.Position = 0;

            return (T)binaryFormatter.Deserialize(stream);
        }
    }

    public static async Task<object> Deserialize(this byte[] arr)
    {
        object obj = await arr._Deserialize<object>();
        return obj;
    }
}

With this you can compress anything BinaryFormatter can serialize, not only strings. Note that BinaryFormatter is obsolete from .NET 5 onward, and in .NET 9 its methods always throw PlatformNotSupportedException.

Edit:

In case you need to take care of encoding, you could just use Convert.ToBase64String(byte[])...

Take a look at this answer if you need an example!
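Because BinaryFormatter is unavailable on recent runtimes, one possible substitute is System.Text.Json (built in since .NET Core 3.0). This is a swapped-in technique, not this answer's original approach, and it only round-trips types that JSON can represent; the extension method names are hypothetical:

```csharp
using System.IO;
using System.IO.Compression;
using System.Text.Json;
using System.Threading.Tasks;

public static class JsonCompressionExtensions
{
    // Serializes the value as UTF-8 JSON straight into the GZipStream.
    public static async Task<byte[]> ZipJsonAsync<T>(this T value)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                await JsonSerializer.SerializeAsync(gzip, value);
            } // disposing writes the final block and gzip footer into output
            return output.ToArray();
        }
    }

    // Decompresses and deserializes in one pass.
    public static async Task<T> UnzipJsonAsync<T>(this byte[] bytes)
    {
        using (var input = new MemoryStream(bytes))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        {
            return await JsonSerializer.DeserializeAsync<T>(gzip);
        }
    }
}
```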

I like @fubo's answer best, but I think this is much more elegant.

This method is more compatible because it doesn't manually store the length up front.

I've also exposed extensions that support compression for string to string, byte[] to byte[], and Stream to Stream.

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ZipExtensions
{
    public static string CompressToBase64(this string data)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(data).Compress());
    }

    public static string DecompressFromBase64(this string data)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(data).Decompress());
    }
    
    public static byte[] Compress(this byte[] data)
    {
        using (var sourceStream = new MemoryStream(data))
        using (var destinationStream = new MemoryStream())
        {
            sourceStream.CompressTo(destinationStream);
            return destinationStream.ToArray();
        }
    }

    public static byte[] Decompress(this byte[] data)
    {
        using (var sourceStream = new MemoryStream(data))
        using (var destinationStream = new MemoryStream())
        {
            sourceStream.DecompressTo(destinationStream);
            return destinationStream.ToArray();
        }
    }
    
    public static void CompressTo(this Stream stream, Stream outputStream)
    {
        // leaveOpen: true so the caller's outputStream is still usable afterwards
        using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress, true))
        {
            stream.CopyTo(gZipStream);
            gZipStream.Flush();
        }
    }

    public static void DecompressTo(this Stream stream, Stream outputStream)
    {
        // leaveOpen: true so the caller's source stream is not closed as a side effect
        using (var gZipStream = new GZipStream(stream, CompressionMode.Decompress, true))
        {
            gZipStream.CopyTo(outputStream);
        }
    }
}
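The Stream-to-Stream form pays off when the data is not already in memory. A sketch (hypothetical class name) that streams a file through GZipStream chunk by chunk with Stream.CopyTo, so the whole file never has to be loaded; the .gz file it writes should be readable by standard gzip tools:

```csharp
using System.IO;
using System.IO.Compression;

public static class FileGZip
{
    public static void CompressFile(string sourcePath, string gzPath)
    {
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream target = File.Create(gzPath))
        using (var gzip = new GZipStream(target, CompressionLevel.Optimal))
        {
            // gzip is disposed first (innermost), writing the footer before target closes.
            source.CopyTo(gzip);
        }
    }

    public static void DecompressFile(string gzPath, string targetPath)
    {
        using (FileStream source = File.OpenRead(gzPath))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (FileStream target = File.Create(targetPath))
        {
            gzip.CopyTo(target);
        }
    }
}
```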

For those still getting the error "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.": if your string was compressed with PHP, you'll need to do something like this:

public static string decodeDecompress(string originalReceivedSrc) {
    byte[] bytes = Convert.FromBase64String(originalReceivedSrc);

    using (var mem = new MemoryStream()) {
        // The trick: prepend the first 8 bytes of a gzip header
        // (magic 1f 8b, CM = 8 (deflate), FLG = 0, MTIME = 0)
        mem.Write(new byte[] { 0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00 }, 0, 8);
        mem.Write(bytes, 0, bytes.Length);

        mem.Position = 0;

        using (var gzip = new GZipStream(mem, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip)) {
            return reader.ReadToEnd();
        }
    }
}
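A full gzip header is 10 bytes, so the last two header bytes (XFL and OS) are taken from the start of the data itself; that happens to line up with PHP gzcompress() output, which begins with a 2-byte zlib header. On .NET 6 and later, a more direct option is ZLibStream, which reads zlib-format data (RFC 1950) as-is. A sketch assuming the PHP side used base64_encode(gzcompress($text)); for gzdeflate() output, DeflateStream is the match, and for gzencode() output a plain GZipStream works. The class name is hypothetical:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PhpZlib
{
    // Base64 -> zlib (RFC 1950) -> UTF-8 text, i.e. the inverse of base64_encode(gzcompress($text)).
    public static string DecodeGzcompress(string base64)
    {
        using (var input = new MemoryStream(Convert.FromBase64String(base64)))
        using (var zlib = new ZLibStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(zlib, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Produces the same kind of payload from .NET, useful for testing the decoder
    // (PHP would read it back with gzuncompress(base64_decode($s))).
    public static string EncodeLikeGzcompress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var zlib = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                zlib.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(output.ToArray());
        }
    }
}
```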

For .NET 6, cross-platform compression/decompression of a string with C#, using the built-in System.IO.Compression.GZipStream. Tested on Ubuntu (18.0.x) and Windows.

#region helper

private byte[] Zip(string text)
{
    if (text == null)
        return null;

    byte[] ret;
    using (var outputMemory = new MemoryStream())
    {
        using (var gz = new GZipStream(outputMemory, CompressionLevel.Optimal))
        {
            using (var sw = new StreamWriter(gz, Encoding.UTF8))
            {
                sw.Write(text);
            }
        }
        ret = outputMemory.ToArray();
    }
    return ret;
}

private string Unzip(byte[] bytes)
{
    string ret = null;
    using (var inputMemory = new MemoryStream(bytes))
    {
        using (var gz = new GZipStream(inputMemory, CompressionMode.Decompress))
        {
            using (var sr = new StreamReader(gz, Encoding.UTF8))
            {
                ret = sr.ReadToEnd();
            }
        }
    }
    return ret;
}
#endregion

We can reduce code complexity by using StreamReader and StreamWriter rather than manually converting strings to byte arrays. Three streams are all you need:

    public static byte[] Zip(string uncompressed)
    {
        byte[] ret;
        using (var outputMemory = new MemoryStream())
        {
            using (var gz = new GZipStream(outputMemory, CompressionLevel.Optimal))
            {
                using (var sw = new StreamWriter(gz, Encoding.UTF8))
                {
                    sw.Write(uncompressed);
                }
            }
            ret = outputMemory.ToArray();
        }
        return ret;
    }

    public static string Unzip(byte[] compressed)
    {
        string ret = null;
        using (var inputMemory = new MemoryStream(compressed))
        {
            using (var gz = new GZipStream(inputMemory, CompressionMode.Decompress))
            {
                using (var sr = new StreamReader(gz, Encoding.UTF8))
                {
                    ret = sr.ReadToEnd();
                }
            }
        }
        return ret;
    }
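One detail worth knowing about this version: Encoding.UTF8 has a byte-order-mark preamble, and StreamWriter writes it to a non-seekable stream such as GZipStream, so the decompressed bytes start with EF BB BF. Unzip hides this because StreamReader strips the BOM, but other decompressors (or Encoding.UTF8.GetString on the raw bytes) will see a leading U+FEFF. A sketch that uses a BOM-free encoding instead; the class name and the DecompressRaw helper are hypothetical:

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class BomFreeGZip
{
    // UTF8Encoding(false) has an empty preamble, so StreamWriter writes no BOM.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

    public static byte[] Zip(string uncompressed)
    {
        using (var outputMemory = new MemoryStream())
        {
            using (var gz = new GZipStream(outputMemory, CompressionLevel.Optimal))
            using (var sw = new StreamWriter(gz, Utf8NoBom))
            {
                sw.Write(uncompressed);
            } // disposing sw flushes the text and closes gz, which writes the gzip footer
            // ToArray is documented to work even after the MemoryStream has been closed.
            return outputMemory.ToArray();
        }
    }

    // Decompresses to raw bytes without decoding, to show exactly what was stored.
    public static byte[] DecompressRaw(byte[] compressed)
    {
        using (var inputMemory = new MemoryStream(compressed))
        using (var gz = new GZipStream(inputMemory, CompressionMode.Decompress))
        using (var raw = new MemoryStream())
        {
            gz.CopyTo(raw);
            return raw.ToArray();
        }
    }
}
```

The existing Unzip reads this output unchanged, since StreamReader decodes BOM-less UTF-8 just as well.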
