
How to serialize object + compress it and then decompress + deserialize without third-party library?

I have a large object in memory that I want to save as a blob in a database. I want to compress it before saving because the database server is usually not local.

This is what I have at the moment:

using (var memoryStream = new MemoryStream())
{
  using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
  {
    BinaryFormatter binaryFormatter = new BinaryFormatter();
    binaryFormatter.Serialize(gZipStream, obj);

    return memoryStream.ToArray();
  }
}

However, when I zip the same bytes with Total Commander, it always cuts the size by at least 50%. With the above code, 58MB compresses to 48MB, and anything smaller than 15MB gets even bigger.

Should I use a third-party zip library, or is there a better way of doing this in .NET 3.5? Are there any other alternatives to my problem?

EDIT:

Just found a bug in the code above. Angelo, thanks for your fix.

GZipStream compression is still not great. I get an average of 35% compression with GZipStream compared to 48% with TC.

I have no idea what kind of bytes I was getting out of the previous version :)

EDIT2:

I have found how to improve compression from 20% to 47%. I had to use two memory streams instead of one! Can anyone explain why this is the case?

Here is the code with two memory streams, which compresses a lot better:

using (MemoryStream msCompressed = new MemoryStream())
using (GZipStream gZipStream = new GZipStream(msCompressed, CompressionMode.Compress))
using (MemoryStream msDecompressed = new MemoryStream())
{
  new BinaryFormatter().Serialize(msDecompressed, obj);
  byte[] byteArray = msDecompressed.ToArray();

  gZipStream.Write(byteArray, 0, byteArray.Length);
  gZipStream.Close();
  return msCompressed.ToArray();
}
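A possible explanation, offered as an assumption rather than something confirmed in this thread: BinaryFormatter issues many small writes, and the GZipStream shipped before .NET 4.5 appears to compress poorly when it is fed small chunks, so handing it one large buffer helps. If that is the cause, putting a BufferedStream between the formatter and GZipStream should have a similar effect without keeping the whole serialized object in a second MemoryStream. A minimal sketch (the BufferedGZip class, its Compress method and the buffer size are illustrative names and values, not from the original post):

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class BufferedGZip
{
    // Runs the supplied writer against a buffered GZip pipeline and
    // returns the compressed bytes. Small writes are collected into
    // chunks of bufferSize bytes before they reach GZipStream.
    public static byte[] Compress(Action<Stream> write, int bufferSize)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            using (var buffered = new BufferedStream(gzip, bufferSize))
            {
                write(buffered);
            } // disposing flushes the buffer and lets GZipStream write its footer

            // ToArray is called only after the GZipStream has been closed.
            return output.ToArray();
        }
    }
}
```

With the question's object it could be called as BufferedGZip.Compress(s => new BinaryFormatter().Serialize(s, obj), 64 * 1024).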

You have a bug in your code, and the explanation is too long for a comment, so I present it as an answer even though it does not answer your real question.

You need to call memoryStream.ToArray() only after closing GZipStream; otherwise you are creating compressed data that you will not be able to deserialize.

Fixed code follows:

using (var memoryStream = new System.IO.MemoryStream())
{
  using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
  {
    BinaryFormatter binaryFormatter = new BinaryFormatter();
    binaryFormatter.Serialize(gZipStream, obj);
  }
  return memoryStream.ToArray();
}

GZipStream writes to the underlying buffer in chunks and also appends a footer to the end of the stream, and this is only performed when you close the stream.

You can easily prove this by running the following code sample:

byte[] compressed;
int[] integers = new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

var mem1 = new MemoryStream();
using (var compressor = new GZipStream(mem1, CompressionMode.Compress))
{
    new BinaryFormatter().Serialize(compressor, integers);
    compressed = mem1.ToArray();
}

var mem2 = new MemoryStream(compressed);
using (var decompressor = new GZipStream(mem2, CompressionMode.Decompress))
{
    // The next line will throw SerializationException
    integers = (int[])new BinaryFormatter().Deserialize(decompressor);
}

GZipStream in .NET 3.5 doesn't allow you to set the compression level. This parameter was introduced in .NET 4.5, but I don't know whether it will give you a better result or whether an upgrade is suitable for you. The built-in algorithm is not very optimal, due to patents AFAIK. So in 3.5 the only way to get better compression is to use a third-party library such as the SDK provided by 7zip or SharpZipLib. You should probably experiment a little with different libraries to get better compression of your data.
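For reference, on .NET 4.5 or later the level is passed to the GZipStream constructor. A minimal sketch, assuming .NET 4.5+ (the LeveledGZip class and its method names are made up for illustration); whether a given level actually helps depends on the data:

```csharp
using System.IO;
using System.IO.Compression;

public static class LeveledGZip
{
    // Compresses a byte array with the requested CompressionLevel
    // (Optimal, Fastest or NoCompression in .NET 4.5).
    public static byte[] Compress(byte[] data, CompressionLevel level)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, level))
            {
                gzip.Write(data, 0, data.Length);
            } // closing GZipStream writes the footer

            return output.ToArray();
        }
    }

    // Reverses Compress so the two levels can be compared on real data.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Comparing the length of LeveledGZip.Compress(bytes, CompressionLevel.Optimal) with CompressionLevel.Fastest on your own payload is a quick way to see whether the setting matters for you.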

The default CompressionLevel used is Optimal, at least according to http://msdn.microsoft.com/en-us/library/as1ff51s, so there is no way to tell GZipStream to "try harder". It seems to me that a third-party library would be better.

I personally never considered GZipStream to be 'good' in terms of compression - they probably put the effort into minimizing the memory footprint or maximizing speed. However, seeing how Windows XP/Vista/7 handles ZIP files natively in Explorer, I can say neither that it is fast nor that it compresses well. I would not be surprised if Explorer in Win7 actually uses GZipStream - after all, they implemented it and put it into the framework, so they probably use it in many places (e.g., it seems to be used in HTTP GZIP handling), so I would stay away from it if I needed efficient processing. I have never done any serious research on this topic, as my company bought a nice zip handler many years ago when .NET was in its early days.

edit:

More refs:
http://dotnetzip.codeplex.com/workitem/7159 - but marked as "closed/resolved" in 2009. Maybe you will find something interesting in that code?

Heh, after a few minutes of googling, it seems that 7Zip exposes some C# bindings: http://www.splinter.com.au/compressing-using-the-7zip-lzma-algorithm-in/

edit#2:

Just an FYI about .NET 4.5: https://stackoverflow.com/a/9808000/717732

The original question was related to .NET 3.5. Three years later, .NET 4.5 is much more likely to be used; my answer is only valid for 4.5. As others mentioned earlier, the compression algorithm was improved considerably in .NET 4.5.

Today, I wanted to compress my data set to save some space, so a problem similar to the original question but on .NET 4.5. And because I remember using the same trick with a double MemoryStream many years ago, I gave it a try. My data set is a container object with many hash sets and lists of custom objects with string/int/DateTime properties. The data set contains about 45,000 objects and, when serialized without compression, it produces a 3500 kB binary file.

Now, with GZipStream, with a single or double MemoryStream as described in the question, or with DeflateStream (which uses zlib in 4.5), I always get a file of 818 kB. So I just want to stress here that the double-MemoryStream trick became useless with .NET 4.5.

In the end, my generic code is as follows:

    public static byte[] SerializeAndCompress<T, TStream>(T objectToWrite, Func<TStream> createStream, Func<TStream, byte[]> returnMethod, Action catchAction)
        where T : class
        where TStream : Stream
    {
        if (objectToWrite == null || createStream == null)
        {
            return null;
        }
        byte[] result = null;
        try
        {
            using (var outputStream = createStream())
            {
                using (var compressionStream = new GZipStream(outputStream, CompressionMode.Compress))
                {
                    var formatter = new BinaryFormatter();
                    formatter.Serialize(compressionStream, objectToWrite);
                }
                if (returnMethod != null)
                    result = returnMethod(outputStream);
            }
        }
        catch (Exception ex)
        {
            Trace.TraceError(Exceptions.ExceptionFormat.Serialize(ex));
            catchAction?.Invoke();
        }
        return result;
    }

so that I can use different TStream types, e.g.

    public static void SerializeAndCompress<T>(T objectToWrite, string filePath) where T : class
    {
        //var buffer = SerializeAndCompress(collection);
        //File.WriteAllBytes(filePath, buffer);
        SerializeAndCompress(objectToWrite, () => new FileStream(filePath, FileMode.Create), null, () =>
        {
            if (File.Exists(filePath))
                File.Delete(filePath);
        });
    }

    public static byte[] SerializeAndCompress<T>(T collection) where T : class
    {
        return SerializeAndCompress(collection, () => new MemoryStream(), st => st.ToArray(), null);
    }
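The helpers above only cover the write direction. A matching read helper might look like the following sketch, which assumes the bytes were produced by the MemoryStream overload above and that BinaryFormatter is available (it is on .NET Framework; later .NET versions mark it obsolete). The CompressedReader class name is hypothetical:

```csharp
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

public static class CompressedReader
{
    // Decompresses a GZip blob and deserializes the object graph inside it.
    // Returns null for null input, mirroring the null handling of the writer.
    public static T DecompressAndDeserialize<T>(byte[] compressed) where T : class
    {
        if (compressed == null)
        {
            return null;
        }
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        {
            // BinaryFormatter reads directly from the decompressing stream,
            // so no intermediate buffer is needed.
            return (T)new BinaryFormatter().Deserialize(gzip);
        }
    }
}
```

Only data from a trusted source should be passed to it, since BinaryFormatter instantiates whatever types the payload names.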

You can use a custom formatter:

public class GZipFormatter : IFormatter
{
    IFormatter formatter;
    public GZipFormatter()
    {
        this.formatter = new BinaryFormatter();
    }
    public GZipFormatter(IFormatter formatter)
    {
        this.formatter = formatter; 
    }
    ISurrogateSelector IFormatter.SurrogateSelector { get => formatter.SurrogateSelector; set => formatter.SurrogateSelector = value; }
    SerializationBinder IFormatter.Binder { get => formatter.Binder; set => formatter.Binder = value; }
    StreamingContext IFormatter.Context { get => formatter.Context; set => formatter.Context = value; }

    object IFormatter.Deserialize(Stream serializationStream)
    {
        using (GZipStream gZipStream = new GZipStream(serializationStream, CompressionMode.Decompress))
        {
            return formatter.Deserialize(gZipStream);                
        }
    }
    void IFormatter.Serialize(Stream serializationStream, object graph)
    {
        using (GZipStream gZipStream = new GZipStream(serializationStream, CompressionMode.Compress))
        using (MemoryStream msDecompressed = new MemoryStream())
        {
            formatter.Serialize(msDecompressed, graph);
            byte[] byteArray = msDecompressed.ToArray();

            gZipStream.Write(byteArray, 0, byteArray.Length);
            gZipStream.Close();                
        }
    }
}

Then you can use it like this:

IFormatter formatter = new GZipFormatter();
using (Stream stream = new FileStream(path...)){
   formatter.Serialize(stream, obj); 
}        

