
How to split a big file (12 GB) into multiple 1 GB compressed (.gz) archives? C#

I have one big .bak file, nearly 12 GB. I need to split it into multiple 2 GB .gz archives in code.

The big problem is that I need to validate these archives later.

You know, like when you split one file with WinRAR into 3 or 4 archives, then press "unpack" and it unpacks them all back into one file, or fails if some archives are missing (you deleted one).

I need something like this:

// Requires: using System; using System.IO; using System.IO.Compression;
public void Compress(DirectoryInfo directorySelected)
{
    const long pieceSize = 2L * 1024 * 1024 * 1024; // 2 GB of source data per piece
    byte[] buffer = new byte[81920];                // read the source in small chunks

    foreach (FileInfo fileToCompress in directorySelected.GetFiles())
    {
        // Skip hidden files and files that are already .gz archives.
        if ((File.GetAttributes(fileToCompress.FullName) & FileAttributes.Hidden) == FileAttributes.Hidden
            || fileToCompress.Extension == ".gz")
        {
            continue;
        }

        using (FileStream originalFileStream = fileToCompress.OpenRead())
        {
            int counter = 0;
            while (originalFileStream.Position < originalFileStream.Length)
            {
                // Each piece compresses up to pieceSize bytes of the original file.
                using (FileStream compressedFileStream = File.Create(fileToCompress.FullName + counter + ".gz"))
                using (GZipStream compressionStream = new GZipStream(compressedFileStream, CompressionMode.Compress))
                {
                    long writtenToPiece = 0;
                    while (writtenToPiece < pieceSize)
                    {
                        int toRead = (int)Math.Min(buffer.Length, pieceSize - writtenToPiece);
                        int numBytesRead = originalFileStream.Read(buffer, 0, toRead);
                        if (numBytesRead == 0)
                            break; // reached the end of the source file
                        compressionStream.Write(buffer, 0, numBytesRead);
                        writtenToPiece += numBytesRead;
                    }
                }
                counter++;
            }
        }
    }
}

It works well, but I don't know how to validate their count.

I have 7 archives for the test object. But how do I read them back into one file, and validate that the file is complete?

The GZip format doesn't natively support what you want.

Zip does (the feature is called "spanned archives"), but the ZipArchive class from .NET doesn't support it. You'll need a third-party library for that, such as DotNetZip.

But there's a workaround.

Create a class that inherits from the abstract Stream class. To the outside it pretends to be a single stream that can write but not read or seek; in the implementation it writes to multiple pieces, 2 GB each, using the FileStream provided by .NET. Keep track of the total length written in a long field of your class. As soon as the next Write() call would exceed 2 GB, write just enough bytes to reach 2 GB, close and dispose the underlying FileStream, open another file with the next file name, reset the length counter to 0, and write the remaining bytes from the buffer passed to Write(). Repeat until the stream is closed. A sketch of this idea is below.
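Here's a minimal sketch of that write-side stream. The name SplitWriteStream and the ".000"/".001" piece-naming scheme are assumptions for illustration; only the members GZipStream actually calls are implemented, everything else throws.

using System;
using System.IO;

public sealed class SplitWriteStream : Stream
{
    private readonly string _baseName;  // hypothetical scheme: "big.bak.gz" -> "big.bak.gz.000", ".001", ...
    private readonly long _pieceSize;   // e.g. 2 GB per piece
    private FileStream _current;
    private int _pieceIndex;
    private long _writtenToPiece;       // bytes written to the current piece so far

    public SplitWriteStream(string baseName, long pieceSize)
    {
        _baseName = baseName;
        _pieceSize = pieceSize;
        OpenNextPiece();
    }

    private void OpenNextPiece()
    {
        _current?.Dispose();
        _current = File.Create($"{_baseName}.{_pieceIndex++:D3}");
        _writtenToPiece = 0;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            // Start a new piece as soon as the current one is full.
            if (_writtenToPiece == _pieceSize)
                OpenNextPiece();

            int chunk = (int)Math.Min(count, _pieceSize - _writtenToPiece);
            _current.Write(buffer, offset, chunk);
            _writtenToPiece += chunk;
            offset += chunk;
            count -= chunk;
        }
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() => _current.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _current?.Dispose();
        base.Dispose(disposing);
    }
}

Zero-padded piece numbers keep the files in lexicographic order, which makes it easy to enumerate them back in the correct sequence later.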

Create an instance of your custom stream, pass it to the constructor of GZipStream, and copy the complete 12 GB of source data into the GZipStream.
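With the SplitWriteStream sketch above and made-up file paths, the compression side could look like this:

using (FileStream source = File.OpenRead(@"C:\backup\big.bak"))
using (var split = new SplitWriteStream(@"C:\backup\big.bak.gz", 2L * 1024 * 1024 * 1024))
using (var gzip = new GZipStream(split, CompressionMode.Compress))
{
    source.CopyTo(gzip); // GZipStream never notices the piece boundaries
}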

If you do it right, the output files will be exactly 2 GB in size (except the last one).

To read and decompress them, you'll need to implement a similar trick with a custom stream. Write a stream class that concatenates multiple files on the fly, pretending to be a single stream; this time you only need to implement the Read() method. Give that concatenating stream to the framework's GZipStream. If you reorder or destroy some pieces, there's a very high (but not 100%) probability that GZipStream will fail to decompress, complaining about CRC checksums. A sketch of this reader follows.
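Here's a matching read-side sketch, again with a hypothetical name (ConcatReadStream) and with the caller responsible for supplying the piece paths in the correct order:

using System;
using System.Collections.Generic;
using System.IO;

public sealed class ConcatReadStream : Stream
{
    private readonly Queue<string> _pieces; // remaining piece paths, in order
    private FileStream _current;

    public ConcatReadStream(IEnumerable<string> orderedPieces)
    {
        _pieces = new Queue<string>(orderedPieces);
        MoveNext();
    }

    private bool MoveNext()
    {
        _current?.Dispose();
        _current = _pieces.Count > 0 ? File.OpenRead(_pieces.Dequeue()) : null;
        return _current != null;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        while (_current != null)
        {
            int read = _current.Read(buffer, offset, count);
            if (read > 0) return read;
            // Current piece is exhausted; move on to the next one.
            if (!MoveNext()) break;
        }
        return 0; // all pieces consumed
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _current?.Dispose();
        base.Dispose(disposing);
    }
}

Used like this, a missing or reordered piece surfaces as an InvalidDataException from GZipStream:

string[] pieces = { "big.bak.gz.000", "big.bak.gz.001", "big.bak.gz.002" };
using (var concat = new ConcatReadStream(pieces))
using (var gzip = new GZipStream(concat, CompressionMode.Decompress))
using (FileStream restored = File.Create("restored.bak"))
{
    gzip.CopyTo(restored);
}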

PS: to implement and debug the above two streams, I recommend using a much smaller dataset, e.g. 12 MB of data split into 1 MB compressed pieces. Once you make that work, increase the constant and test with the complete 12 GB of data.
