
How to decompress a large file (more than 100 MB) without using any external libraries

I tried using NuGet packages to extract a tgz file, but the archive contains files whose names include characters that are not supported in file names, e.g. 1111-11-1111:11:11.111.AA

I verified this issue with the SharpCompress library.

So I had to follow the gist linked below:

https://gist.github.com/ForeverZer0/a2cd292bd2f3b5e114956c00bb6e872b

This is the code I followed to extract the tgz file. It is a really nice piece of code and works well, but when I try to extract large tgz files (more than 100 MB), I get an error saying the stream is too long.

[Image: MemoryStream error]

[Image: error details]

The error means that you are trying to write too many bytes into a MemoryStream, which has a maximum capacity of int.MaxValue bytes (about 2 GB).

If you cannot find a suitable library and want to work with the provided code, it can be modified as follows.

Note that the entire GZipStream is first copied to a MemoryStream. Why? As the comment in the code states:

// A GZipStream is not seekable, so copy it first to a MemoryStream

However, the subsequent code uses only two operations that require the stream to be seekable: stream.Seek(x, SeekOrigin.Current) (where x is always positive) and stream.Position. Both of these operations can be emulated by reading the stream, without seeking. For example, to seek forward you can read that number of bytes and discard them:

private static void FakeSeekForward(Stream stream, int offset) {
    if (stream.CanSeek)
        stream.Seek(offset, SeekOrigin.Current);
    else {
        int bytesRead = 0;
        var buffer = new byte[offset];
        while (bytesRead < offset)
        {
            int read = stream.Read(buffer, bytesRead, offset - bytesRead);
            if (read == 0)
                throw new EndOfStreamException();
            bytesRead += read;
        }
    }
}
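
As a possible variation on the helper above, the skip can reuse a small fixed-size scratch buffer instead of allocating an array as large as the offset, and take a long so that large skips are possible. This is only a sketch: the names StreamSkip and SkipForward and the 4096-byte buffer size are assumptions for illustration and are not part of the gist.

```csharp
using System;
using System.IO;

public static class StreamSkip
{
    // Hypothetical helper: discards `count` bytes from the stream.
    // Seeks when the stream supports it, otherwise reads and throws the data away.
    public static void SkipForward(Stream stream, long count)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (stream.CanSeek)
        {
            stream.Seek(count, SeekOrigin.Current);
            return;
        }
        // Small reusable buffer; the size is an arbitrary choice.
        var scratch = new byte[4096];
        while (count > 0)
        {
            int read = stream.Read(scratch, 0, (int)Math.Min(scratch.Length, count));
            if (read == 0)
                throw new EndOfStreamException();
            count -= read;
        }
    }
}
```

With a GZipStream opened for decompression, StreamSkip.SkipForward(gzip, 24) would take the read-and-discard path, because GZipStream reports CanSeek as false.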

And to track the current stream position, you can just keep a count of the bytes read. Then we can remove the conversion to MemoryStream, and the code from the link becomes:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public class Tar
{
    /// <summary>
    /// Extracts a <i>.tar.gz</i> archive to the specified directory.
    /// </summary>
    /// <param name="filename">The <i>.tar.gz</i> to decompress and extract.</param>
    /// <param name="outputDir">Output directory to write the files.</param>
    public static void ExtractTarGz(string filename, string outputDir)
    {
        using (var stream = File.OpenRead(filename))
            ExtractTarGz(stream, outputDir);
    }

    /// <summary>
    /// Extracts a <i>.tar.gz</i> archive stream to the specified directory.
    /// </summary>
    /// <param name="stream">The <i>.tar.gz</i> to decompress and extract.</param>
    /// <param name="outputDir">Output directory to write the files.</param>
    public static void ExtractTarGz(Stream stream, string outputDir)
    {
        using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
        {
            // removed conversion to MemoryStream
            ExtractTar(gzip, outputDir);
        }
    }

    /// <summary>
    /// Extracts a <c>tar</c> archive to the specified directory.
    /// </summary>
    /// <param name="filename">The <i>.tar</i> to extract.</param>
    /// <param name="outputDir">Output directory to write the files.</param>
    public static void ExtractTar(string filename, string outputDir)
    {
        using (var stream = File.OpenRead(filename))
            ExtractTar(stream, outputDir);
    }

    /// <summary>
    /// Extracts a <c>tar</c> archive to the specified directory.
    /// </summary>
    /// <param name="stream">The <i>.tar</i> to extract.</param>
    /// <param name="outputDir">Output directory to write the files.</param>
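    public static void ExtractTar(Stream stream, string outputDir) {
        var buffer = new byte[100];
        // the stream may not be seekable, so track the current position here
        long pos = 0;
        while (true) {
            // entry name: first 100 bytes of the header
            // (Stream.Read may return fewer bytes than requested, so loop until done)
            int nameBytes = ReadFully(stream, buffer, 0, 100);
            pos += nameBytes;
            var name = Encoding.ASCII.GetString(buffer, 0, nameBytes).Trim('\0');
            if (String.IsNullOrWhiteSpace(name))
                break;
            FakeSeekForward(stream, 24);
            pos += 24;

            // entry size: 12 bytes of octal text
            if (ReadFully(stream, buffer, 0, 12) != 12)
                throw new EndOfStreamException();
            pos += 12;
            var size = Convert.ToInt64(Encoding.UTF8.GetString(buffer, 0, 12).Trim('\0').Trim(), 8);
            FakeSeekForward(stream, 376);
            pos += 376;

            var output = Path.Combine(outputDir, name);
            if (!Directory.Exists(Path.GetDirectoryName(output)))
                Directory.CreateDirectory(Path.GetDirectoryName(output));
            if (!name.Equals("./", StringComparison.InvariantCulture)) {
                using (var str = File.Open(output, FileMode.OpenOrCreate, FileAccess.Write)) {
                    // copy in chunks instead of allocating one array of the full entry size
                    var buf = new byte[81920];
                    long remaining = size;
                    while (remaining > 0) {
                        int read = stream.Read(buf, 0, (int) Math.Min(buf.Length, remaining));
                        if (read == 0)
                            throw new EndOfStreamException();
                        str.Write(buf, 0, read);
                        remaining -= read;
                    }
                    pos += size;
                }
            }

            // entry data is padded to the next 512-byte boundary
            var offset = (int) (512 - (pos % 512));
            if (offset == 512)
                offset = 0;
            FakeSeekForward(stream, offset);
            pos += offset;
        }
    }

    // Reads until count bytes have been read or the stream ends; returns the number of bytes read.
    private static int ReadFully(Stream stream, byte[] buffer, int offset, int count) {
        int total = 0;
        while (total < count) {
            int read = stream.Read(buffer, offset + total, count - total);
            if (read == 0)
                break;
            total += read;
        }
        return total;
    }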

    private static void FakeSeekForward(Stream stream, int offset) {
        if (stream.CanSeek)
            stream.Seek(offset, SeekOrigin.Current);
        else {
            int bytesRead = 0;
            var buffer = new byte[offset];
            while (bytesRead < offset)
            {
                int read = stream.Read(buffer, bytesRead, offset - bytesRead);
                if (read == 0)
                    throw new EndOfStreamException();
                bytesRead += read;
            }
        }
    }
}
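
The numbers 100, 24, 12 and 376 in ExtractTar correspond to fields of the 512-byte tar header block. The sketch below spells that layout out by parsing the two fields the extractor uses from an in-memory block. It assumes the common ustar field order (name, then mode/uid/gid, then size); the names TarHeaderSketch, Parse and Padding are hypothetical and not part of the gist.

```csharp
using System;
using System.Text;

public static class TarHeaderSketch
{
    public const int BlockSize = 512;

    // Reads the entry name and size from one 512-byte header block.
    public static (string Name, long Size) Parse(byte[] block)
    {
        if (block == null || block.Length < BlockSize)
            throw new ArgumentException("Expected a 512-byte header block.", nameof(block));

        // Bytes 0..99: entry name, NUL-padded.
        string name = Encoding.ASCII.GetString(block, 0, 100).Trim('\0');
        // Bytes 100..123: mode, uid and gid (8 bytes each), the 24 bytes skipped in ExtractTar.
        // Bytes 124..135: size as octal text, NUL- or space-terminated.
        string sizeText = Encoding.ASCII.GetString(block, 124, 12).Trim('\0', ' ');
        long size = sizeText.Length == 0 ? 0 : Convert.ToInt64(sizeText, 8);
        // Bytes 136..511: mtime, checksum, type flag and so on, the 376 bytes skipped in ExtractTar.
        return (name, size);
    }

    // Padding bytes that follow `size` bytes of data, up to the next 512-byte boundary.
    public static int Padding(long size) => (int)((BlockSize - size % BlockSize) % BlockSize);
}
```

For example, a header whose size field holds the octal text 00000000144 describes a 100-byte entry, which is followed by 412 bytes of padding before the next header.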
