
Partially download and serialize big file in C#?

As part of an upcoming project at my university, I need to write a client that downloads a media file from a server and writes it to the local disk. Since these files can be very large, I need to implement partial download and serialization in order to avoid excessive memory use.

What I came up with:

namespace PartialDownloadTester
{
    using System;
    using System.Diagnostics.Contracts;
    using System.IO;
    using System.Net;
    using System.Text;

    public class DownloadClient
    {
        public static void Main(string[] args)
        {
            var dlc = new DownloadClient(args[0], args[1], args[2]);
            dlc.DownloadAndSaveToDisk();
            Console.ReadLine();
        }

        private WebRequest request;

        // directory of file
        private string dir;

        // full file identifier
        private string filePath;

        public DownloadClient(string uri, string fileName, string fileType)
        {
            this.request = WebRequest.Create(uri);
            this.request.Method = "GET";
            var sb = new StringBuilder();
            sb.Append("C:\\testdata\\DownloadedData\\");
            this.dir = sb.ToString();
            sb.Append(fileName + "." + fileType);
            this.filePath = sb.ToString();
        }

        public void DownloadAndSaveToDisk()
        {
            // make sure directory exists
            this.CreateDir();

            var response = (HttpWebResponse)request.GetResponse();
            Console.WriteLine("Content length: " + response.ContentLength);
            var rStream = response.GetResponseStream();
            int bytesRead = -1;
            do
            {
                var buf = new byte[2048];
                bytesRead = rStream.Read(buf, 0, buf.Length);
                rStream.Flush();
                this.SerializeFileChunk(buf);
            }
            while (bytesRead != 0);
        }

        private void CreateDir()
        {
            if (!Directory.Exists(dir))
            {
                Directory.CreateDirectory(dir);
            }
        }

        private void SerializeFileChunk(byte[] bytes)
        {
            Contract.Requires(!Object.ReferenceEquals(bytes, null));
            FileStream fs = File.Open(filePath, FileMode.Append);
            fs.Write(bytes, 0, bytes.Length);
            fs.Flush();
            fs.Close();
        }
    }
}

For testing purposes, I've used the following parameters:

"http://itu.dk/people/janv/mufc_abc.jpg" "mufc_abc" "jpg"

However, the saved picture is incomplete (only the first ~10% looks right), even though the printed content length is 63780, which is the actual size of the image.

So my questions are:

  1. Is this the right way to go for partial download and serialization or is there a better/easier approach?
  2. Is the full content of the response stream stored in client memory? If this is the case, do I need to use HttpWebRequest.AddRange to partially download data from the server in order to conserve my client's memory?
  3. How come the serialization fails and I get a broken image?
  4. Do I introduce a lot of overhead when I use FileMode.Append? (MSDN states that this option "seeks to the end of the file".)

Thanks in advance

You could definitely simplify your code using a WebClient:

using System.IO;
using System.Net;

class Program
{
    static void Main()
    {
        DownloadClient("http://itu.dk/people/janv/mufc_abc.jpg", "mufc_abc.jpg");
    }

    public static void DownloadClient(string uri, string fileName)
    {
        using (var client = new WebClient())
        {
            using (var stream = client.OpenRead(uri))
            {
                // work with chunks of 2KB => adjust if necessary
                const int chunkSize = 2048;
                var buffer = new byte[chunkSize];
                // File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes
                using (var output = File.Create(fileName))
                {
                    int bytesRead;
                    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                    }
                }
            }
        }
    }
}

Notice how I am writing only the number of bytes I have actually read from the socket to the output file and not the entire 2KB buffer.
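
To illustrate why that matters, here is a minimal sketch that needs no network. TrickleStream is a hypothetical helper written for this example (it is not part of .NET); it assumes a source that hands back at most 1000 bytes per Read call, the way a socket may return less than the buffer size. Copying with the whole buffer inflates and corrupts the output, while copying only bytesRead preserves the original size.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of .NET): wraps a stream and returns at most
// maxPerRead bytes per Read call, imitating short reads from a network socket.
public sealed class TrickleStream : Stream
{
    private readonly Stream inner;
    private readonly int maxPerRead;

    public TrickleStream(Stream inner, int maxPerRead)
    {
        this.inner = inner;
        this.maxPerRead = maxPerRead;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return inner.Read(buffer, offset, Math.Min(count, maxPerRead));
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class ShortReadDemo
{
    // Copies source to destination in 2 KB chunks. When writeWholeBuffer is true,
    // the count returned by Read is ignored, as in the question's code.
    public static long Copy(Stream source, Stream destination, bool writeWholeBuffer)
    {
        var buffer = new byte[2048];
        int bytesRead;
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, writeWholeBuffer ? buffer.Length : bytesRead);
        }
        return destination.Length;
    }

    public static void Main()
    {
        var payload = new byte[63780]; // same size as the image in the question
        new Random(1).NextBytes(payload);

        using (var wrong = new MemoryStream())
        using (var right = new MemoryStream())
        {
            Copy(new TrickleStream(new MemoryStream(payload), 1000), wrong, true);
            Copy(new TrickleStream(new MemoryStream(payload), 1000), right, false);
            Console.WriteLine("whole buffer:   " + wrong.Length + " bytes");
            Console.WriteLine("bytesRead only: " + right.Length + " bytes");
        }
    }
}
```

The 1000-byte limit is arbitrary; a real connection decides for itself how many bytes each Read returns, which is why the return value has to be honored.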

I don't know whether this is the source of the problem, but I would change the loop like this:

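const int ChunkSize = 2048;
var buf = new byte[ChunkSize];
var rStream = response.GetResponseStream();
int bytesRead; // declared outside the loop so the while condition can see it
do {
    bytesRead = rStream.Read(buf, 0, ChunkSize);
    if (bytesRead > 0) {
        this.SerializeFileChunk(buf, bytesRead);
    }
} while (bytesRead > 0); // Read returns 0 only at the end of the stream; a short read is not the end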

The serialize method would get an additional argument:

private void SerializeFileChunk(byte[] bytes, int numBytes)

and then write the right number of bytes:

fs.Write(bytes, 0, numBytes);

UPDATE:

I do not see the need to close and reopen the file for each chunk. I would also use the using statement, which releases the resources even if an exception occurs. The using statement calls the Dispose() method of the resource at the end, which in turn calls Close() in the case of file streams. using can be applied to all types implementing IDisposable.

var buf = new byte[2048];
int bytesRead;
using (var rStream = response.GetResponseStream()) {
    using (FileStream fs = File.Open(filePath, FileMode.Append)) {
        do {
            bytesRead = rStream.Read(buf, 0, buf.Length);
            fs.Write(buf, 0, bytesRead); // write from buf, and only the bytes actually read
        } while (...);
    }
}

The using statement does something like this:

{
    var rStream = response.GetResponseStream();
    try
    {
        // do some work with rStream here.
    } finally {
        if (rStream != null) {
            rStream.Dispose();
        }
    }
}
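
Putting the points of this answer together, a minimal self-contained sketch might look like the following. ChunkedSave and SaveToFile are hypothetical names chosen for this example, the source is an in-memory stream standing in for the response stream, and FileMode.Create is an assumption made here (instead of FileMode.Append) so that running it twice overwrites the file rather than appending to it.

```csharp
using System;
using System.IO;

public static class ChunkedSave
{
    // Hypothetical helper: copies source to filePath in 2 KB chunks.
    // The file is opened once, and the using block closes it even if
    // Read or Write throws.
    public static long SaveToFile(Stream source, string filePath)
    {
        var buf = new byte[2048];
        long total = 0;
        using (var fs = File.Open(filePath, FileMode.Create))
        {
            int bytesRead;
            while ((bytesRead = source.Read(buf, 0, buf.Length)) > 0)
            {
                // Write only the bytes that were actually read.
                fs.Write(buf, 0, bytesRead);
                total += bytesRead;
            }
        }
        return total;
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "chunked_save_demo.bin");
        var payload = new byte[5000];
        new Random(1).NextBytes(payload);

        using (var source = new MemoryStream(payload))
        {
            long written = SaveToFile(source, path);
            Console.WriteLine("bytes written: " + written);
        }
        File.Delete(path);
    }
}
```

Because the file stays open for the whole transfer, there is no repeated open, seek to end, and close per chunk, and only one 2 KB buffer is held in memory at a time.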

Here is the solution from Microsoft: http://support.microsoft.com/kb/812406

Updated 2021-03-16: the original article no longer seems to be available. Here is the archived copy: https://mskb.pkisolutions.com/kb/812406
