
C# Parallel.ForEach() memory usage keeps growing

public string SavePath { get; set; } = @"I:\files\";

public void DownloadList(List<string> list)
{
    var rest = ExcludeDownloaded(list);
    var result = Parallel.ForEach(rest, link=>
    {
        Download(link);
    });
}

private void Download(string link)
{
    using(var net = new System.Net.WebClient())
    {
        var data = net.DownloadData(link);

        var fileName = code to generate unique fileName;
        if (File.Exists(fileName))
            return;

        File.WriteAllBytes(fileName, data);
    }
}

var downloader = new DownloaderService();
var links = downloader.GetLinks();
downloader.DownloadList(links);

I observed that the RAM usage of the project keeps growing.

I guess there is something wrong with the Parallel.ForEach(), but I cannot figure it out.

Is there a memory leak, or what is happening?


Update 1

After changing to the new code:

private void Download(string link)
{
    using(var net = new System.Net.WebClient())
    {
        var fileName = code to generate unique fileName;
        if (File.Exists(fileName))
            return;
        // DownloadFile returns void and writes straight to disk, so there is nothing to assign.
        net.DownloadFile(link, fileName);
        Track theTrack = new Track(fileName);
        theTrack.Title = GetCDName();
        theTrack.Save();
    }
}


After running for 9 hours I still see memory usage increasing, although it grows much more slowly now.

Could it be because I never free the memory used by theTrack?

By the way, I use the ALT package to update the file metadata; unfortunately, it doesn't implement the IDisposable interface.

Use WebClient.DownloadFile() to download directly to a file, so that you don't hold the whole file in memory.

The Parallel.ForEach method is intended for parallelizing CPU-bound workloads. Downloading a file is an I/O-bound workload, so Parallel.ForEach is not ideal here because it needlessly blocks ThreadPool threads. The correct way to do it is asynchronously, with async/await. The recommended class for making asynchronous web requests is HttpClient, and for controlling the level of concurrency an excellent option is the TPL Dataflow library. For this case it is enough to use the simplest component of this library, the ActionBlock class:

async Task DownloadListAsync(List<string> list)
{
    using (var httpClient = new HttpClient())
    {
        var rest = ExcludeDownloaded(list);
        var block = new ActionBlock<string>(async link =>
        {
            await DownloadFileAsync(httpClient, link);
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = 10
        });
        foreach (var link in rest)
        {
            await block.SendAsync(link);
        }
        block.Complete();
        await block.Completion;
    }
}

async Task DownloadFileAsync(HttpClient httpClient, string link)
{
    var fileName = Guid.NewGuid().ToString(); // code to generate unique fileName;
    var filePath = Path.Combine(SavePath, fileName);
    if (File.Exists(filePath)) return;
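    // ResponseHeadersRead streams the body instead of buffering the whole response in memory.
    var response = await httpClient.GetAsync(link, HttpCompletionOption.ResponseHeadersRead);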
    response.EnsureSuccessStatusCode();
    using (var contentStream = await response.Content.ReadAsStreamAsync())
    using (var fileStream = new FileStream(filePath, FileMode.Create,
        FileAccess.Write, FileShare.None, 32768, FileOptions.Asynchronous))
    {
        await contentStream.CopyToAsync(fileStream);
    }
}

The code for downloading a file with HttpClient is not as simple as WebClient.DownloadFile(), but it's what you have to do in order to keep the whole process asynchronous (both reading from the web and writing to the disk).
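
If you would rather not reference the TPL Dataflow package, the same idea (a fixed cap on how many downloads are in flight) can be approximated with a SemaphoreSlim. This is a different technique from the ActionBlock, shown only as a minimal sketch; ThrottleSketch and ForEachThrottledAsync are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ThrottleSketch
{
    // Hypothetical helper: runs an async action for every item, allowing
    // at most maxConcurrency actions to be in flight at any moment.
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> action)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>();
            foreach (var item in items)
            {
                // Asynchronously wait for a free slot; no thread is blocked.
                await gate.WaitAsync();
                tasks.Add(RunAndReleaseAsync(item, action, gate));
            }
            // Wait for the remaining actions and surface any exception.
            await Task.WhenAll(tasks);
        }
    }

    private static async Task RunAndReleaseAsync<T>(
        T item, Func<T, Task> action, SemaphoreSlim gate)
    {
        try
        {
            await action(item);
        }
        finally
        {
            // Always free the slot, even if the action failed.
            gate.Release();
        }
    }
}
```

Assuming the DownloadFileAsync method above, it could be called as: await ThrottleSketch.ForEachThrottledAsync(rest, 10, link => DownloadFileAsync(httpClient, link));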


Caveat: Asynchronous filesystem operations are currently not implemented efficiently in .NET. For maximum efficiency it may be preferable to avoid using the FileOptions.Asynchronous option in the FileStream constructor.
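
To illustrate the caveat, here is a minimal sketch of the file-writing step with the FileStream opened without FileOptions.Asynchronous (useAsync: false). FileCopySketch and SaveStreamAsync are hypothetical names, and whether this is actually faster depends on your runtime and disk, so measure before adopting it.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class FileCopySketch
{
    // Hypothetical helper: copies any readable stream to a file in
    // fixed-size chunks, so the whole payload is never held in memory.
    public static async Task SaveStreamAsync(Stream source, string filePath)
    {
        // useAsync: false opens the file handle for synchronous I/O.
        // CopyToAsync still returns a Task that can be awaited.
        using (var fileStream = new FileStream(filePath, FileMode.Create,
            FileAccess.Write, FileShare.None, bufferSize: 32768, useAsync: false))
        {
            await source.CopyToAsync(fileStream);
        }
    }
}
```

In DownloadFileAsync this would replace the inner using block, passing contentStream as the source.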

