C# Memory Usage Problem

I have a method that converts PDF text into a list. After it runs, memory usage increases too much: for example, a 1000-page PDF uses 300 MB of memory, and I can't free it. I have read some articles about the LOH (large object heap) but have not found a solution.

    public List<string> GetTextFromPdf()
    {
        if (_pdfDoc.Pages == null) return null;
        List<string> ocrList = new List<string>();

        foreach (var words in _pdfDoc.Pages.Select(s => s.Value.WordList))
        {
            ocrList.AddRange(words.Select(word => word.Word).Select(input => Regex.Replace(input, @"[\W]", "")));
        }

        GC.Collect();
        return ocrList;
    }

This is about normal for a 100 megabyte .pdf. You load the entire thing into memory, and the text takes double the space because a character in .NET takes 2 bytes. You also create a bunch of garbage in the large object heap as the list grows. Add the typical .NET runtime overhead and 300 megabytes is not an unexpected result.
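To make the "2 bytes per character" point concrete, here is a minimal sketch that compares the size of a string encoded as UTF-8 (roughly how ASCII text sits in a file) with the size of its characters in a .NET string. The class and method names are hypothetical, and the input is assumed to be plain ASCII; object headers and other per-string overhead are ignored.

```csharp
using System;
using System.Text;

public static class CharSizeSketch
{
    // Bytes needed to store the text as UTF-8 (1 byte per ASCII character).
    public static int Utf8Bytes(string text) => Encoding.UTF8.GetByteCount(text);

    // Bytes used by the characters of a .NET string (UTF-16, 2 bytes per char).
    public static int InMemoryCharBytes(string text) => text.Length * sizeof(char);

    public static void Demo()
    {
        string sample = new string('a', 1000); // 1000 ASCII characters
        Console.WriteLine(Utf8Bytes(sample));         // 1000
        Console.WriteLine(InMemoryCharBytes(sample)); // 2000
    }
}
```

The same doubling applies to every word the method keeps in its list, on top of the per-object overhead of each string.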

Check this answer for details on how using the List<>.Capacity property can help reduce the LOH demands.
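As a rough sketch of the capacity idea: if the total number of words is counted first, the list can be created with that capacity, so its backing array is allocated once instead of being reallocated and copied as it grows. The `pages` parameter below is a hypothetical stand-in for the word lists in `_pdfDoc.Pages`, whose real types are not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PdfTextSketch
{
    // Compiled once and reused, rather than parsing the pattern for every word.
    private static readonly Regex NonWord = new Regex(@"\W", RegexOptions.Compiled);

    // pages: hypothetical stand-in for the per-page word lists of the PDF document.
    public static List<string> CollectWords(IReadOnlyList<IReadOnlyList<string>> pages)
    {
        // First pass: count the words so the list can be sized up front.
        int total = 0;
        foreach (var page in pages) total += page.Count;

        // One allocation of the backing array; no intermediate arrays are discarded.
        var result = new List<string>(total);
        foreach (var page in pages)
            foreach (var word in page)
                result.Add(NonWord.Replace(word, ""));
        return result;
    }

    public static void Demo()
    {
        var pages = new List<IReadOnlyList<string>>
        {
            new[] { "Hello,", "world!" },
            new[] { "C#", "memory" },
        };
        Console.WriteLine(string.Join(" ", CollectWords(pages)));
    }
}
```

This does not shrink the strings themselves; it only avoids the discarded backing arrays that a growing list leaves behind.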

Check whether your pdf loader is still being referenced somewhere, so that it can be disposed.

Is your pdf library COM based? You may need to call Marshal.ReleaseComObject on some of your references when you have finished with them.
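If the library does turn out to be COM based, a small guard like the following could be used. This is only a sketch: `ComCleanup` and `ReleaseIfCom` are hypothetical names, and which of the library's objects need releasing depends on the library itself.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComCleanup
{
    // Releases the reference if it wraps a COM object; returns true if it did.
    public static bool ReleaseIfCom(object candidate)
    {
        // Marshal.ReleaseComObject throws for non-COM objects, so check first.
        if (candidate == null || !Marshal.IsComObject(candidate)) return false;
        Marshal.ReleaseComObject(candidate);
        return true;
    }
}
```

Calling `ComCleanup.ReleaseIfCom(someReference)` on an ordinary managed object simply returns false, so it is safe to try on references whose origin is unclear.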
