
C# Memory Usage Problem

I have a method that converts PDF text into a list. After it runs, memory usage increases too much. For example, a 1000-page PDF uses 300 MB of memory and I can't free it. I have read some articles about the LOH (large object heap) but have not found a solution.

    public List<string> GetTextFromPdf()
    {
        if (_pdfDoc.Pages == null) return null;
        List<string> ocrList = new List<string>();

        foreach (var words in _pdfDoc.Pages.Select(s => s.Value.WordList))
        {
            ocrList.AddRange(words.Select(word => word.Word).Select(input => Regex.Replace(input, @"[\W]", "")));
        }

        GC.Collect();
        return ocrList;
    }

This is about normal for a 100 megabyte .pdf. You load the entire thing into memory, and that takes double the amount of memory since a character in .NET takes 2 bytes. You will also create a bunch of garbage in the large object heap for the list. Add the typical .NET runtime overhead and 300 megabytes is not an unexpected result.
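
To make the "2 bytes per character" point concrete, here is a small sketch that compares the size of some text stored as UTF-8 with the size of its characters inside a .NET string. The `TextFootprint` class and its method names are made up for illustration, and the numbers ignore per-object headers and whatever the PDF library itself keeps in memory.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of any PDF library.
public static class TextFootprint
{
    // Bytes the text would occupy if stored as UTF-8 (1 byte per ASCII character).
    public static long Utf8Bytes(string text) => Encoding.UTF8.GetByteCount(text);

    // Bytes taken by the characters of a .NET string (UTF-16, 2 bytes per char).
    // The object header and length field are not included.
    public static long InMemoryCharBytes(string text) => (long)text.Length * sizeof(char);
}
```

For plain ASCII text, `InMemoryCharBytes` comes out at twice `Utf8Bytes`, which is the doubling described above.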

Check this answer for details on how using the List<>.Capacity property can help reduce the LOH demands.
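
As a rough sketch of that idea, the list can be sized once up front instead of letting `AddRange` grow it step by step. Each time a list outgrows its backing array, the old array becomes garbage, and arrays of 85,000 bytes or more are allocated on the LOH. The `PresizedExtraction` class is hypothetical, and the `pages` parameter is only a stand-in for `_pdfDoc.Pages.Select(s => s.Value.WordList)` with the words already reduced to strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class PresizedExtraction
{
    // Built once and reused; same pattern as in the question.
    private static readonly Regex NonWord = new Regex(@"[\W]");

    public static List<string> Clean(IReadOnlyList<IReadOnlyList<string>> pages)
    {
        // Count the words first so the list allocates its backing array exactly once.
        int total = pages.Sum(p => p.Count);
        var result = new List<string>(total);

        foreach (var words in pages)
        {
            foreach (var word in words)
            {
                result.Add(NonWord.Replace(word, ""));
            }
        }
        return result;
    }
}
```

This only avoids the intermediate arrays created while the list grows. The strings themselves still have to live somewhere for as long as the returned list is referenced.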

Check whether your PDF loader is still referenced somewhere, so that it can actually be discarded.
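
One way this can happen is a long-lived object that keeps the document in a field. Below is a minimal sketch of letting go of that reference once the text has been extracted. The `PdfTextSession` wrapper is hypothetical, and a plain `object` stands in for the real document type, which the question does not name.

```csharp
using System;

// Hypothetical wrapper for illustration; 'object' stands in for the PDF library's document type.
public sealed class PdfTextSession : IDisposable
{
    private object _pdfDoc;

    public PdfTextSession(object pdfDoc)
    {
        _pdfDoc = pdfDoc;
    }

    // True while this session still holds a reference to the document.
    public bool HasDocument => _pdfDoc != null;

    public void Dispose()
    {
        // If the real document type implements IDisposable, dispose it first.
        (_pdfDoc as IDisposable)?.Dispose();

        // Drop the reference so this session no longer keeps the document alive.
        _pdfDoc = null;
    }
}
```

The document only becomes collectable once every other reference to it (static fields, caches, event handlers) is gone as well.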

Is your PDF library COM based? You may need to call Marshal.ReleaseComObject on some of your references when you have finished with them.
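
A small sketch of what that could look like, assuming you do not know in advance whether a given reference is a COM wrapper. `ComCleanup.ReleaseIfCom` is a made-up helper name. `Marshal.IsComObject` and `Marshal.ReleaseComObject` are the real methods in `System.Runtime.InteropServices`, and COM interop is only supported on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration.
public static class ComCleanup
{
    // Returns true if 'reference' was a COM wrapper and a release was requested.
    public static bool ReleaseIfCom(object reference)
    {
        if (reference == null || !Marshal.IsComObject(reference))
        {
            return false; // ordinary managed object: nothing to release
        }

        // Decrements the reference count of the runtime callable wrapper.
        Marshal.ReleaseComObject(reference);
        return true;
    }
}
```

Do not use the reference again after releasing it.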
