Question about garbage collection in C# .NET

I am experiencing a problem in my application with an OutOfMemoryException. My application can search for words within texts. When I start a long-running search of about 2000 different texts for about 2175 different words, the application terminates about 50% of the way through with an OutOfMemoryException (after about 6 hours of processing).

I have been trying to find the memory leak. I have an object graph like this (--> denotes a reference):

a static global application object (controller) --> an algorithm starter object --> text mining starter object --> text mining algorithm object (this object performs the searching).

The text mining starter object starts the text mining algorithm object's run() method in a separate thread.

To try to fix the issue, I have edited the code so that the text mining starter object splits the texts to search into several groups and initializes one text mining algorithm object for each group of texts sequentially (so when one text mining algorithm object is finished, a new one is created to search the next group of texts). Here I set the previous text mining algorithm object to null. But this does not solve the issue.

When I create a new text mining algorithm object, I have to give it some parameters. These are taken from properties of the previous text mining algorithm object before I set that object to null. Will this prevent garbage collection of the previous text mining algorithm object?

Here is the code in which the text mining algorithm starter creates new text mining algorithm objects:

    private void RunSeveralAlgorithmObjects()
    {

        IEnumerable<ILexiconEntry> currentEntries = allLexiconEntries.GetGroup(intCurrentAlgorithmObject, intNumberOfAlgorithmObjectsToUse);

        algorithm.LexiconEntries = currentEntries;
        algorithm.Run();

        intCurrentAlgorithmObject++;

        for (int i = 0; i < intNumberOfAlgorithmObjectsToUse - 1; i++)
        {
            algorithm = CreateNewAlgorithmObject();
            AddAlgorithmListeners();
            algorithm.Run();
            intCurrentAlgorithmObject++;
        }

    }

    private TextMiningAlgorithm CreateNewAlgorithmObject()
    {
        TextMiningAlgorithm newAlg = new TextMiningAlgorithm();

        newAlg.SortedTermStruct = algorithm.SortedTermStruct;
        newAlg.PreprocessedSynonyms = algorithm.PreprocessedSynonyms;
        newAlg.DistanceMeasure = algorithm.DistanceMeasure;
        newAlg.HitComparerMethod = algorithm.HitComparerMethod;
        newAlg.LexiconEntries = allLexiconEntries.GetGroup(intCurrentAlgorithmObject, intNumberOfAlgorithmObjectsToUse);
        newAlg.MaxTermPercentageDeviation = algorithm.MaxTermPercentageDeviation;
        newAlg.MaxWordPercentageDeviation = algorithm.MaxWordPercentageDeviation;
        newAlg.MinWordsPercentageHit = algorithm.MinWordsPercentageHit;
        newAlg.NumberOfThreads = algorithm.NumberOfThreads;
        newAlg.PermutationType = algorithm.PermutationType;
        newAlg.RemoveStopWords = algorithm.RemoveStopWords;
        newAlg.RestrictPartialTextMatches = algorithm.RestrictPartialTextMatches;
        newAlg.Soundex = algorithm.Soundex;
        newAlg.Stemming = algorithm.Stemming;
        newAlg.StopWords = algorithm.StopWords;
        newAlg.Synonyms = algorithm.Synonyms;
        newAlg.Terms = algorithm.Terms;
        newAlg.UseSynonyms = algorithm.UseSynonyms;

        algorithm = null;

        return newAlg;
    }

Here is the start of the thread that runs the whole search process:

            // Run the algorithm in its own thread
            Thread algorithmThread = new Thread(new ThreadStart
                (RunSeveralAlgorithmObjects));
            algorithmThread.Start();

Can something here prevent the previous text mining algorithm object from being garbage collected?

I recommend first identifying what exactly is leaking. Then postulate a cause (such as references in event handlers).

To identify what is leaking:

  1. Enable native debugging for the project: Properties -> Debug -> check Enable unmanaged code debugging.
  2. Run the program. Since the memory leak is probably gradual, you probably don't need to let it run the whole 6 hours; just let it run for a while and then choose Debug -> Break All.
  3. Bring up the Immediate window: Debug -> Windows -> Immediate.
  4. Type one of the following into the Immediate window, depending on whether you're running 32-bit or 64-bit, and .NET 2.0/3.0/3.5 or .NET 4.0:

    .load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll for 32-bit .NET 2.0-3.5

    .load C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\sos.dll for 32-bit .NET 4.0

    .load C:\WINDOWS\Microsoft.NET\Framework64\v2.0.50727\sos.dll for 64-bit .NET 2.0-3.5

    .load C:\WINDOWS\Microsoft.NET\Framework64\v4.0.30319\sos.dll for 64-bit .NET 4.0

  5. You can now run SoS commands in the Immediate window. I recommend checking the output of !dumpheap -stat, and if that doesn't pinpoint the problem, check !finalizequeue.

Notes:

  • Running the program the first time after enabling native debugging may take a long time (minutes) if you have VS set up to load symbols.
  • The debugger commands that I recommended both start with ! (exclamation point).

These instructions are courtesy of the incredible Tess from Microsoft, and Mario Hewardt, author of Advanced .NET Debugging.

Once you've identified which object is leaking, postulate a cause and implement a fix. Then you can repeat these steps to determine for sure whether or not the fix worked.
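
As a lighter-weight complement to repeating the debugger steps, the managed heap size can be logged between runs. The sketch below is only an illustration: MemoryCheck, MeasureGrowth and LeakyWork are hypothetical names that do not appear in the question, and LeakyWork leaks on purpose to show what a growing heap looks like. GC.GetTotalMemory(true) forces a full collection before reporting, so memory that is merely awaiting collection is not counted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the question's code.
public static class MemoryCheck
{
    // Deliberately leaky store, used only to demonstrate a growing heap.
    private static readonly List<byte[]> Retained = new List<byte[]>();

    // Returns how many bytes the managed heap grew while 'work' ran,
    // measured after a forced full collection on both sides.
    public static long MeasureGrowth(Action work)
    {
        long before = GC.GetTotalMemory(true);
        work();
        long after = GC.GetTotalMemory(true);
        return after - before;
    }

    // Simulates a leak: the 10 MB buffer stays reachable through a static list.
    public static void LeakyWork()
    {
        Retained.Add(new byte[10 * 1024 * 1024]);
    }
}
```

In the question's loop, the same measurement could wrap each algorithm.Run() call. Growth that stays roughly constant per group would suggest that something from each run is still reachable.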

1) As I said in a comment, if you use events in your code (AddAlgorithmListeners makes me suspect this), subscribing to an event can create a "hidden" dependency between objects which is easily forgotten. This dependency can mean that an object is not freed, because someone is still listening to one of its events. Make sure you unsubscribe from all events when you no longer need to listen to them.
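
A minimal sketch of how a subscription keeps an object reachable, using hypothetical Publisher and Listener classes that are not from the question. In .NET the object that exposes the event stores the delegate, and the delegate references the listener, so the listener stays reachable for as long as the event source does unless it unsubscribes. It is worth checking which way the references point in your own code, including any reference a listener holds back to the algorithm object.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical event source.
public sealed class Publisher
{
    public event EventHandler Progress;

    public void Raise()
    {
        EventHandler handler = Progress;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

// Hypothetical listener.
public sealed class Listener
{
    public int Calls;

    public void OnProgress(object sender, EventArgs e) { Calls++; }
}

public static class EventLeakDemo
{
    // Creates a listener in its own stack frame so that no local variable
    // keeps it alive after this method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Subscribe(Publisher publisher, bool unsubscribe)
    {
        Listener listener = new Listener();
        publisher.Progress += listener.OnProgress;
        publisher.Raise();
        if (unsubscribe) publisher.Progress -= listener.OnProgress;
        return new WeakReference(listener);
    }

    // Reports whether the listener is still alive after a forced collection
    // while the publisher itself is still reachable.
    public static bool ListenerSurvives(bool unsubscribe)
    {
        Publisher publisher = new Publisher();
        WeakReference weak = Subscribe(publisher, unsubscribe);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(publisher);
        return alive;
    }
}
```

ListenerSurvives(false) returns true because the publisher still holds the handler. With unsubscribe set to true the listener becomes eligible for collection, although exactly when it is reclaimed is up to the runtime.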


2) Also, I'd like to point out one (probably not-so-off-topic) issue with your code:

    private void RunSeveralAlgorithmObjects()
    {
        ...
        algorithm.LexiconEntries = currentEntries;
        // ^ when/where is algorithm initialized?

        for (...)
        {
            algorithm = CreateNewAlgorithmObject();
            ....
        }
    }

Is algorithm already initialized when this method is invoked? Otherwise, setting algorithm.LexiconEntries wouldn't seem like a valid thing to do. This means your method depends on some external state, which to me looks like a potential place for bugs to creep into your program logic.

If I understand it correctly, this object contains some state describing the algorithm, and CreateNewAlgorithmObject derives a new state for algorithm from the current state. If this were my code, I would make algorithm an explicit parameter to all your functions, as a signal that the method depends on this object. It would then no longer be hidden "external" state upon which your functions depend.

PS: If you don't want to go down that route, the other thing you could consider to make your code more consistent is to turn CreateNewAlgorithmObject into a void method and re-assign algorithm directly inside that method.
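
The explicit-parameter suggestion could look roughly like the sketch below. It is only an illustration: the TextMiningAlgorithm class here is a cut-down stand-in showing three of the settings from the question, the string element type replaces ILexiconEntry for brevity, and AlgorithmFactory and CreateFrom are hypothetical names.

```csharp
using System.Collections.Generic;

// Cut-down stand-in for the question's class; the real one has many more settings.
public class TextMiningAlgorithm
{
    public int NumberOfThreads { get; set; }
    public bool UseSynonyms { get; set; }
    public IEnumerable<string> LexiconEntries { get; set; }

    public void Run() { }
}

// Hypothetical factory, not part of the question's code.
public static class AlgorithmFactory
{
    // The previous object is an explicit input; no field is read or reassigned here.
    public static TextMiningAlgorithm CreateFrom(
        TextMiningAlgorithm previous, IEnumerable<string> entries)
    {
        TextMiningAlgorithm next = new TextMiningAlgorithm();
        next.NumberOfThreads = previous.NumberOfThreads;
        next.UseSynonyms = previous.UseSynonyms;
        next.LexiconEntries = entries;
        return next;
    }
}
```

The caller would then write something like algorithm = AlgorithmFactory.CreateFrom(algorithm, nextEntries), which makes the dependency on the previous object visible at the call site. Copying property values this way gives the new object references to the same settings objects, not to the previous algorithm object itself, so the copy on its own should not keep the previous object alive unless one of those shared objects refers back to it.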

Does AddAlgorithmListeners attach event handlers to events exposed by the algorithm object? Do the listening objects live longer than the algorithm object? If so, they can keep the algorithm object from being collected.

If yes, try unsubscribing from the events before you let the object go out of scope.

    for (int i = 0; i < intNumberOfAlgorithmObjectsToUse - 1; i++)
    {
        algorithm = CreateNewAlgorithmObject();
        AddAlgorithmListeners();
        algorithm.Run();
        RemoveAlgorithmListeners();    // See if this fixes your issue.
        intCurrentAlgorithmObject++;
    }
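
One way to make sure the handlers are always removed, even if Run() throws, is to pair the subscription with a try/finally. This is a sketch under assumptions: AlgorithmStub, its Finished event and the Starter class are hypothetical stand-ins, since the question does not show which events the algorithm object exposes.

```csharp
using System;

// Hypothetical stand-in for the algorithm object; the event name is an assumption.
public class AlgorithmStub
{
    public event EventHandler Finished;

    // Number of handlers currently attached, exposed only for the demonstration.
    public int HandlerCount
    {
        get { return Finished == null ? 0 : Finished.GetInvocationList().Length; }
    }

    public void Run()
    {
        EventHandler handler = Finished;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

// Hypothetical stand-in for the starter object that listens to the algorithm.
public class Starter
{
    public int FinishedCount;

    private void OnFinished(object sender, EventArgs e) { FinishedCount++; }

    // Subscribes, runs, and always unsubscribes, even when Run() throws.
    public void RunOnce(AlgorithmStub algorithm)
    {
        algorithm.Finished += OnFinished;
        try
        {
            algorithm.Run();
        }
        finally
        {
            algorithm.Finished -= OnFinished;
        }
    }
}
```

After RunOnce returns, the algorithm object's event no longer references the starter, so neither object is tied to the other through the subscription.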

My suspicion is AddAlgorithmListeners(). Are you sure you remove the listeners after execution has completed?

Is the IEnumerable returned by GetGroup() throw-away or cached? That is, does it hold onto the objects it has emitted? If it does, its memory use would obviously grow linearly with each iteration.
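
For comparison, a throw-away GetGroup could be written as an iterator that stores nothing it emits. The sketch below is hypothetical: the question does not show how GetGroup is implemented or how it splits the entries, so the round-robin split (item i goes to group i modulo groupCount) and the extension-method form are assumptions.

```csharp
using System.Collections.Generic;

// Hypothetical implementation; the real GetGroup is not shown in the question.
public static class GroupingExtensions
{
    // Streams every item whose position falls into the requested group.
    // Nothing is cached: each item is handed to the caller and then forgotten.
    public static IEnumerable<T> GetGroup<T>(
        this IEnumerable<T> source, int groupIndex, int groupCount)
    {
        int position = 0;
        foreach (T item in source)
        {
            if (position % groupCount == groupIndex)
            {
                yield return item;
            }
            position++;
        }
    }
}
```

A cached variant would instead add each emitted item to a list held in a field, and that list would keep every item reachable for as long as its owner lives.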

Memory profiling is useful. Have you tried examining the application with a profiler? I have found Red Gate's useful in the past (it's not free, but it does have an evaluation version, IIRC).
