简体   繁体   中英

C# Memory Leak?

The following code crashes after sometime due to high memory usage (I open taskmanager and its used memory keeps raising). But I cant see any memory leak except for garbage collection not doing its job. Any suggestions?

//Load a list of regex
//Load a list of phrases
//Open a output file

foreach (string regexString in regexList)
{
    int num = 0;

    Regex regex = new Regex(regexString, RegexOptions.Compiled | RegexOptions.IgnoreCase);

    foreach (string phrase in phraseList)
            num += regex.Matches(phrase).Count;

    if (num > 0)
    {
        output.WriteLine(String.Join(" ", num, phrase));
        output.Flush();
    }
}

EDIT:

Complete Code: http://pastebin.com/0SQYn44z

EDIT2:

I found and posted the solution (foreach loop) Thanks for everyone that tried to help in anyway.

I can't tell from your example, but it is possible that the RegexOptions.Compiled flag causes the problem. From msdn:

Compiled specifies that the regular expression is compiled to an assembly. This yields faster execution but increases startup time. This value should not be assigned to the Options property when calling the CompileToAssembly method.

If a regex is compiled to an assembly, then the generated code can not be unloaded until you restart the application as .Net does not unload assemblies.

This means that if you have a lot of different regular expressions, and you don't reuse them, a compiled Regex is usualy not a good idea.

Chances are that your regex uses excessive backtracking resulting in a long execute time and high memory usage

Can you post the regex and the input so I can check for sure?

You use some types that implement IDisposable without calling Dipose():

  • StreamReader
  • StreamWriter
  • XmlReader

Wrap the use of those types in using statements and see if that improves thing for you.

After seeing your code, I can already recommend not using two, but one combined Regex. Something like:

Regex regex = new Regex(
        @"\b(?:" + s1 + " " + s2 + "|" + s2 + " " + s1 + @")\b",
        RegexOptions.IgnoreCase);

Note that the series of + operators there is converted into a single String.Concat(params string[]) call. With only 2 to 4 parts, it wouldn't even create the temporary array because there are more specialized Concat overloads for that case. Join always uses a temporary array and also needs to take care of the separator string, even if it's empty.

Another thing you might want to keep in mind is that running your program in debug mode will differ quite a lot from running it in release mode. For example that keywords list will remain live until the function returns. I recommend explicitly restricting its lifetime and also cleaning up the other objects as soon as possible. Maybe something like the following:

string[] keywordArray;
using (StreamReader keywordsFile = new StreamReader("keywords.txt"))
{
    List<string> keywords = new List<string>();
    string keyword;
    while ((keyword = keywordsFile.ReadLine()) != null)
        keywords.Add(keyword);

    keywordArray = keywords.ToArray();
}

Or skip it altogether:

string[] keywordArray = File.ReadAllLines("keywords.txt"); // internally uses above code

However, somehow I have the feeling loading your huge feeds file as a whole into memory is a bad idea. It would require a lot less memory if you could deserialize one FeedItem at a time and return it from an iterator method . Then all you need to do is restructure the algorithm to have the loop over feed items as outer-most loop.

Let's see if I've got this right. You have a large list of keywords, and for every permutation of two of those words you create a new string by joining them together with a space in the middle, plus another string with the order of the words reversed (which is unnecessary, BTW). Finally, you create a Regex object from each of the strings.

Having done that, you start iterating over your list of FeedItem objects, taking two attributes of each and forming them into a string like you did with the keywords, and try to match that string with each of the two regexes. In short, I think you should hold off on the finger-pointing for the moment and concentrate on refactoring your code. In fact, I advise you to forget about regexes for now. You aren't using any of their power, and they're distracting you from your real problems.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM