
Avoid memory leaks with strings

I've found a memory leak in my parser and I don't know how to fix it. Here is the basic routine:

    private void parsePage() {
        foreach (String row in rows) {
            // Split allocates a new array plus a new string for every field
            String[] tmp = row.Split(new []{" "}, StringSplitOptions.None);

            PrivateRow t = new PrivateRow();

            t.Field1 = tmp[1];
            t.Field2 = tmp[2];
            t.Field3 = tmp[3];
            // Join builds yet another new string out of all the pieces
            t.Field4 = String.Join(" ", tmp);

            myBigCollection.Add(t);
        }
    }


    private void parseFromFile() {
        foreach (String row in rows) {
            PrivateRow t = new PrivateRow();

            // The same four string literals are assigned on every iteration
            t.Field1 = "mystring1";
            t.Field2 = "mystring2222";
            t.Field3 = "mystring3333";
            t.Field4 = "mystring1 xxx yy zzz";

            myBigCollection.Add(t);
        }
    }

Launching parsePage() on a collection (rows is a List of 100,000 elements) makes my app grow from 20 MB to 70 MB.

Launching parseFromFile(), which reads the SAME collection from a file but avoids Split/Join, takes about 1 MB.

Using a memory profiler, I see that the t fields and PrivateRow keep references to the String.Split() array and the String.Join() result. I suppose that's because I assign a reference, not a copy, that can be garbage collected.

OK, using 70 MB isn't a big deal, but when I run it in production on a lot of sites, it can rise to 2.5-3 GB...

Cheers

This isn't a memory leak per se. It's actually behaving properly. Your second function uses so much less memory simply because only four strings are in use. String literals are interned, so each of these four strings is allocated only once. Every new t.Fieldx assignment then refers to the same four string instances. Strings are immutable, so when the same value is used more than once, a single instance can safely represent it. See the paragraph labelled "Interning" in this article on String in .NET for more detail.

In your first function, the strings are probably mostly different for each field and on each pass through the loop. That is simply much more varied data. Those strings are held on to for as long as your PrivateRow objects exist, and that is exactly what you want.
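You can observe the difference between the two functions directly with object.ReferenceEquals. The sketch below uses hypothetical names (InterningDemo and its methods) and assumes each row has at least two space-separated fields. It shows three things. Equal literals share one instance. Equal strings produced by two Split calls do not. string.Intern maps such strings back onto a single pooled instance. Interned strings are generally not released until the runtime shuts down, so interning only pays off for values that really repeat a lot.

```csharp
using System;

static class InterningDemo
{
    // Equal string literals in one assembly are interned, so they are the
    // same object. This is why parseFromFile stays so cheap.
    public static bool LiteralsShareInstance()
    {
        string a = "mystring1";
        string b = "mystring1";
        return object.ReferenceEquals(a, b);
    }

    // Each Split call allocates fresh strings, so equal field values coming
    // from two calls are two separate objects (what parsePage does per row).
    public static bool SplitFieldsShareInstance(string row)
    {
        string x = row.Split(' ')[1];
        string y = row.Split(' ')[1];
        return object.ReferenceEquals(x, y);
    }

    // string.Intern returns the pooled instance for a value, so equal values
    // collapse onto one object. The pool is generally kept until shutdown.
    public static bool InternedFieldsShareInstance(string row)
    {
        string x = string.Intern(row.Split(' ')[1]);
        string y = string.Intern(row.Split(' ')[1]);
        return object.ReferenceEquals(x, y);
    }
}
```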

You don't have a memory leak at all; it's just that the garbage collector takes time to process it.

I suppose that's because I assign a reference, not a copy, that can be garbage collected.

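That assumption is not quite right. string is a reference type, and assigning one copies only the reference, not the characters. What makes it special in the BCL is that it is immutable: nobody can change a shared instance behind your back, so sharing one behaves like having a copy. Your fields hold references to the individual strings produced by Split, not to the array itself.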
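You can check directly what assignment does with a string. This minimal sketch uses hypothetical names (AssignmentDemo and its methods) and assumes the row has at least two space-separated fields. object.ReferenceEquals confirms that assigning a field from the Split array copies only the reference. Overwriting the array slot afterwards shows that the field holds the string itself, not a link to the array.

```csharp
using System;

static class AssignmentDemo
{
    // Assigning a string copies only the reference: afterwards the field and
    // the array slot point at the very same object on the heap.
    public static bool FieldSharesSplitString(string row)
    {
        string[] tmp = row.Split(' ');
        string field = tmp[1];                        // like t.Field1 = tmp[1]
        return object.ReferenceEquals(field, tmp[1]); // true: no copy was made
    }

    // The field references the string, not the array slot it came from, so
    // overwriting the slot later does not affect it. Sharing is safe because
    // a string can never be modified in place.
    public static string FieldAfterSlotOverwrite(string row)
    {
        string[] tmp = row.Split(' ');
        string field = tmp[1];
        tmp[1] = "replaced";
        return field;                                 // still the original value
    }
}
```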

Now, about possible solutions in case you are under heavy memory pressure. If you have a massive number of strings to process from a file, you can look at two options.

1) Process them in sequence by reading a stream instead of loading everything at once. Keep as little data in memory as is possible, required, or sensible.

2) Use MemoryMappedFile to, again, load only chunks of data and process them in sequence.

The second option can be combined with the first.
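Both options can be sketched as follows. The sketch assumes the rows are stored one per line in a non-empty text file and are space-separated as in the question. StreamingParser, CountFieldsStreamed and CountFieldsMapped are hypothetical names. File.ReadLines, MemoryMappedFile.CreateFromFile and CreateViewStream are standard .NET APIs. The mapped version passes the file length to CreateViewStream explicitly, because a view created with size 0 may be rounded up to whole system pages.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class StreamingParser
{
    // Option 1: File.ReadLines enumerates lazily, so only the current line
    // (plus a read buffer) is in memory at once, unlike File.ReadAllLines.
    public static long CountFieldsStreamed(string path, Action<string[]> handleRow)
    {
        long fields = 0;
        foreach (string row in File.ReadLines(path))
        {
            string[] parts = row.Split(' ');
            handleRow(parts);   // process the row, keep only what is needed
            fields += parts.Length;
        }
        return fields;
    }

    // Options 1 + 2 combined: map the file and read it sequentially through a
    // view stream; the OS pages the data in on demand.
    public static long CountFieldsMapped(string path, Action<string[]> handleRow)
    {
        long length = new FileInfo(path).Length;   // an empty file cannot be mapped
        long fields = 0;
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        // Explicit size: a view of size 0 may extend past the end of the file
        // up to a page boundary, which would read as trailing '\0' characters.
        using (var view = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(view))
        {
            string row;
            while ((row = reader.ReadLine()) != null)
            {
                string[] parts = row.Split(' ');
                handleRow(parts);
                fields += parts.Length;
            }
        }
        return fields;
    }
}
```

In both versions the handleRow callback is where a real parser would build its row objects. Anything the callback does not keep becomes garbage as soon as the next line is read.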

Like others have said, there is no evidence of a memory leak here, just delayed garbage collection. All memory that is no longer referenced should be cleaned up eventually.

That being said, there are a couple things you can do to help keep memory usage lower or recover it more quickly:

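1) You should be able to replace

    t.Field4 = String.Join(" ", tmp);

with

    t.Field4 = row;

You created tmp by splitting row, and then you join it back together. Because the split uses a single space with StringSplitOptions.None, joining with a single space always reproduces row exactly. Avoid creating a new string by just using row.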

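2) Call GC.Collect() at the end of the method to request an immediate garbage collection. This won't reduce the memory used within the method, but it should free up memory more quickly. Note that it can only reclaim objects that are no longer reachable, such as the temporary arrays from Split. Strings still referenced from myBigCollection stay alive.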
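Both suggestions can be checked with a small sketch; AnswerSketches and its members are hypothetical names. RebuildFromParts repeats what parsePage does for Field4 and returns a string equal to row in every case, which is why the assignment can be replaced. CollectAndObserve forces a collection to show the limit of GC.Collect(). The temporary Split array is unreachable and is normally reclaimed, but a field that is still referenced survives. Treat the reported IsAlive value as typical rather than guaranteed, since the runtime decides when an object is actually reclaimed.

```csharp
using System;
using System.Runtime.CompilerServices;

static class AnswerSketches
{
    // Suggestion 1: with separator " " and StringSplitOptions.None, Join undoes
    // Split exactly (doubled, leading and trailing spaces included), so the
    // Join in parsePage only produces a string equal to row.
    public static string RebuildFromParts(string row)
    {
        string[] tmp = row.Split(new[] { " " }, StringSplitOptions.None);
        return String.Join(" ", tmp);
    }

    // Suggestion 2 (hypothetical helper): split a row, keep one field, and
    // return a weak reference to the temporary array so it can be observed.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static (WeakReference ArrayRef, string Kept) SplitAndKeep(string row)
    {
        string[] parts = row.Split(' ');
        return (new WeakReference(parts), parts[1]);
    }

    public static (bool ArrayAlive, string Kept) CollectAndObserve(string row)
    {
        var (arrayRef, kept) = SplitAndKeep(row);
        GC.Collect();                   // request a blocking full collection
        GC.WaitForPendingFinalizers();
        GC.Collect();
        // The array is unreachable and is normally reclaimed by now; the kept
        // field is still strongly referenced, so GC.Collect cannot free it.
        return (arrayRef.IsAlive, kept);
    }
}
```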

If memory usage is critical for your application and you have a lot of duplicate data, you should replace the string values with enums.
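Here is a minimal sketch of the enum approach. It assumes, hypothetically, that Field1 only ever holds a few known values such as "alpha", "beta" and "gamma". Field1Kind, CompactRow and Field1Parser are illustrative names and are not part of the question's code. Each row then stores a small enum value inline instead of a reference to its own string. The sketch uses an explicit lookup table rather than Enum.TryParse, because Enum.TryParse also accepts numeric text and comma-separated flag names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: assumes Field1 only ever takes a handful of known values.
enum Field1Kind : byte { Unknown = 0, Alpha, Beta, Gamma }

// Hypothetical compact variant of PrivateRow: the enum value is stored inline
// in each object, so no per-row string is allocated for Field1.
sealed class CompactRow
{
    public Field1Kind Field1;
    public string Field4;
}

static class Field1Parser
{
    // Explicit, case-sensitive lookup; unknown text maps to Unknown.
    static readonly Dictionary<string, Field1Kind> Lookup =
        new Dictionary<string, Field1Kind>(StringComparer.Ordinal)
        {
            ["alpha"] = Field1Kind.Alpha,
            ["beta"] = Field1Kind.Beta,
            ["gamma"] = Field1Kind.Gamma,
        };

    public static Field1Kind Parse(string text) =>
        Lookup.TryGetValue(text, out Field1Kind kind) ? kind : Field1Kind.Unknown;

    public static CompactRow ToRow(string row)
    {
        string[] tmp = row.Split(' ');
        return new CompactRow { Field1 = Parse(tmp[1]), Field4 = row };
    }
}
```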

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM