
Adding to List<T> becomes very slow over time

I'm parsing an HTML table that has about 1000 rows. I'm adding a ~10-character string from one <td> in each row to a List<string> object. It's very quick for the first 200 or so loops, but then it becomes slower and slower over time.

This is the code I'm using:

List<string> myList = new List<string>();
int maxRows = numRows;

for (int i = 1; i < maxRows; i++)
{
    TableRow newTable = myTable.TableRows[i];
    string coll = string.Format("{0},{1},{2},{3},{4}",
        newTable.TableCells[0].Text,
        newTable.TableCells[1].Text,
        newTable.TableCells[2].Text,
        newTable.TableCells[3].Text,
        newTable.TableCells[4].Text);
    myList.Add(coll);
    label1.Text = i.ToString();
}

Should I use an array instead?

Edit: I moved the above code into a new method that runs on a new Thread, and then updated my label control with this code:

label1.Invoke((MethodInvoker)delegate
{
    label1.Text = i.ToString();
});

The program now runs at a consistent speed and doesn't block the UI.

If you roughly know the number of items your collection will hold, it is better to use an array.

Reason: every time you add an element to a List whose internal array is full, it allocates a new block of memory of double the current size, copies everything over, and then keeps appending into the new block until that one fills up too, triggering another allocate-and-copy cycle.

Here is how it works: the list starts with a small default capacity (4 in .NET's List<T>); when adding an element would exceed the current capacity, it allocates an array of double the size (4 → 8 → 16 → 32 → …), copies the existing elements there, and continues appending into the new array. This doubling keeps the average cost of an Add low, at the price of occasional allocate-and-copy pauses, and in exchange you get the flexibility of not having to determine the length beforehand. This might be the reason you're seeing the drag.
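A quick way to see this doubling in action is to watch List<T>.Capacity change as items are added (a minimal sketch; the exact capacity sequence is an implementation detail of the runtime):

```csharp
using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        var list = new List<int>();
        int lastCapacity = list.Capacity;
        Console.WriteLine($"Initial capacity: {lastCapacity}");

        for (int i = 0; i < 100; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // A reallocation just happened: the backing array doubled.
                Console.WriteLine(
                    $"Count {list.Count}: capacity {lastCapacity} -> {list.Capacity}");
                lastCapacity = list.Capacity;
            }
        }
    }
}
```

On current .NET this typically prints capacity jumps of 0 → 4 → 8 → 16 → 32 → 64 → 128.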

Thanks @Dyppl. var list = new List<int>(1000); is one elegant option too; as @Dyppl suggested, it is the best of both worlds.

I tested adding strings to a list, and benchmarked it with a LIST_SIZE of 1000000 (one million) items and a LIST_SIZE of 100000 (one hundred thousand) items. This way we can compare how it scales.

I ran each test 5 times and averaged the running times.
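The timing harness isn't shown above; here is a minimal sketch of how each variant might be measured (Stopwatch, 5 runs, averaged — the constants and structure are my assumptions, not the original author's harness):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Benchmark
{
    const int LIST_SIZE = 1000000;
    const int RUNS = 5;

    static void Main()
    {
        long totalMs = 0;
        for (int run = 0; run < RUNS; run++)
        {
            var sw = Stopwatch.StartNew();

            // Variant under test: List<string> with default capacity.
            var l = new List<string>();
            for (var i = 0; i < LIST_SIZE; ++i)
                l.Add("i = " + i.ToString());

            sw.Stop();
            totalMs += sw.ElapsedMilliseconds;
        }
        Console.WriteLine($"Average over {RUNS} runs: {totalMs / RUNS} ms");
    }
}
```

Swap the body between the three variants below to reproduce each measurement.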


var l = new List<string>();
for (var i = 0; i < LIST_SIZE; ++i) {
    l.Add("i = " + i.ToString());
}

LIST_SIZE of 1000000 takes 1519 ms

LIST_SIZE of 100000 takes 96 ms


var l = new List<string>(LIST_SIZE);
for (var i = 0; i < LIST_SIZE; ++i) {
    l.Add("i = " + i.ToString());
}

LIST_SIZE of 1000000 takes 1386 ms

LIST_SIZE of 100000 takes 65 ms


var l = new string[LIST_SIZE];
for (var i = 0; i < LIST_SIZE; ++i) {
    l[i] = "i = " + i.ToString();
}

LIST_SIZE of 1000000 takes 1510 ms

LIST_SIZE of 100000 takes 66 ms

So, we can notice two things:

  • adding each item does take longer as the list grows larger
  • the difference shouldn't be noticeable with a 1000-item list

I would conclude, then, that the bottleneck is in one of the other methods you call.

Initialize the List with the capacity you expect it to consume:

List<string> myList = new List<string>(maxRows);

Sidenote: if you generate very large lists, the internally growing storage arrays can end up reserving up to twice the memory you really need. But if you already slow down at 1000 entries, I suggest investigating the true reason with a profiler. Do the strings grow too large?
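If that over-allocation itself becomes a concern, List<T>.TrimExcess() shrinks the internal array to match the current Count once you are done adding (a small sketch):

```csharp
using System;
using System.Collections.Generic;

class TrimDemo
{
    static void Main()
    {
        var list = new List<string>();
        for (int i = 0; i < 1000; i++)
            list.Add("row " + i);

        // Capacity is typically larger than Count after repeated doubling.
        Console.WriteLine($"Count: {list.Count}, Capacity: {list.Capacity}");

        // Release the unused tail of the internal array.
        list.TrimExcess();
        Console.WriteLine($"After TrimExcess, Capacity: {list.Capacity}");
    }
}
```

Note that TrimExcess is a no-op when the list is already more than about 90% full, since the savings would not be worth the copy.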
