
C# Replace collection while enumerating

According to the discussion here, it has been verified somewhere on the internet that replacing some types of collections while enumerating them is possible and thread-safe.

My tests below seem to confirm that.

// The comments confirmed that this test is insufficient
var a = new List<int> { 1, 2, 3 };

Parallel.For(1, 10000, i => {
    foreach (var x in a)
        Console.WriteLine(i + x);
});
Parallel.For(1, 10000, i => a = new List<int> { 1, 2, 3, 4 });

I would, however, very much like to read some official documentation or a concrete reference for this before I start relying on it in my code.

Can someone verify this/post a link?

As has already been mentioned, you are not in fact changing a while you're iterating it. You iterate it many times, and only after all of that iteration has finished do you replace a many times, because Parallel.For blocks until it has finished executing all of its iterations.

But even if you were replacing a in parallel with the iterations here, it would in fact be perfectly safe. The foreach reads the value of a once at the very start, gets a reference to a list, and from that point forward never looks at a again. It works off a local copy of the reference that it got from a, so it neither knows nor cares what happens to the variable a after that point. So if you change which list a points to while also iterating a, you don't know whether the list being iterated is the one that was in a before or after the change made by the other thread, but you do know that it must be one list or the other, and not some error or mix of the two.
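The "read once" behaviour described above can be seen even without threads. The following is a minimal single-threaded sketch, not part of the original answer; the class and method names (ReplaceDemo, Run) are made up for illustration. It reassigns a inside the loop body and records what the loop actually visits.

```csharp
using System;
using System.Collections.Generic;

public static class ReplaceDemo
{
    // Reassigns the variable `a` while a foreach over it is in progress
    // and returns the elements the loop actually visited.
    public static List<int> Run()
    {
        var a = new List<int> { 1, 2, 3 };
        var seen = new List<int>();

        foreach (var x in a)
        {
            // Point `a` at a brand-new list. The enumerator was obtained
            // from the original list before the first iteration, so this
            // assignment has no effect on the loop in progress.
            a = new List<int> { 10, 20, 30, 40 };
            seen.Add(x);
        }

        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "1, 2, 3"
    }
}
```

The loop finishes over the original three elements even though a refers to a different list from the first iteration onward.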

Now, if you were mutating the list that a references, rather than changing the variable a to point to a new list, that would be entirely different. List<T> is not designed to be accessed from multiple threads at the same time, so all sorts of bad things could happen. If you used a collection specifically designed to be accessed from multiple threads, and you used it in the way it was designed to be used, then it could function properly.
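To make the contrast concrete, here is a small sketch (again with made-up names, and single-threaded so that the outcome is repeatable). It mutates a List<int> during enumeration, which the enumerator detects and rejects, and then does the same with a ConcurrentQueue<int>, whose enumerator is documented as working on a moment-in-time snapshot. ConcurrentQueue<T> is only one example of a collection designed for concurrent use; the answer above does not name a specific type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class MutateDemo
{
    // Returns true if adding to a List<int> inside a foreach over it throws.
    public static bool ListThrowsWhenMutated()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (var x in list)
                list.Add(x); // changes the very list the enumerator is walking
        }
        catch (InvalidOperationException)
        {
            // "Collection was modified; enumeration operation may not execute."
            return true;
        }
        return false;
    }

    // Enqueues into a ConcurrentQueue<int> while enumerating it and
    // returns how many items the enumeration visited.
    public static int ConcurrentQueueItemsSeen()
    {
        var queue = new ConcurrentQueue<int>(new[] { 1, 2, 3 });
        var seen = 0;
        foreach (var x in queue)
        {
            queue.Enqueue(x + 100); // allowed: the enumerator works on a snapshot
            seen++;
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine($"List<int> threw: {ListThrowsWhenMutated()}");
        Console.WriteLine($"ConcurrentQueue items seen: {ConcurrentQueueItemsSeen()}");
    }
}
```

Note that with real threads the List<T> failure is not guaranteed to surface as a tidy exception; the version check in the enumerator is a best-effort guard, not a synchronization mechanism.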

Just to add to Servy's answer and what has been said in the comments: what you have isn't really an illustration of modifying the variable in parallel while iterating over it. Your two Parallel.For loops run one after the other, i.e. first you iterate over the list 10000 times (possibly in parallel), then you replace it with a new list 10000 times (again, possibly in parallel).

// This doesn't modify or replace the collection at all, it just iterates over it a bunch of times
Parallel.For(1, 10000, i => {
    foreach (var x in a)
        Console.WriteLine(i + x);
});

// This happens AFTER the previous Parallel.For loop completes
// Thus, you're not actually iterating over the list at this point, just replacing it a bunch of times
Parallel.For(1, 10000, i => a = new List<int> { 1, 2, 3, 4 });

Note that I said possibly in parallel - simply putting something in a Parallel.For loop doesn't guarantee that the framework will actually use multiple threads to accomplish the task, and you can't predict "in advance" how many threads it'll use if it does. Point being that this code doesn't even necessarily prove that these tasks are running on multiple threads (or how many they're running on if they are).
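If you want to see how many threads a given run actually used, one option is to record the managed thread ID inside the loop body. This is only a sketch with illustrative names (ThreadCountDemo, DistinctThreads); the number it reports can differ from run to run and from machine to machine, which is exactly the point made above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ThreadCountDemo
{
    // Runs an empty-ish Parallel.For and returns how many distinct
    // managed threads executed at least one iteration.
    public static int DistinctThreads(int iterations)
    {
        // ConcurrentDictionary is used as a thread-safe set of thread IDs.
        var threadIds = new ConcurrentDictionary<int, byte>();

        Parallel.For(0, iterations, i =>
        {
            threadIds.TryAdd(Environment.CurrentManagedThreadId, 0);
        });

        return threadIds.Count;
    }

    public static void Main()
    {
        // The result is at least 1 and is not predictable in advance.
        Console.WriteLine($"Distinct threads used: {DistinctThreads(10000)}");
    }
}
```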

One other flaw in this test: you're replacing the list with an identical collection every time, so you can't really tell which thread did the final update after the loop is done. Let's say that it uses 3 different threads to execute this: A, B, and C. How do you know which one made the last update to the collection? Recall that a Parallel.For loop is not guaranteed to execute its iterations in order, so the last update could have come from any of the three. From the documentation (emphasis mine):

The syntax of a parallel loop is very similar to the for and foreach loops you already know, but the parallel loop runs faster on a computer that has available cores. Another difference is that, unlike a sequential loop, the order of execution isn't defined for a parallel loop. Steps often take place at the same time, in parallel. Sometimes, two steps take place in the opposite order than they would if the loop were sequential. The only guarantee is that all of the loop's iterations will have run by the time the loop finishes.

Basically, then, with a Parallel.For loop you have no idea "in advance" the degree of parallelism, whether it uses parallelism at all, or even which order the steps will execute in (so using this construct necessarily entails giving up considerable control of how the code is actually executed).
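For what it's worth, a test that comes closer to what the question intended might look like the sketch below. It is an illustration rather than a proof: the names (OverlapTest, Run, shared) are made up, Parallel.Invoke still does not guarantee that the two delegates really overlap, and the Volatile.Read/Volatile.Write calls are an extra precaution added here, not something the answers above call for. Each replacement list holds three copies of a distinct number, so a reader can tell whether it ever enumerated a mix of two lists.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class OverlapTest
{
    // Shared variable whose reference is replaced while a reader iterates.
    private static List<int> shared = new List<int> { 0, 0, 0 };

    // Returns how many enumerations saw elements from more than one list.
    public static int Run(int iterations)
    {
        var mixed = 0;

        Parallel.Invoke(
            // Writer: publish a new, distinguishable list on every step.
            () =>
            {
                for (var i = 1; i <= iterations; i++)
                    Volatile.Write(ref shared, new List<int> { i, i, i });
            },
            // Reader: enumerate whatever list is current and check that
            // all of its elements agree with each other.
            () =>
            {
                for (var i = 0; i < iterations; i++)
                {
                    var first = -1; // sentinel; real values are never negative
                    foreach (var x in Volatile.Read(ref shared))
                    {
                        if (first == -1) first = x;
                        else if (x != first) { mixed++; break; }
                    }
                }
            });

        return mixed;
    }

    public static void Main()
    {
        Console.WriteLine($"Mixed enumerations observed: {Run(100000)}");
    }
}
```

Because the lists themselves are never modified after they are published, every enumeration should see three equal values, whichever list it happened to pick up.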

