How to add items to a collection while consuming it?

The example below throws an InvalidOperationException ("Collection was modified; enumeration operation may not execute.") when it runs.

var urls = new List<string>();
urls.Add("http://www.google.com");

foreach (string url in urls)
{
    // Get all links from the url
    List<string> newUrls = GetLinks(url);

    urls.AddRange(newUrls); // <-- This is the problematic line: it adds values to the collection I'm looping over
}

How can I rewrite this in a better way? I'm guessing a recursive solution?

You can't, basically. What you really want here is a queue:

var urls = new Queue<string>();
urls.Enqueue("http://www.google.com");

while (urls.Count != 0)
{
    string url = urls.Dequeue();
    // Get all links from the url
    List<string> newUrls = GetLinks(url);
    foreach (string newUrl in newUrls)
    {
        urls.Enqueue(newUrl);
    }
}

It's slightly ugly due to there not being an AddRange method in Queue<T> but I think it's basically what you want.

There are three strategies you can use.

  1. Copy the List<> to a second collection (list or array - perhaps use ToArray()). Loop through that second collection, adding urls to the first.
  2. Create a second List<>, and loop through your urls List<> adding new values to the second list. Copy those to the original list when done looping.
  3. Use a for loop instead of a foreach loop. Grab your count up front. List<> keeps its items indexed in order, so if you add things they will go to the end of the list.

I prefer #3 as it doesn't have any of the overhead associated with #1 or #2. Here is an example:

var urls = new List<string>();
urls.Add("http://www.google.com");
int count = urls.Count;

for (int index = 0; index < count; index++)
{
    // Get all links from the url
    List<string> newUrls = GetLinks(urls[index]);

    urls.AddRange(newUrls);
}

Edit: The last example (#3) assumes that you don't want to process additional URLs as they are found in the loop. If you do want to process additional URLs as they are found, just use urls.Count in the for loop instead of the local count variable as mentioned by configurator in the comments for this answer.
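
As a minimal sketch of the second variant (re-reading urls.Count on every pass), the snippet below swaps the real GetLinks for a small in-memory link table. The table, its example.com addresses, and the Contains check that keeps the loop finite are assumptions added for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for GetLinks: a fixed, in-memory link graph.
var fakeLinks = new Dictionary<string, List<string>>
{
    ["http://example.com/a"] = new List<string> { "http://example.com/b", "http://example.com/c" },
    ["http://example.com/b"] = new List<string> { "http://example.com/c" },
};

List<string> GetLinks(string page) =>
    fakeLinks.TryGetValue(page, out var links) ? links : new List<string>();

var urls = new List<string> { "http://example.com/a" };

// urls.Count is re-evaluated on every iteration, so items appended
// inside the loop are visited on later passes.
for (int index = 0; index < urls.Count; index++)
{
    foreach (string link in GetLinks(urls[index]))
    {
        // Assumption: skip links already in the list so the loop terminates.
        if (!urls.Contains(link))
        {
            urls.Add(link);
        }
    }
}

Console.WriteLine(string.Join(", ", urls));
```

Indexing is safe here because List<> only invalidates enumerators on modification, not index access. Without some form of duplicate check, a page that links back to an earlier one would keep the list growing indefinitely.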

Use ForEach with a lambda, it's more fun!

var urls = new List<string>();
var destUrls = new List<string>();
urls.Add("http://www.google.com");
urls.ForEach(i => destUrls.AddRange(GetLinks(i))); // GetLinks returns a list, so AddRange rather than Add
urls.AddRange(destUrls);

I would create two lists, add into the second, and then update the reference like this:

var urls = new List<string>();
urls.Add("http://www.google.com");
var destUrls = new List<string>(urls); // copy after seeding, so the starting URL is kept
foreach (string url in urls)
{    
    // Get all links from the url    
    List<string> newUrls = GetLinks(url);    
    destUrls.AddRange(newUrls);
}
urls = destUrls;

Alternatively, you could treat the collection as a queue:

// Declared as List<string> rather than IList<string>, since AddRange is defined on List<T> only
List<string> urls = new List<string>();
urls.Add("http://www.google.com");
while (urls.Count > 0)
{
    string url = urls[0];
    urls.RemoveAt(0);
    // Get all links from the url
    List<string> newUrls = GetLinks(url);
    urls.AddRange(newUrls);
}

Don't change the collection you're looping through with foreach. Instead, use a while loop on the Count property of the list and access the list items by index. This way, even if you add items, the iteration will pick up the changes.

Edit: Then again, it sort of depends on whether you WANT the new items you added to be picked up by the loop. If not, then this won't help.

Edit 2: I guess the easiest way to do it would be to just change your loop to: foreach (string url in urls.ToArray())

This will create an Array copy of your list, and it will loop through this instead of the original list. This will have the effect of not looping over your added items.
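
A minimal sketch of the ToArray() variant follows. The GetLinks stub here is hypothetical (it simply invents two child addresses per page) and exists only so the snippet runs on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for GetLinks: every page "links" to two child pages.
List<string> GetLinks(string page) =>
    new List<string> { page + "/x", page + "/y" };

var urls = new List<string> { "http://example.com" };

// ToArray() copies the current contents, so foreach enumerates the copy
// while AddRange modifies the original list.
foreach (string url in urls.ToArray())
{
    urls.AddRange(GetLinks(url));
}

Console.WriteLine(urls.Count);
```

The loop body runs once, for the single item present when the snapshot was taken; the two links added during the loop end up in urls but are not themselves expanded.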

Consider using a Queue with a while loop (while q.Count > 0, url = q.Dequeue()) instead of iterating.

I assume you want to iterate over the whole list, and each item you add to it? If so I would suggest recursion:

var urls = new List<string>();
var turls = new List<string>();
turls.Add("http://www.google.com");

iterate(turls);

void iterate(List<string> u)
{
    foreach(string url in u)
    {
        List<string> newUrls = GetLinks(url);

        urls.AddRange(newUrls);

        iterate(newUrls);
    }
}

You can probably also create a recursive function, like this (untested):

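IEnumerable<string> GetUrl(string url)
{
  // GetLinks is the function from the question; recursing on each link
  // (rather than on url itself) avoids calling GetUrl with the same argument forever.
  foreach(string ret_url in GetLinks(url))
  {
    yield return ret_url;
    foreach(string u in GetUrl(ret_url))
      yield return u;
  }
}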

List<string> MyEnumerateFunction()
{
  return new List<string>(GetUrl("http://www.google.com"));
}

In this case, you will not have to create two lists, since GetUrl does all the work.

But I may have missed the point of your program.

Jon's approach is right; a queue's the right data structure for this kind of application.

Assuming that you'd eventually like your program to terminate, I'd suggest two other things:

  • don't use string for your URLs, use System.Uri: it provides a canonical string representation of the URL. This will be useful for the second suggestion, which is...
  • put the canonical string representation of each URL you process in a Dictionary. Before you enqueue a URL, check to see if it's in the Dictionary first.
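
The two suggestions above can be sketched roughly as follows. Note the assumptions: the link table and the Uri-based GetLinks are hypothetical stand-ins for the real crawler code, and a HashSet<string> is swapped in for the Dictionary because only membership of the key is needed here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for GetLinks: a tiny link graph with a cycle (a -> b -> a).
var fakeLinks = new Dictionary<string, string[]>
{
    ["http://example.com/a"] = new[] { "http://EXAMPLE.com/b" },
    ["http://example.com/b"] = new[] { "http://example.com/a", "http://example.com/c" },
};

IEnumerable<Uri> GetLinks(Uri page)
{
    if (fakeLinks.TryGetValue(page.AbsoluteUri, out var targets))
    {
        foreach (string target in targets)
        {
            yield return new Uri(target);
        }
    }
}

var start = new Uri("http://example.com/a");
var seen = new HashSet<string> { start.AbsoluteUri };
var queue = new Queue<Uri>();
queue.Enqueue(start);
var processed = new List<string>();

while (queue.Count > 0)
{
    Uri current = queue.Dequeue();
    processed.Add(current.AbsoluteUri);

    foreach (Uri link in GetLinks(current))
    {
        // HashSet<T>.Add returns false when the value is already present,
        // so each canonical URL is enqueued at most once.
        if (seen.Add(link.AbsoluteUri))
        {
            queue.Enqueue(link);
        }
    }
}

Console.WriteLine(string.Join(", ", processed));
```

Because Uri normalizes the host name to lower case, the "EXAMPLE.com" link is recognized as the same site, and the cycle back to the starting page is skipped rather than re-queued.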

It's hard to make the code better without knowing what GetLinks() does. In any event, this avoids recursion. The standard idiom is you don't alter a collection when you're enumerating over it. While the runtime could have let you do it, the reasoning is that it's a source of error, so better to create a new collection or control the iteration yourself.

  1. create a queue with all urls.
  2. when dequeueing, we're pretty much saying we've processed it, so add it to result.
  3. If GetLinks() returns anything, add those to the queue and process them as well.

public List<string> ExpandLinksOrSomething(List<string> urls)
{
    List<string> result = new List<string>();
    Queue<string> queue = new Queue<string>(urls);

    while (queue.Any())
    {
        string url = queue.Dequeue();
        result.Add(url);

        foreach( string newResult in GetLinks(url) )
        {
            queue.Enqueue(newResult);
        }

    }

    return result;
}

The naive implementation assumes that GetLinks() will not return circular references, e.g. A returns B and B returns A. This can be fixed by filtering out URLs that have already been processed:

        List<string> newItems = GetLinks(url).Except(result).ToList();
        foreach( string newResult in newItems )
        {
            queue.Enqueue(newResult);
        }

* As others point out using a dictionary may be more efficient depending on how many items you process.


I find it strange that GetLinks() would return a value, and then later resolve that to more URLs. Maybe all you want to do is a one-level expansion. If so, we can get rid of the Queue altogether.

public static List<string> StraightProcess(List<string> urls)
{
    List<string> result = new List<string>();

    foreach (string url in urls)
    {
        result.Add(url);
        result.AddRange(GetLinks(url));
    }

    return result;
}

I decided to rewrite it because while other answers used queues, it wasn't apparent that they didn't run forever.
