Iterating over a growing set in Python

I have a set, setOfManyElements (S for short), which contains n elements. I need to go through all of its elements and run a function on each one:

for s in setOfManyElements:
   elementsFound=EvilFunction(s)
   setOfManyElements|=elementsFound

EvilFunction(s) returns the set of elements it has found. Some of them will already be in S, some will be new, and some will be in S and will have already been tested.

The problem is that each time I run EvilFunction, S may expand (until it reaches a maximal set, at which point it stops growing). So I am essentially iterating over a growing set. EvilFunction also takes a long time to compute, so I do not want to run it twice on the same element.

Is there an efficient way to approach this problem in Python 2.7?

LATE EDIT: changed the name of the variables to make them more understandable. Thanks for the suggestion

You can keep a set of already-visited elements and pick a not-yet-visited element each time:

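visited = set()
todo = set(S)  # work on a copy, so popping does not empty the original set S
while todo:
    s = todo.pop()
    visited.add(s)
    # queue only the results that have not been processed yet
    todo |= EvilFunction(s) - visited
# visited now holds every element, each passed to EvilFunction exactly once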
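As a quick sanity check of the worklist idea above, here is a minimal sketch wrapped in a function. The names evil_function, expand_all and calls are hypothetical stand-ins for illustration only; the toy evil_function just maps a digit to two other digits and counts how often it is called.

```python
# Hypothetical stand-in for EvilFunction: each digit "discovers" two others.
# The calls dict exists only to show how often each element is evaluated.
calls = {}

def evil_function(s):
    calls[s] = calls.get(s, 0) + 1
    return {(s + 3) % 10, (s * 2) % 10}

def expand_all(initial):
    """Grow the initial set until evil_function finds nothing new."""
    visited = set()
    todo = set(initial)      # copy, so the caller's set is left untouched
    while todo:
        s = todo.pop()
        visited.add(s)
        # only elements that were never processed go back on the worklist
        todo |= evil_function(s) - visited
    return visited

result = expand_all({1})
print(sorted(result))
print(max(calls.values()))   # highest number of calls for any one element
```

Because an element is added to visited the moment it is popped, it can never re-enter todo, which is what keeps the expensive function from running twice on the same input.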

I suggest an incremental version of 6502's approach:

seen   = set(initial_items)
active = set(initial_items)

while active:
    next_active = set()
    for item in active:
        for result in evil_func(item):
            if result not in seen:
                seen.add(result)
                next_active.add(result)
    active = next_active

This passes each item to evil_func only once, and when the loop finishes, seen contains all visited items.

For further research: this is a breadth-first graph search.
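To make the graph-search connection concrete, here is a rough sketch of the same traversal written with a FIFO queue, which is the usual textbook shape of breadth-first search. The function name reachable, the expand callback and the toy graph are all assumptions made up for this example.

```python
from collections import deque

def reachable(initial_items, expand):
    """Return every item reachable from initial_items by repeatedly calling expand()."""
    seen = set(initial_items)
    queue = deque(seen)              # FIFO queue gives breadth-first order
    while queue:
        item = queue.popleft()
        for result in expand(item):
            if result not in seen:   # each item is queued, and expanded, at most once
                seen.add(result)
                queue.append(result)
    return seen

# Hypothetical toy graph: node -> set of neighbours
graph = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": {"a"}, "e": {"a"}}
found = reachable({"a"}, lambda node: graph.get(node, set()))
print(sorted(found))
```

Here "e" points at "a" but nothing points at "e", so it is not part of the result when starting from "a".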

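Iterating over the set directly is a bad idea in your scenario: sets have no guaranteed ordering, and their iterators are not designed to cope with the set being modified during iteration. In CPython, adding elements to a set while looping over it raises RuntimeError ("Set changed size during iteration"), and even if it did not, you would have no way of knowing where a newly inserted element lands relative to the iterator.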
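A small demonstration of what goes wrong when a set grows during iteration, as a sketch with made-up values (items and error_message are illustrative names only):

```python
items = {1, 2, 3}
error_message = None

try:
    for x in items:
        items.add(x + 100)   # grows the set while it is being iterated
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
print(len(items))            # one element was added before the loop failed
```

The first add succeeds, and the failure is raised when the loop asks the iterator for its next element, so the set is left partially modified.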

However, using a list and a set may be a good idea:

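list_elements = list(set_elements)

# Appending to a list while looping over it is fine: the loop also reaches the new items
for s in list_elements:
    elementsFound = EvilFunction(s)
    new_subset = elementsFound - set_elements  # set difference needs a set, not the list
    list_elements.extend(new_subset)
    set_elements |= new_subset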

Edit

Depending on the size of everything, you could even drop the set entirely:

for s in list_elements:
  elementsFound=EvilFunction(s)
  list_elements.extend(i for i in elementsFound if i not in list_elements)

However, I am not sure about the performance of this, so you should profile. Membership tests on a list are linear in its length, so if the list is huge the set-based solution is the better choice, since set operations are cheap. For moderate sizes, EvilFunction may be expensive enough that the difference doesn't matter.
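One rough way to check the membership cost on your own machine is the standard timeit module. The sizes below are arbitrary assumptions, and the numbers will vary between runs and machines, so treat this only as a starting point for profiling:

```python
import timeit

n = 2000                      # arbitrary collection size for the experiment
as_list = list(range(n))
as_set = set(as_list)
missing = -1                  # absent value: the list has to be scanned completely

# Time 1000 membership tests against each container
list_time = timeit.timeit(lambda: missing in as_list, number=1000)
set_time = timeit.timeit(lambda: missing in as_set, number=1000)

print("list: %.6f s, set: %.6f s" % (list_time, set_time))
```

If both timings are negligible next to a single call of EvilFunction, the simpler list-only version is probably good enough.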
