简体   繁体   中英

RuntimeError: dictionary changed size during iteration - During Iteration with iteritems on a defaultdict

While answering a particular question here in SO I stumbled upon a peculiar issue which I couldn't explain. Unfortunately the first two Google Search page returned one SO Page which was also not helpful.

The Problem Code

>>> somedata=[random.randint(1,1000) for i in xrange(1,10000)]
>>> somehash=collections.defaultdict(int)
>>> for d in somedata:
    somehash[d]+=1      
>>> maxkey=0
>>> for k,v in somehash.iteritems():
    if somehash[maxkey] > v:
        maxkey=k            

Traceback (most recent call last):
  File "<pyshell#700>", line 1, in <module>
    for k,v in somehash.iteritems():
RuntimeError: dictionary changed size during iteration
>>> for k,v in somehash.iteritems():
    if somehash[maxkey] > v:
        maxkey=k
>>>

And due to some odd reason, the first time I am iterating over the dictionary, Python is creating tantrums but the subsequent executions are fine as you can see in the example, the first time I iterated over the dictionary, it gave the Run Time Error but the next Time it didn't complain.

Any Idea what might be going wrong?

Just in case if required

>>> sys.version_info
sys.version_info(major=2, minor=7, micro=0, releaselevel='final', serial=0)
>>> sys.version
'2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]'

OS: Microsoft Windows [Version 6.1.7601] (Windows 7)

Adding or deleting items of a dictionary while iterating over it is an error. Since somehash is a defaultdict , even what seems like a read-only access in the line

if somehash[maxkey] > k:

might add a new key -- resulting in the error you encountered.

As Sven explained, the error you're running into is due to the way defaultdict works. When performing a lookup in a defaultdict , if the key doesn't already exist a default value is retrieved (hence the name) and the key is added to the dictionary (with the default value). This is the source of your RuntimeError .

You could do the following to avoid this problem:

for k, v in somehash.items():
    if somehash[maxkey] > v:
        maxkey = k

The main difference being that somehash.items() returns a list of (key, value) tuples, so you are actually iterating over that list and not somehash itself. The same goes for .keys() vs .iterkeys() .

You are generating 9999 (somewhat) random integers between 1 and 1000 , stored in somedata , which is used as a key for somehash , to store the occurrence of the numbers in somedata.

As maxkey=0 , that key will never exist. As you are using a defaultdict, when each key is encountered for the first time, an entry is automatically created using the default_factory function which returns an empty list , thus correctly throwing an error during iteration, in a manner that * FastTurtle* already pointed out.

Use the get method to retrieve an item safely.

import random
import collections

somedata=[random.randint(1,1000) for i in xrange(1,10000)]
somehash=collections.defaultdict(int)
for d in somedata:
   somehash[d]+=1 

maxkey=0
for k,v in somehash.iteritems():
   if somehash.get(maxkey) > v:
       maxkey=k 
       print k,v  

I see you are using Python 2.7, which has a new collection class called Counter , for counting hashable objects. Using Counter should be faster than the code above, and reduces your code to:

somedata=[random.randint(1,1000) for i in xrange(1,10000)]
somehash=collections.Counter(somedata)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM