
I need to free up RAM by storing a Python dictionary on the hard drive, not in RAM. Is it possible?

In my case, I have a dictionary of about 6000 instantiated classes, where each class has 1000 attributed variables all of type string or list of strings. As I build this dictionary up, my RAM goes up super high. Is there a way to write the dictionary as it is being built to the harddrive rather than the RAM so that I can save some memory? I've heard of something called "pickle" but I don't know if this is a feasible method for what I am doing.

Thanks for your help!

Maybe you should be using a database, but check out the shelve module

If shelve isn't powerful enough for you, there is always the industrial-strength ZODB.
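As a minimal sketch of the shelve approach (file path and keys are made up for illustration), a shelf behaves like a dict whose values are pickled to disk rather than held in RAM:

```python
import os
import shelve
import tempfile

# A shelf persists pickled values to disk; only keys you touch are in memory.
path = os.path.join(tempfile.mkdtemp(), "mydata")

with shelve.open(path) as db:           # keys must be strings
    db["obj_42"] = {"name": "spam", "tags": ["a", "b"]}

# Reopen later: the data was never all resident in RAM at once.
with shelve.open(path) as db:
    print(db["obj_42"]["name"])         # spam
```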

shelve, as @gnibbler recommends, is what I would no doubt be using, but watch out for two traps: a simple one (all keys must be strings) and a subtle one (since the values don't normally exist in memory, calling mutators on them may not work as you expect).

For the simple problem, it's normally easy to find a workaround (and you do get a clear exception if you forget and try e.g. using an int or whatever as the key, so it's not hard to remember that you do need a workaround there).

For the subtle problem, consider for example:

x = d['foo']
x.amutatingmethod()
...much later...
y = d['foo']
# is y "mutated" or not now?

The answer to the question in the last comment depends on whether d is a real dict (in which case y will be mutated, and in fact be exactly the same object as x) or a shelf (in which case y will be a distinct object from x, in exactly the state you last saved to d['foo']!).

To get your mutations to persist, you need to "save them to disk" by doing

d['foo'] = x

after calling any mutators you want on x (so in particular you cannot just do

d['foo'].mutator()

and expect the mutation to "stick", as you would if d were a dict).
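A small runnable demonstration of the trap and the explicit write-back fix (the shelf path and key are illustrative):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shelf_demo")

with shelve.open(path) as d:
    d["foo"] = [1, 2]

    d["foo"].append(3)       # mutates a temporary unpickled copy -- lost!
    print(d["foo"])          # [1, 2]

    x = d["foo"]             # fetch...
    x.append(3)              # ...mutate...
    d["foo"] = x             # ...and explicitly save back to disk
    print(d["foo"])          # [1, 2, 3]
```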

shelve does have an option (writeback=True) to cache all fetched items in memory, but of course that can fill up the memory again, and result in long delays when you finally close the shelf object (since all the cached items must be saved back to disk then, just in case they had been mutated). That option was something I originally pushed for (as a Python core committer), but I've since changed my mind and I now apologize for getting it in (ah well, at least it's not the default!-), since the cases in which it should be used are rare, and it can often trap the unwary user... sorry.
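For completeness, here is what that caching option looks like in use (illustrative path; note that it trades the memory savings back for convenience, which is exactly the trap described above):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wb_demo")

# writeback=True keeps every fetched entry cached in memory,
# and flushes them all back to disk at sync()/close().
with shelve.open(path, writeback=True) as d:
    d["foo"] = [1, 2]
    d["foo"].append(3)       # works in place now -- but held in RAM until close

with shelve.open(path) as d:
    print(d["foo"])          # [1, 2, 3]
```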

BTW, in case you don't know what a mutator, or "mutating method", is: it's any method that alters the state of the object you call it on -- e.g. .append if the object is a list, .pop if the object is any kind of container, and so on. No need to worry if the object is immutable, of course (numbers, strings, tuples, frozensets, ...), since it has no mutating methods in that case;-).

Pickling an entire hash over and over again is bound to run into the same memory pressures that you're facing now -- maybe even worse, with all the data marshaling back and forth.

Instead, using an on-disk database that acts like a hash is probably the best bet; see this page for a quick introduction to using dbm-style databases in your program: http://docs.python.org/library/dbm

They act enough like hashes that it should be a simple transition for you.
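A quick sketch using Python 3's dbm module (in Python 2, the linked page covers anydbm and friends; the path and key below are illustrative). Note that dbm stores keys and values as bytes:

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cache")

with dbm.open(path, "c") as db:     # "c": create the database if needed
    db["key"] = "value"             # str is encoded to bytes on storage
    print(db[b"key"])               # b'value'
```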

"""I have a dictionary of about 6000 instantiated classes, where each class has 1000 attributed variables all of type string or list of strings""" ... I presume that you mean: """I have a class with about 1000 attributes all of type str or list of str . I have a dictionary mapping about 6000 keys of unspecified type to corresponding instances of that class.""" If that's not a reasonable translation, please correct it.

For a start, 1000 attributes in a class is mindboggling. You must be treating the vast majority generically, using value = getattr(obj, attr_name) and setattr(obj, attr_name, value). Consider using a dict instead of an instance: value = obj[attr_name] and obj[attr_name] = value.
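Side by side, the two access styles look like this (the class and attribute names are hypothetical):

```python
# Generic attribute access on an instance...
class Record:
    pass

obj = Record()
setattr(obj, "attr_17", "spam")
print(getattr(obj, "attr_17"))     # spam

# ...versus a plain dict, which does the same job more directly
# and without per-instance class machinery.
rec = {}
rec["attr_17"] = "spam"
print(rec["attr_17"])              # spam
```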

Secondly, what percentage of those 6 million attributes are ""? If it's sufficiently high, you might like to consider implementing a sparse dict which doesn't physically have entries for those attributes, using the __missing__ hook -- see the docs for object.__missing__.
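A minimal sketch of such a sparse mapping (class and keys are made up for illustration): absent entries read back as "" without ever being stored.

```python
class SparseAttrs(dict):
    """A dict subclass: missing keys default to "" instead of raising."""
    def __missing__(self, key):    # called by __getitem__ on a missing key
        return ""

rec = SparseAttrs()
rec["name"] = "spam"
print(rec["name"])          # spam
print(repr(rec["nickname"]))  # '' -- never stored, costs no memory
print(len(rec))             # 1
```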
