
How to keep a very large dictionary loaded in memory in Python?

I have a very large dictionary (~200 GB) that my algorithm needs to query very often. To get quick results, I want to keep it in memory, which is possible because I fortunately have 500 GB of RAM.

However, my main issue is that I want to load it into memory only once and then let other processes query the same dictionary, rather than having to reload it every time I create a new process or iterate over my code.

So, I would like something like this:

Script 1:

 # Load dictionary in memory
 def load(data_dir):
     dictionary = load_from_dir(data_dir) ... 

Script 2:

 # Connect to loaded dictionary (already put in memory by script 1)
 def use_dictionary(my_query):
     query_loaded_dictionary(my_query) 

What's the best way to achieve this? I have considered a REST API, but I wonder whether going over a REST request would erode all the speed I gained by putting the dictionary in memory in the first place.

Any suggestions?

Either run a separate service that you access with a REST API like you mentioned, or use an in-memory database.

Personally, I had a very good experience with Redis, but there are many others (Memcached is also popular). Redis was easy to use with Python and Django.
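To give an idea, here is a rough sketch with the redis-py client (assuming a Redis server is already running on localhost at the default port, and that your dictionary maps string keys to string values; the function names are just for illustration):

 import redis

 # Connect to a Redis server that is already running on localhost:6379
 r = redis.Redis(host='localhost', port=6379, db=0)

 # Load once (e.g. in your script 1): push every key-value pair into Redis.
 # mset() sends the pairs in batches, which is much faster than one set() per key.
 def load_into_redis(dictionary, batch_size=10000):
     batch = {}
     for key, value in dictionary.items():
         batch[key] = value
         if len(batch) >= batch_size:
             r.mset(batch)
             batch = {}
     if batch:
         r.mset(batch)

 # Query from any other process (e.g. in your script 2)
 def query(key):
     value = r.get(key)  # returns bytes, or None if the key is missing
     return value.decode() if value is not None else None

Once the data is loaded, every new process only pays the cost of opening a connection, not of reloading 200 GB.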

Both solutions involve data serialization, though, so some performance will be lost. Redis can also store simple structures such as lists natively, but I haven't tried that. In my case I packed my numeric arrays and serialized them (with numpy), and it was fast enough in the end. If you only use simple string key-value pairs anyway, the performance will be optimal, and possibly even better with Memcached.
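As a hedged sketch of what the numpy route looks like (the key name and dtype here are just examples, and the dtype must be known when reading back):

 import numpy as np
 import redis

 r = redis.Redis(host='localhost', port=6379, db=0)

 # Store a numeric array as raw bytes
 arr = np.arange(10, dtype=np.float64)
 r.set('my_array', arr.tobytes())

 # Read it back in another process and rebuild the array without an extra copy
 raw = r.get('my_array')
 restored = np.frombuffer(raw, dtype=np.float64)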
