
Passing variables between two python processes

I intend to build a program structure like the one below

[Program structure diagram]

PS1 is a Python program that runs persistently. PC1, PC2, PC3 are client Python programs. PS1 holds a variable hashtable; whenever PC1, PC2, ... ask for the hashtable, PS1 will pass it to them.

The intention is to keep the table in memory, since it is a huge variable (it takes 10 GB of memory) and is expensive to calculate every time. It is not feasible to store it on the hard disk (using pickle or json) and read it every time it is needed; the read just takes too long.

So I was wondering if there is a way to keep a Python variable persistently in memory, so it can be accessed very quickly whenever it is needed.

You are trying to reinvent a square wheel, when nice round wheels already exist!

Let's go one level up and look at how you have described your needs:

  • one large data set that is expensive to build
  • different processes need to use the dataset
  • performance constraints do not allow simply reading the full set from permanent storage every time

IMHO, this is exactly what databases were created for. For common use cases, having many processes each use their own copy of a 10 GB object is a waste of memory; the common approach is for one single process to hold the data while the others send requests for it. You did not describe your problem in enough detail, so I cannot say whether the best solution will be:

  • a SQL database like PostgreSQL or MariaDB - since they can cache, everything will be held in memory automatically if you have enough of it
  • a NoSQL database (MongoDB, etc.) if your only (or main) need is single-key access - very nice when dealing with lots of data requiring fast but simple access
  • a dedicated server using a dedicated query language if your needs are very specific and none of the above solutions meet them
  • a process setting up a huge piece of shared memory that will be used by client processes (a minimal sketch follows this list) - that last solution will certainly be the fastest, provided:
    • all clients make read-only accesses - it can be extended to read/write access, but that could lead to a synchronization nightmare
    • you are sure to have enough memory on your system to never use swap - if you do swap, you will lose all the cache optimizations that real databases implement
    • the size of the database, the number of client processes, and the external load of the whole system never grow to a level where you run into the swapping problem above
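
To make that last option concrete, here is a minimal sketch using the standard library's multiprocessing.shared_memory module (Python 3.8+). The block name and the flat byte encoding are illustrative assumptions: a Python dict cannot live in shared memory directly, so the table would have to be serialized into a flat layout that clients can index without copying the whole thing.

    from multiprocessing import shared_memory

    # --- server process (PS1): build the block once and keep it alive ---
    payload = b"example_key\x00example_value\x00"   # stand-in for a real flat encoding
    shm = shared_memory.SharedMemory(name="ps1_table", create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    # keep the process running; on shutdown call shm.close() and shm.unlink()

    # --- any client process (PC1, PC2, ...): attach by name, read only ---
    view = shared_memory.SharedMemory(name="ps1_table")
    print(bytes(view.buf[:len(payload)]))           # clients only read, never write
    view.close()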

TL/DR: My advice is to experiment with the performance of a good-quality database and optionally a dedicated cache. Those solutions allow load balancing across different machines almost out of the box. Only if that does not work, carefully analyze the memory requirements, be sure to document the limits on the number of client processes and the database size for future maintenance, and use shared memory - read-only data being a hint that shared memory can be a nice solution.
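
As a rough illustration of the database/cache route, the sketch below stores the table in Redis as a hash so that any client can fetch single entries without ever loading the whole table. It assumes a Redis server on localhost:6379 and the third-party redis-py package; the key names are placeholders.

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # one-off load, done by the process that builds the expensive table
    r.hset("big_table", mapping={"example_key": "example_value"})

    # any client process can then fetch single entries on demand
    value = r.hget("big_table", "example_key")      # returns bytes, or None if missing
    print(value.decode() if value else None)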

In short, to accomplish what you are asking about, you need to create a byte array as a RawArray from the multiprocessing.sharedctypes module that is large enough for your entire hashtable in the PS1 server, and then store the hashtable in that RawArray. PS1 needs to be the process that launches PC1, PC2, etc., which can then inherit access to the RawArray. You can create your own class of object that provides the hashtable interface through which the individual variables in the table are accessed; an instance of that class can be passed separately to each of the PC# processes, which read from the shared RawArray.
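
Below is a hedged sketch of that idea using only the standard library. The fixed-width record layout and the Table wrapper class are illustrative assumptions (a real implementation would use a proper hashing scheme instead of a linear scan), but it shows the shape of the approach: PS1 fills the RawArray once, spawns the clients, and each client reads records directly out of the shared bytes without copying the table.

    import struct
    from multiprocessing import Process
    from multiprocessing.sharedctypes import RawArray

    RECORD = struct.Struct("16s 64s")          # fixed-width key/value records

    class Table:
        """Hashtable-like read-only view over the shared byte array."""
        def __init__(self, shared, count):
            self.shared, self.count = shared, count
        def get(self, key):
            target = key.encode()
            for i in range(self.count):        # a real table would hash, not scan
                k, v = RECORD.unpack_from(self.shared, i * RECORD.size)
                if k.rstrip(b"\0") == target:
                    return v.rstrip(b"\0").decode()
            return None

    def client(shared, count):
        table = Table(shared, count)           # PC1, PC2, ... read, never copy
        print(table.get("example_key"))

    if __name__ == "__main__":
        data = {"example_key": "example_value", "other": "42"}   # stand-in table
        shared = RawArray("B", RECORD.size * len(data))          # lives in PS1
        for i, (k, v) in enumerate(data.items()):
            RECORD.pack_into(shared, i * RECORD.size, k.encode(), v.encode())
        workers = [Process(target=client, args=(shared, len(data))) for _ in range(3)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()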
