Passing variables between two python processes

I intend to build a program with a structure like the one below.

[Diagram: program structure]

PS1 is a Python program that runs persistently. PC1, PC2, and PC3 are client Python programs. PS1 holds a hashtable variable; whenever PC1, PC2, ... ask for the hashtable, PS1 passes it to them.

The intention is to keep the table in memory, since it is a huge variable (it takes 10 GB of memory) and recalculating it every time is expensive. Storing it on the hard disk (using pickle or json) and reading it each time it is needed is not feasible; the read just takes too long.

So I was wondering if there is a way to keep a Python variable persistently in memory, so that it can be accessed very quickly whenever it is needed.

You are trying to reinvent a square wheel, when nice round wheels already exist!

Let's go one level up from how you have described your needs:

  • one large data set that is expensive to build
  • different processes need to use the data set
  • performance constraints do not allow simply reading the full set from permanent storage

IMHO, we are facing exactly what databases were created for. For common use cases, having many processes each use their own copy of a 10 GB object is a waste of memory; the common approach is for one single process to hold the data while the others send requests for it (a minimal sketch of that client/server idea follows the list below). You did not describe your problem in enough detail, so I cannot say whether the best solution will be:

  • a SQL database like PostgreSQL or MariaDB - as they can cache, if you have enough memory everything will automatically be held in memory
  • a NOSQL database (MongoDB, etc.) if your only (or main) need is single-key access - very nice when dealing with a lot of data that requires fast but simple access
  • a dedicated server using a dedicated query language, if your needs are very specific and none of the above solutions meets them
  • a process setting up a huge piece of shared memory to be used by the client processes - that last solution will certainly be the fastest, provided:
    • all clients make read-only accesses - it can be extended to r/w accesses, but that could lead to a synchronization nightmare
    • you are sure to have enough memory on your system to never use swap - if you do, you will lose all the cache optimizations that real databases implement
    • the size of the database, the number of client processes, and the external load of the whole system never grow to the point where you run into the swapping problem above
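
To make the client/server idea above concrete, here is a minimal sketch using Python's standard multiprocessing.managers module. The names (TableManager, get_table), the port 50000, and the authkey are illustrative assumptions, not something from the original answer. PS1 would run the server part:

    # server (PS1): builds the table once and keeps it in memory
    from multiprocessing.managers import BaseManager

    table = {"alpha": 1, "beta": 2}   # stand-in for the expensive 10 GB table

    class TableManager(BaseManager):
        pass

    # Clients receive a proxy to this one dict; lookups execute in PS1,
    # so the table itself is never copied wholesale.
    TableManager.register("get_table", callable=lambda: table)

    if __name__ == "__main__":
        manager = TableManager(address=("", 50000), authkey=b"secret")
        manager.get_server().serve_forever()

and each client (PC1, PC2, ...) would connect and fetch single entries on demand:

    # client (PC#): connects and looks up single keys
    from multiprocessing.managers import BaseManager

    class TableManager(BaseManager):
        pass

    TableManager.register("get_table")

    if __name__ == "__main__":
        manager = TableManager(address=("localhost", 50000), authkey=b"secret")
        manager.connect()
        table = manager.get_table()   # a proxy, not a copy
        print(table.get("alpha"))     # each lookup is one request to PS1

Note that the proxy exposes the dict's public methods, so use table.get(key) rather than table[key] indexing.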

TL/DR: My advice is to experiment with the performance of a good-quality database, optionally with a dedicated cache. Those solutions allow almost out-of-the-box load balancing across different machines. Only if that does not work should you carefully analyze the memory requirements, be sure to document the limits on the number of client processes and the database size for future maintenance, and use shared memory - read-only data being a hint that shared memory can be a nice solution.
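
If you want a quick way to try the database route before committing to PostgreSQL or MariaDB, a minimal sketch with Python's built-in sqlite3 module shows the access pattern; the file name hashtable.db and the kv schema are assumptions made up for illustration, and a real server database would be queried the same way through its own driver.

    import sqlite3

    conn = sqlite3.connect("hashtable.db")   # assumed file name
    conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

    # One-time, expensive build: write the table into the database.
    conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                     [("alpha", "1"), ("beta", "2")])   # stand-in data
    conn.commit()

    # Any client process: fetch only the entry it needs, not all 10 GB.
    row = conn.execute("SELECT value FROM kv WHERE key = ?", ("alpha",)).fetchone()
    print(row[0] if row else "missing")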

In short, to accomplish what you are asking about, you need to create a byte array as a RawArray from the multiprocessing.sharedctypes module that is large enough for your entire hashtable in the PS1 server, and then store the hashtable in that RawArray. PS1 needs to be the process that launches PC1, PC2, etc., which can then inherit access to the RawArray. You can create your own class of object that provides the hashtable interface, through which the individual variables in the table are accessed; an instance can be passed to each of the PC# processes, which read from the shared RawArray.
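
Here is a minimal sketch of that idea, under the simplifying assumption that the table can be pickled into the shared buffer. To stay short it unpickles the whole table inside each client, whereas the wrapper class described above would index into the buffer directly; the names lookup and client are made up for the example.

    import ctypes
    import pickle
    from multiprocessing import Process
    from multiprocessing.sharedctypes import RawArray

    def lookup(buf, size, key):
        # A real wrapper class would read entries straight out of the
        # buffer; unpickling everything keeps this sketch short.
        return pickle.loads(bytes(buf[:size]))[key]

    def client(name, buf, size, key):
        # PC#: inherits access to the shared buffer, no extra 10 GB copy.
        print(name, key, "->", lookup(buf, size, key))

    if __name__ == "__main__":
        payload = pickle.dumps({"alpha": 1, "beta": 2})  # stand-in table
        buf = RawArray(ctypes.c_ubyte, len(payload))     # shared, unsynchronized
        ctypes.memmove(buf, payload, len(payload))       # store serialized table
        # PS1 launches the clients, which inherit access to the RawArray.
        procs = [Process(target=client, args=(f"PC{i}", buf, len(payload), key))
                 for i, key in enumerate(["alpha", "beta"], start=1)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

RawArray is deliberately unsynchronized, which is fine here because, as the first answer notes, all clients make read-only accesses.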
