
Python Multi-Processing Question?

I have a folder with 500 input files (total size of all files is ~500 MB).

I'd like to write a python script that does the following:

(1) load all of the input files to memory

(2) initialize an empty python list that will later be used... see bullet (4)

(3) start 15 different (independent) processes: each of these uses the same input data [from (1)] -- yet uses a different algorithm to process it, thus generating different results

(4) I'd like all the independent processes [from step (3)] to store their output in the same python list [the same list that was initialized in step (2)]

Once all 15 processes have completed their run, I will have one python list that includes the results of all 15 independent processes.

My question is: is it possible to do the above efficiently in python? If so, can you provide a scheme / sample code that illustrates how to do so?

Note #1: I will be running this on a strong, multi-core server, so the goal here is to use all the processing power while sharing some memory { input data, output list } among all the independent processes.

Note #2: I am working in a Linux environment.
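
For concreteness, steps (1)-(4) map naturally onto multiprocessing.Pool, which collects each worker's return value into a single list in the parent process. Below is a minimal sketch: the directory name, the loading step and the algorithm function are hypothetical placeholders, and on Linux the forked workers inherit the loaded data copy-on-write rather than duplicating it.

"""
sketch: fan one shared in-memory dataset out to 15 processes and gather
all of their results into a single list in the parent
"""
import multiprocessing
from pathlib import Path

# (1) load every input file into memory once, before forking; on Linux the
# child processes inherit this dict copy-on-write, so it is not copied 15x
INPUT_DATA = {p.name: p.read_bytes()
              for p in Path("input_dir").iterdir() if p.is_file()}


def algorithm(algo_id):
    """placeholder for one of the 15 independent algorithms"""
    return algo_id, sum(len(v) for v in INPUT_DATA.values())


if __name__ == '__main__':
    # (3) run 15 worker processes; (2)+(4) pool.map gathers every
    # worker's return value into one list in the parent process
    with multiprocessing.Pool(processes=15) as pool:
        results = pool.map(algorithm, range(15))
    print(results)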

ok I just whipped this up using zeromq to demonstrate a single subscriber to multiple publishers. You could probably do the same with queues, but you would need to manage them a bit more. zeromq sockets just work, which makes it nice for things like this IMO.

"""
demo of multiple processes doing processing and publishing the results
to a common subscriber
"""
from multiprocessing import Process


class Worker(Process):
    def __init__(self, filename, bind):
        self._filename = filename
        self._bind = bind
        super(Worker, self).__init__()

    def run(self):
        import zmq
        import time
        ctx = zmq.Context()
        result_publisher = ctx.socket(zmq.PUB)
        result_publisher.bind(self._bind)
        # give the subscriber a moment to connect -- PUB sockets silently
        # drop messages sent before any subscriber is attached ("slow joiner")
        time.sleep(1)
        with open(self._filename) as my_input:
            for line in my_input:
                result_publisher.send_string(line)

if __name__ == '__main__':
    import sys
    import os
    import zmq

    # assume every argument but the first is a file to be processed
    files = sys.argv[1:]

    # create a worker for each file to be processed, if it exists, passing
    # in a bind argument instructing the socket to communicate via ipc
    workers = [Worker(f, "ipc://%s_%s" % (f, i))
               for i, f in enumerate(x for x in files if os.path.exists(x))]

    # create subscriber socket
    ctx = zmq.Context()

    result_subscriber = ctx.socket(zmq.SUB)
    # subscribe to everything (an empty prefix matches all messages)
    result_subscriber.setsockopt_string(zmq.SUBSCRIBE, "")

    # wire up the subscriber to whatever each worker is bound to
    for w in workers:
        print(w._bind)
        result_subscriber.connect(w._bind)

    # start workers
    print("starting workers...")
    for w in workers:
        w.start()

    result = []

    # read from the subscriber and add to the result list as long as at
    # least one worker is alive; poll with a timeout so the loop cannot
    # block forever on recv() after the workers have all exited
    poller = zmq.Poller()
    poller.register(result_subscriber, zmq.POLLIN)
    while any(w.is_alive() for w in workers):
        if poller.poll(1000):
            result.append(result_subscriber.recv_string())

    # drain anything published just before the last worker exited
    while poller.poll(100):
        result.append(result_subscriber.recv_string())

    # output the result
    print(result)
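
To try it, save the script under any name (say, demo.py) and pass the files to process as arguments:

$ python demo.py file1.txt file2.txt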

oh and to get zmq just

$ pip install pyzmq-static
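
(pyzmq-static was a statically-built distribution of pyzmq; if it is no longer maintained, plain pip install pyzmq should do.) Since the answer notes that queues could do the same job with a bit more management, here is a rough, untested sketch of the equivalent fan-in using a multiprocessing.Queue, with process_file as a hypothetical stand-in for the real work:

"""
sketch: the same fan-in pattern with a multiprocessing.Queue instead of zmq
"""
import sys
from multiprocessing import Process, Queue


def process_file(filename):
    """hypothetical placeholder for the real per-file processing"""
    with open(filename) as f:
        return filename, len(f.read())


def worker(filename, queue):
    # each worker pushes its single result onto the shared queue
    queue.put(process_file(filename))


if __name__ == '__main__':
    files = sys.argv[1:]
    queue = Queue()
    workers = [Process(target=worker, args=(f, queue)) for f in files]
    for w in workers:
        w.start()

    # exactly one result per worker, so get() that many items; draining
    # the queue before join() also avoids a feeder-thread deadlock
    results = [queue.get() for _ in workers]

    for w in workers:
        w.join()
    print(results)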
