Python Multiprocessing using Queue to write to same file

I know there are many posts on Stack Exchange about writing results from multiprocessing to a single file, and I developed my code after reading those posts. What I am trying to achieve is to run the 'RevMapCoord' function in parallel and write its results to a single file using a multiprocessing queue. But I am having a problem while queuing my jobs. My code:

def RevMapCoord(list):
    "Read a file, Find String and Do something"

def feed(queue, parlist):
    for par in parlist:
        print ('Echo from Feeder: %s' % (par))
        queue.put(par)
    print ('**Feeder finished queing**')

def calc(queueIn, queueOut):
    print ('Worker function started')
    while True:
        try:
            par = queueIn.get(block=False)
            res = RevMapCoord(final_res)
            queueOut.put((par, res))
        except:
            break

def write(queue, fname):
    fhandle = open(fname, "w")
    while True:
        try:
            par, res = queue.get(block=False)
            print >>fhandle, par, res
        except:
            break
    fhandle.close()


feedProc = Process(target=feed, args=(workerQueue, final_res))
calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nproc)]
writProc = Process(target=write, args=(writerQueue, sco_inp_extend_geno))

feedProc.start()
print ('Feeder is joining')
feedProc.join()
for p in calcProc:
    p.start()
for p in calcProc:
    p.join()
writProc.start()
writProc.join()

When I run this code, the script gets stuck at the "feedProc.start()" step. The last few lines of output on screen show the print statements from the feeder process:

Echo from Feeder: >AK779,AT61680,50948-50968,50959,6,0.406808,Ashley,Dayne
Echo from Feeder: >AK832,AT30210,1091-1111,1102,7,0.178616,John,Caine
**Feeder finished queing**

But it hangs before executing the next line, "feedProc.join()". The code gives no error and keeps running, but does nothing (hangs). Please tell me what mistake I am making.

I think you should slim your example down to the basics. For example:

from multiprocessing import Process, Queue

def f(q):
    q.put('Hello')
    q.put('Bye')
    q.put(None)  # sentinel: signals that the producer is done

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    with open('file.txt', 'w') as fp:
        while True:
            item = q.get()
            print(item)
            if item is None:
                break
            fp.write(item)
    p.join()

Here I have two processes: the main process and p. p puts strings into a queue, which are retrieved by the main process. When the main process finds None (a sentinel that I am using to indicate "I am done"), it breaks out of the loop.

Extending this to many processes (or threads) is trivial, as sketched below.
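For example, a minimal sketch of that extension could look like the following, assuming a placeholder process_item() function stands in for the real per-item work (such as RevMapCoord), and that NPROC, items, and 'file.txt' are illustrative values. Blocking get() calls plus one None sentinel per worker let every process exit cleanly, instead of relying on get(block=False) and a bare except as in the question.

from multiprocessing import Process, Queue

NPROC = 4  # assumed number of worker processes

def process_item(item):
    # placeholder for the real per-item work (e.g. RevMapCoord)
    return item.upper()

def worker(task_q, result_q):
    # block on get(); a None sentinel tells this worker to stop
    while True:
        item = task_q.get()
        if item is None:
            break
        result_q.put(process_item(item))

if __name__ == '__main__':
    task_q, result_q = Queue(), Queue()
    items = ['Hello', 'Bye']
    workers = [Process(target=worker, args=(task_q, result_q)) for _ in range(NPROC)]
    for w in workers:
        w.start()
    for item in items:
        task_q.put(item)
    for _ in workers:
        task_q.put(None)  # one sentinel per worker
    with open('file.txt', 'w') as fp:
        for _ in items:  # exactly one result per input item
            fp.write(result_q.get() + '\n')
    for w in workers:
        w.join()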

I achieved writing results from multiprocessing to a single file by using the 'map_async' function in Python 3. Here is the function I wrote:

from multiprocessing import Pool

def PPResults(module, alist):  ## Parallel processing
    npool = Pool(int(nproc))
    res = npool.map_async(module, alist)
    results = res.get()  ## results are returned as a list
    return results

So, I provide this function with a list of parameters in 'a_list', and 'module' is a function that does the processing and returns a result. The function above collects the results as a list and returns once all the parameters from 'a_list' have been processed. The results might not be in the correct order, but since order was not important for me this worked well. The 'results' list can be iterated over and the individual results written to a file like this:

fh_out = open('./TestResults', 'w')
for i in results:  ## write results from the list to a file
    fh_out.write(i)
fh_out.close()
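For reference, a hypothetical call to PPResults might look like this, where the value of 'nproc' and the contents of 'a_list' are purely illustrative and 'RevMapCoord' is the processing function from the question:

nproc = 4
a_list = ['entry1', 'entry2', 'entry3']
results = PPResults(RevMapCoord, a_list)  # one result per input, returned as a list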

To keep the order of the results, we might need to use queues, similar to what I mentioned in my question above. Although I was able to fix that code, I believe it does not need to be mentioned here.

Thanks

AK
