How do I free memory from Process in multiprocessing.queue?

I have a program that tries to predict email conversion for every email I send in a week (so, usually 7 sends). The output is 7 different files with the prediction scores for each customer. Running these serially can take close to 8 hours, so I have tried to parallelize them with multiprocessing. This speeds things up very well, but I've noticed that after a process finishes it seems to hold onto its memory, until none is left and one of the processes gets killed by the system without completing its task.

I've based the following code off of the 'manual pool' example in this answer, as I need to limit the number of processes that start at once due to memory constraints. What I would like is that as one process finishes, it releases its memory to the system, freeing up space for the next worker.

Below is the code that handles concurrency:

import itertools
from multiprocessing import Manager, Process

def work_controller(in_queue, out_list):
    while True:
        key = in_queue.get()
        print key

        if key is None:  # sentinel signalling that there is no more work
            return

        work_loop(key)
        out_list.append(key)

if __name__ == '__main__':

    num_workers = 4
    manager = Manager()
    results = manager.list()
    work = manager.Queue(num_workers)
    processes = []

    for i in xrange(num_workers):
        p = Process(target=work_controller, args=(work,results))
        processes.append(p)
        p.start()

    iters = itertools.chain([key for key in training_dict.keys()])
    for item in iters:
        work.put(item)

    for p in processes:
        print "Joining Worker"
        p.join()
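
For comparison, the same fan-out can be sketched with the stdlib pool (this assumes the work_loop shown below and the training_dict used above): maxtasksperchild=1 replaces each worker process after a single task, which is one way to hand a finished task's memory straight back to the OS.

from multiprocessing import Pool

if __name__ == '__main__':
    # maxtasksperchild=1 (Python 2.7+) recycles every worker after one
    # task, so the memory a finished task held is released before the
    # next key is scheduled
    pool = Pool(processes=4, maxtasksperchild=1)
    pool.map(work_loop, training_dict.keys())
    pool.close()
    pool.join()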

Here is the actual work code, if that is of any help:

import datetime
import pickle

import pandas as pd

import imbalance  # custom classifier module used below

# test_file, output_file, train_dataframe and data_cleanse are defined elsewhere

def work_loop(key):
    # each worker re-loads the pickled training data rather than sharing it
    with open('email_training_dict.pkl','rb') as f:
        training_dict = pickle.load(f)
    df_test = pd.DataFrame.from_csv(test_file)
    outdict = {}
    target = 'is_convert'

    df_train = train_dataframe(key)
    features = data_cleanse(df_train,df_test)

    # MAIN PREDICTION
    print 'Start time: {}'.format(datetime.datetime.now()) + '\n'

    # train/test by mailer
    X_train = df_train[features]
    X_test = df_test[features]
    y_train = df_train[target]

    # run model fit
    clf = imbalance.ImbalanceClassifier()

    clf = clf.fit(X_train, y_train)
    y_hat = clf.predict(X_test)

    outdict[key] = clf.y_vote
    print outdict[key]
    print 'Time Complete: {}'.format(datetime.datetime.now()) + '\n'
    with open(output_file,'wb') as f:
        pickle.dump(outdict,f)
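
One thing worth noting (an aside, not part of the code above): the large frames in work_loop are locals, so Python drops them when the function returns, but CPython may keep the freed pages around for reuse instead of returning them to the OS, which matches the growth described above. Forcing a collection between tasks is a cheap experiment; run_task here is a hypothetical wrapper, not an existing function:

import gc

def run_task(key):
    # run one prediction task, then collect; gc.collect() promptly breaks
    # any reference cycles the task left behind, though the interpreter
    # may still hold the freed memory for reuse
    work_loop(key)
    gc.collect()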

I'm assuming that, like the example you linked, you are using Queue.Queue() as your queue object. This is a blocking queue, which means a call to queue.get() will either return an element or wait/block until it can return one. Try changing your work_controller function to the below:

import Queue  # stdlib module (Python 2) that defines the Empty exception

def work_controller(in_queue, out_list):
    while True:  # return when the queue is empty
        try:
            key = in_queue.get(False)  # False makes the get non-blocking
        except Queue.Empty:
            return
        print key

        work_loop(key)
        out_list.append(key)
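
For reference, the three calling modes of get share the same semantics on a manager queue and on the plain stdlib queue used here to demonstrate:

import Queue

q = Queue.Queue()
q.put('demo')
q.get()               # blocks until an item is available (returns 'demo')
try:
    q.get(False)      # non-blocking: raises Queue.Empty when nothing is queued
except Queue.Empty:
    pass
try:
    q.get(True, 0.5)  # blocks at most 0.5 s, then raises Queue.Empty
except Queue.Empty:
    pass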

While the above solves the blocking issue, it gives rise to another: at the start of the workers' lives there are no items in in_queue, so the processes will end immediately.

To solve this I suggest adding a flag that indicates whether it is okay to terminate. A plain module-level global will not do here, because each worker process gets its own copy of it, so a flag flipped in the parent stays invisible to the children; a multiprocessing.Event passed to each worker is shared across processes:

from multiprocessing import Event

def work_controller(in_queue, out_list, ok_to_end):
    while True:  # return once the queue is drained and no more work is coming
        try:
            key = in_queue.get(False)  # non-blocking get
        except Queue.Empty:
            if ok_to_end.is_set():  # consult the flag before ending
                return
            continue  # the queue may simply not have been filled yet
        print key

        work_loop(key)
        out_list.append(key)

if __name__ == '__main__':

    num_workers = 4
    manager = Manager()
    results = manager.list()
    work = manager.Queue(num_workers)
    processes = []

    ok_to_end = Event()  # termination flag, shared with every worker
    for i in xrange(num_workers):
        p = Process(target=work_controller, args=(work, results, ok_to_end))
        processes.append(p)
        p.start()

    for key in training_dict.keys():
        work.put(key)

    ok_to_end.set()  # raised only after the queue has been filled

    for p in processes:
        print "Joining Worker"
        p.join()
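
An alternative worth sketching (not part of the original answer) is the poison-pill pattern that the question's first work_controller already checks for: keep the blocking get and have the parent enqueue one None per worker after all the real keys. Each worker exits when it draws its sentinel, so no flag is needed:

def work_controller(in_queue, out_list):
    while True:
        key = in_queue.get()  # a blocking get is fine with sentinels
        if key is None:       # poison pill: no more work for this worker
            return
        work_loop(key)
        out_list.append(key)

# in the __main__ block, after putting the real keys:
#     for _ in xrange(num_workers):
#         work.put(None)  # one sentinel per worker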
