
Multiprocessing with Python and Windows

I have code that works with Thread in Python, but I want to switch to Process because, if I have understood correctly, that will give me a speed-up. Here is the code with Thread:

threads.append(Thread(target=getId, args=(my_queue, read)))
threads.append(Thread(target=getLatitude, args=(my_queue, read)))

The code works by putting the return value in the Queue, and after a join on the threads list I can retrieve the results. After changing the code and the import statement, my code now looks like this:

threads.append(Process(target=getId, args=(my_queue, read)))
threads.append(Process(target=getLatitude, args=(my_queue, read)))
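
For context, a minimal self-contained sketch of the Thread-plus-Queue pattern described above looks like the following; the worker bodies are placeholders, not the real getId/getLatitude:

from threading import Thread
from queue import Queue

def getId(out_queue, read):
    out_queue.put({'ID': len(read)})    # placeholder result

def getLatitude(out_queue, read):
    out_queue.put({'LAT': [1.0, 2.0]})  # placeholder result

my_queue = Queue()
read = []
threads = []
threads.append(Thread(target=getId, args=(my_queue, read)))
threads.append(Thread(target=getLatitude, args=(my_queue, read)))
for t in threads:
    t.start()
for t in threads:
    t.join()
while not my_queue.empty():
    print(my_queue.get())               # retrieve the results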

However, it does not execute anything and the Queue is empty; with Thread the Queue is not empty, so I think it is related to Process. I have read answers saying that the Process class does not work on Windows. Is that true, or is there a way to make it work (adding freeze_support() does not help)? If not, is multithreading on Windows actually executed in parallel on different cores?

References:

Python multiprocessing example not working

Python code with multiprocessing does not work on Windows

Multiprocessing process does not join when putting complex dictionary in return queue (which describes that fork does not exist on Windows)

EDIT: To add some details: the code with Process actually works on CentOS.

EDIT2: adding a simplified version of my code with processes, tested on CentOS:

import pandas as pd
from multiprocessing import Process, freeze_support
from multiprocessing import Queue

#%% Global variables

datasets = []

latitude = []

def fun(key, job):
    global latitude
    if(key == 'LAT'):
        latitude.append(job)

def getLatitude(out_queue, skip = None):
    latDict = {'LAT' : latitude}
    out_queue.put(latDict)

n = pd.read_csv("my.csv", sep =',', header = None).shape[0]
print("Number of baboon:" + str(n))

read = []

for i in range(0,n):
    threads = []
    my_queue = Queue()
    threads.append(Process(target=getLatitude, args=(my_queue, read)))

    for t in threads:
        freeze_support() # try both with and without this line
        t.start()

    for t in threads:
        t.join()

    while not my_queue.empty():
        try:
            job = my_queue.get()
            key = list(job.keys())
            fun(key[0],job[key[0]])
        except:
            print("END")  

    read.append(i)    

Per the documentation, you need the following after the function definitions. When Python creates the subprocesses, they import your script, so any code at the global level will be run multiple times. The code you only want to run once, in the main process, goes under:

if __name__ == '__main__':
    n = pd.read_csv("my.csv", sep =',', header = None).shape[0]
    # etc.

Indent the rest of the code under this if.
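
As a sketch (not the asker's exact code), the simplified script from EDIT2 reorganized under the guard could look like this. The worker still puts only one small item on the queue, so joining before draining is safe here:

import pandas as pd
from multiprocessing import Process, Queue, freeze_support

latitude = []

def fun(key, job):
    if key == 'LAT':
        latitude.append(job)

def getLatitude(out_queue, skip=None):
    # Note: under spawn (the Windows default) the child process gets its
    # own copy of the globals, so this sends the child's view of latitude.
    out_queue.put({'LAT': latitude})

if __name__ == '__main__':
    freeze_support()  # only needed when freezing to a Windows executable
    n = pd.read_csv("my.csv", sep=',', header=None).shape[0]
    print("Number of baboon:" + str(n))

    read = []
    for i in range(n):
        my_queue = Queue()
        p = Process(target=getLatitude, args=(my_queue, read))
        p.start()
        p.join()
        while not my_queue.empty():
            job = my_queue.get()
            key = list(job.keys())[0]
            fun(key, job[key])
        read.append(i)

Because the child re-imports the script, everything left at module level (the read_csv, the loop) would otherwise execute again in every subprocess; on Windows this can even recurse into spawning more processes, which is why the guard is required there while fork-based CentOS tolerated the original layout.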
