
Python Multiprocessing within Jupyter Notebook

I am new to the multiprocessing module in Python and work with Jupyter notebooks. I have tried the following code snippet from PMOTW:

import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

When I run this as is, there is no output.

I have also tried creating a module called worker.py and then importing that to run the code:

import multiprocessing
from worker import worker

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

There is still no output in that case. In the console, I see the following error (repeated multiple times):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
AttributeError: Can't get attribute 'worker' on <module '__main__' (built-in)>

However, I get the expected output when the code is saved as a Python script and executed.

What can I do to run this code directly from the notebook without creating a separate script?

I'm relatively new to parallel computing, so I may be wrong about some technicalities. My understanding is this:

Jupyter notebooks don't work with multiprocessing because the module pickles (serialises) data to send to processes. multiprocess is a fork of multiprocessing that uses dill instead of pickle to serialise data, which allows it to work from within Jupyter notebooks. The API is identical, so the only thing you need to do is to change

import multiprocessing

to...

import multiprocess

You can install multiprocess very easily with a simple

pip install multiprocess
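As a minimal sketch (assuming multiprocess is installed), the example from the question then only needs the import changed, since multiprocess mirrors the multiprocessing API:

import multiprocess

def worker():
    """worker function"""
    print('Worker')

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocess.Process(target=worker)  # same API as multiprocessing.Process
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()  # wait for the workers to finish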

You will, however, find that your processes will still not print to the output (although in JupyterLab they will print to the terminal the server is running in). I stumbled upon this post trying to work around this and will edit this post when I find out how to.

I'm not an expert either in multiprocessing or in ipykernel (which is used by Jupyter notebook), but because nobody seems to have given an answer, I will tell you what I guessed. I hope somebody complements this later on.

I guess your Jupyter notebook server is running on a Windows host. In multiprocessing there are three different start methods. Let's focus on spawn, which is the default on Windows, and fork, the default on Unix.

Here is a quick overview.

  • spawn

    • (CPython) interactive shell - always raises an error
    • run as a script - okay only if you nested the multiprocessing code in if __name__ == '__main__'

  • fork

    • always okay

For example,

import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

This code works when it's saved and run as a script, but raises an error when entered in a Python interactive shell. Here is the implementation of the ipython kernel, and my guess is that it uses some kind of interactive shell and so doesn't go well with spawn (but please don't trust me).


As a side note, I will give you a general idea of how spawn and fork are different. In multiprocessing, each subprocess runs a separate Python interpreter. In particular, with spawn, a child process starts a new interpreter and imports the necessary modules from scratch. It's hard to import code from an interactive shell, so it may raise an error.
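As an illustrative sketch of that behaviour (the file name spawn_demo.py is an assumption, and it is meant to be run as a script): with spawn, the child re-imports the module, so the module-level print runs in the parent and again in the child, while the if __name__ == '__main__' block runs only in the parent.

# spawn_demo.py - run as a script
import multiprocessing
import os

print("importing in process %d" % os.getpid())  # prints in the parent AND again in the spawned child

def worker():
    print("Worker in process %d" % os.getpid())

if __name__ == '__main__':  # false in the child, which is imported as a non-main module
    multiprocessing.set_start_method('spawn')
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()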

fork is different. With fork, a child process copies the main process, including most of the running state of the Python interpreter, and then continues execution. This code will help you understand the concept.

import os


main_pid = os.getpid()

os.fork()
print("Hello world(%d)" % os.getpid())  # print twice. Hello world(id1) Hello world(id2)

if os.getpid() == main_pid:
    print("Hello world(main process)")  # print once. Hello world(main process)

This works for me on macOS (I could not make it work on Windows):

import multiprocessing as mp

if __name__ == '__main__':
    # set_start_method() raises a RuntimeError if it is called a second time
    # (e.g. when the cell is re-run), so only set it while no method is fixed yet
    if mp.get_start_method(allow_none=True) is None:
        mp.set_start_method('fork')

Save the function to a separate Python file, then re-import the function. That should work fine.
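For example, a minimal sketch of that suggestion (the file name worker.py follows the question; the rest is an assumption): put the target function in its own module,

# worker.py
def worker():
    """worker function"""
    print('Worker')

and then, in the notebook cell, import it by name so the child processes can re-import it:

import multiprocessing
from worker import worker  # the target now lives in a real module, not in __main__

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()  # wait so output is not lost when the cell finishes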

I found that it's easiest to follow the Multi-processing example.

So the ThreadPool took care of my issue.

from multiprocessing.pool import ThreadPool as Pool

def worker(i):
    """worker function"""
    print('Worker\n')  # pool.map passes each item of range(5) as the argument i
    return


pool = Pool(4)
for result in pool.map(worker, range(5)):
    pass    # or print diagnostics
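Note that ThreadPool runs the workers as threads inside the notebook's own process rather than as separate processes, so nothing needs to be pickled and the workers write to the notebook's stdout directly, which is why the output shows up here. The trade-off is that the threads share one interpreter (and the GIL), so this suits I/O-bound work better than CPU-bound work.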

