Using Magic command with Python multiprocessing library

I'm trying to run a Jupyter notebook file for each input in a Python list, from another notebook. I've used Jupyter Notebook's magic command %run to accomplish the task:

input_list= [1,  131,  312,  327,  348,  485,  469, 1218, 1329, 11212]
for i in input_list:
    try:
        input = i
        %run ./notebook.ipynb
    except:
        pass

The code works, but the execution time is very high, so I decided to use the multiprocessing library to run it faster.

Function used inside multiprocessing:

def function(i):
    try:
        input = i
        print(input)#print the current element passed
        %run ./notebook.ipynb
    except:
        pass

Multiprocessing code:

    from multiprocessing import Pool, cpu_count
    from tqdm import tqdm

    p = Pool(8)

    tqdm(p.imap(function, input_list))

    p.close()
    p.join()

But the problem here is that the argument passed to the function is not passed to the notebook run by the %run magic command.

I got an error like "input is not defined".

What would be a possible solution for this problem?

It works when you follow the guide here on how to use arguments.
I'll illustrate with a minimal working example.

Make a notebook called add3.ipynb with the following contents as the only cell in it:

o = i + 3
print (f"where the input is {i}; the  output is {o}\n")

Then, for the notebook that controls the runs with the various values you want, use the following in a code cell:

# based on https://pymotw.com/3/multiprocessing/basics.html
import multiprocessing

def worker(i):
    try:
        print (f"input is {i}\n")#print the current element passed
        %run ./add3.ipynb
    except:
        pass
    

input_list= [1,  131,  312,  327,  348,  485,  469, 1218, 1329, 11212]


if __name__ == '__main__':
    jobs = []
    for i in input_list:
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

I'll paste a typical run of that at the bottom of this post.


I still suggest you use papermill for this, so you can parameterize the notebook and save each executed version as a new file, like a report.
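As a rough sketch of what the papermill route might look like (not tested against your notebook): papermill's execute_notebook() takes an input path, an output path, and a parameters dict, and injects the values after the cell tagged "parameters" in the notebook. The helper names build_jobs and run_jobs and the "runs" output folder are made up for illustration.

```python
from pathlib import Path


def build_jobs(input_list, notebook="add3.ipynb", out_dir="runs"):
    """Return one (input_path, output_path, parameters) tuple per input value.

    The helper name and the "runs" folder are hypothetical choices.
    """
    stem = Path(notebook).stem
    return [
        (notebook, str(Path(out_dir) / f"{stem}_{i}.ipynb"), {"i": i})
        for i in input_list
    ]


def run_jobs(jobs):
    """Execute each job with papermill, saving one output notebook per input."""
    # papermill is a third-party package; it is imported here so that
    # build_jobs() can be used without papermill installed.
    import papermill as pm

    for input_path, output_path, parameters in jobs:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        pm.execute_notebook(input_path, output_path, parameters=parameters)


# Typical use from a controlling notebook or script:
# run_jobs(build_jobs([1, 131, 312]))
```

Each output notebook then holds the injected value of i together with the results of that run.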

Alternatively, you can use other means to inject code or construct the notebook to run with the input value. I often keep a template as a string inside a script, with a placeholder for the value. The script fills in the value using str.replace() and saves the resulting strings as notebook files; I then run those notebooks using jupytext or jupyter nbconvert. nbformat can be useful for building such a notebook file too. That way you can generate reports in notebook form with the results from each run.
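A minimal sketch of the template approach, assuming a one-cell notebook like add3.ipynb. The placeholder name, the generate_notebooks helper, and the output file names are invented for illustration, and the notebook JSON is written by hand with the standard library rather than with nbformat.

```python
import json
from pathlib import Path

# Hypothetical placeholder token; any string that cannot occur elsewhere works.
PLACEHOLDER = "INPUT_VALUE_PLACEHOLDER"

# Notebook template held as a string: a single code cell that sets i, then
# does the same work as the add3.ipynb example.
NOTEBOOK_TEMPLATE = json.dumps(
    {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": [
                    f"i = {PLACEHOLDER}\n",
                    "o = i + 3\n",
                    'print(f"where the input is {i}; the output is {o}")\n',
                ],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    },
    indent=1,
)


def generate_notebooks(values, out_dir):
    """Write one notebook per value, with the placeholder replaced by the value."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for value in values:
        text = NOTEBOOK_TEMPLATE.replace(PLACEHOLDER, str(value))
        path = out_dir / f"add3_{value}.ipynb"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

The generated files can then be executed one by one, for example with jupyter nbconvert --to notebook --execute add3_1.ipynb.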

Also, if you don't need the code you're calling to be in a notebook, it is often more convenient to save it as a Python script (ending in .py) or an IPython script (ending in .ipy). (The latter allows you to use IPython magics in a script and is often an easier way to develop when you are used to Jupyter. However, the resulting script runs much slower than pure Python, so I usually end up converting to pure Python and only use the .ipy form early in development.) For example, the contents of the one cell in my example add3.ipynb could simply have been saved as a script, add3.py. Then from a notebook I can run it like the following (leaving out multiprocessing for the sake of simplicity):

input_list= [1,  131,  312,  327,  348,  485,  469, 1218, 1329, 11212]
for i in input_list:
    %run -i add3.py

Note the use of the -i option with %run to "run the file in IPython's namespace instead of an empty one." That option isn't necessary when using %run to run another notebook because, by default, it's as if you are running the other notebook's code in the calling notebook. I like the greater flexibility of using %run with a script, because often I don't want the script running in the same namespace. The alternatives I mentioned (papermill, jupytext, and jupyter nbconvert) execute an external notebook separately from the current namespace.
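For comparison only, and not part of the %run approach above: in plain Python, outside IPython, the standard library's runpy.run_path() gives a loosely similar kind of control over the namespace a script sees. The sketch below writes a throwaway add3.py to a temporary folder and passes i in explicitly through init_globals; the helper names are hypothetical.

```python
import runpy
import tempfile
from pathlib import Path


def run_script_with_input(script_path, i):
    """Run the script in a fresh namespace that contains only i, return its o."""
    # init_globals pre-populates the script's globals; run_path returns the
    # script's globals dict after it has finished running.
    result = runpy.run_path(str(script_path), init_globals={"i": i})
    return result["o"]


def demo(values):
    """Create a throwaway add3.py and run it once per value."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "add3.py"
        script.write_text("o = i + 3\n", encoding="utf-8")
        return [run_script_with_input(script, v) for v in values]
```

Here each run gets its own namespace, so nothing leaks between runs or into the caller, which is the opposite trade-off from %run -i.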


Result seen when running the minimal working example:

input is 1

input is 131

input is 312
input is 327


input is 348
input is 485


input is 469
input is 1218


input is 11212
input is 1329

where the input is 131; the  output is 134


where the input is 1; the  output is 4
where the input is 312; the  output is 315
where the input is 327; the  output is 330


where the input is 485; the  output is 488


where the input is 1218; the  output is 1221
where the input is 469; the  output is 472


where the input is 348; the  output is 351

where the input is 1329; the  output is 1332

where the input is 11212; the  output is 11215
