Is there a way to run all Jupyter Notebooks inside a directory?

Introduction

I have a lot of Jupyter Notebooks inside a directory and I want to run them all to see their output.

What I actually do

I have to open them one by one, click "Restart kernel and re-run the whole notebook?", wait a few minutes, and then move on to the next one.

What I wish to do

Find a way to just "press a button" (it can be a script, a command, or anything else), go for a walk, and come back to read the output.

Thanks in advance!

You can achieve this with nbconvert or papermill. See also this answer.
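For the nbconvert route, a minimal sketch might drive the `jupyter nbconvert` command line from Python, one notebook at a time. It assumes `jupyter` is on the PATH; the helper names `find_notebooks`, `nbconvert_command`, and `run_all` are made up for this illustration. The `--inplace` flag writes the executed result back over the original file.

```python
import subprocess
from pathlib import Path


def find_notebooks(directory):
    """Return the notebooks directly inside `directory`, sorted by name."""
    # A non-recursive glob, so .ipynb_checkpoints/ is not descended into.
    return sorted(Path(directory).glob("*.ipynb"))


def nbconvert_command(notebook):
    """Build the nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        str(notebook),
    ]


def run_all(directory):
    """Execute every notebook and return the ones whose run failed."""
    failed = []
    for notebook in find_notebooks(directory):
        result = subprocess.run(nbconvert_command(notebook))
        if result.returncode != 0:
            # Keep going so one broken notebook does not stop the batch.
            failed.append(notebook)
    return failed
```

Calling `run_all("./run_all")` (the directory name is a placeholder) runs the whole batch and returns the list of notebooks that did not finish cleanly.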

Here is an example using papermill:

Installation with Anaconda:

conda install -c conda-forge papermill

Create a new notebook that runs all the notebooks in a specific directory:

import papermill as pm
from pathlib import Path

for nb in Path('./run_all').glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb  # Path to save executed notebook
    )
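If the batch is meant to run unattended, it may help to keep going when one notebook fails and to write the executed copies to a separate directory instead of overwriting the sources. The following is only a sketch under those assumptions: `run_directory` and its `execute` hook are hypothetical names, and by default the hook falls back to `papermill.execute_notebook(input_path, output_path)`.

```python
from pathlib import Path


def run_directory(input_dir, output_dir, execute=None):
    """Run every notebook in input_dir, saving executed copies to output_dir.

    `execute` is a hypothetical hook taking (input_path, output_path);
    when omitted, papermill.execute_notebook is used.
    Returns a dict mapping failed notebook names to the exception raised.
    """
    if execute is None:
        import papermill as pm  # imported lazily, only needed for the default hook
        execute = pm.execute_notebook

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    failures = {}
    for nb in sorted(Path(input_dir).glob("*.ipynb")):
        try:
            execute(str(nb), str(output_dir / nb.name))
        except Exception as exc:
            # Record the error and continue with the next notebook.
            failures[nb.name] = exc
    return failures
```

For example, `run_directory("./run_all", "./run_all_output")` (both paths are placeholders) leaves the originals untouched and returns whatever failed, so the results can be inspected after the walk.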

You can convert every notebook that you want to run into a .py file and then create a single notebook that imports them as modules. Something like this:

script1.py:

print('This is the first script.')

script2.py:

print('This is the second script.')

script3.py:

print('...and this is the last one!')

Now you import them all in a single script (you can create it in Jupyter):

import script1
import script2
import script3
# This is the first script.
# This is the second script.
# ...and this is the last one!
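With many scripts, writing one import per file gets tedious. As a rough alternative, the sketch below uses the standard-library `runpy` module to run every .py file in a directory in name order; `run_scripts` is a name invented for this example. The notebooks can be converted beforehand with `jupyter nbconvert --to script`, though scripts converted from notebooks that use IPython magics may not run outside IPython.

```python
import runpy
from pathlib import Path


def run_scripts(directory):
    """Run every .py file in `directory` in name order, each in a fresh namespace."""
    for script in sorted(Path(directory).glob("*.py")):
        print(f"--- {script.name} ---")
        # run_name="__main__" makes any `if __name__ == "__main__":` block execute too.
        runpy.run_path(str(script), run_name="__main__")
```

Unlike `import`, which executes a module only the first time it is imported in a session, `runpy.run_path` executes the file on every call, so re-running the batch in the same kernel works as expected.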

papermill and nbclient have overhead because they create a new process to execute the code; in addition, they cannot execute notebooks in parallel.

I ran some benchmarks and list a few options below, from the fastest to the slowest. I used these notebooks for benchmarking and the time command to time each run.

Fastest: Ploomber in parallel

25.440 total

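from ploomber import DAG
from ploomber.products import File
from ploomber.tasks import NotebookRunner
from ploomber.executors import Parallel

from pathlib import Path
from glob import iglob

# The Parallel executor lets the notebook tasks run concurrently
dag = DAG(executor=Parallel())

# One task per notebook; the product is the same path, so each
# executed notebook overwrites its source file
for path in iglob('*.ipynb'):
    NotebookRunner(Path(path), File(path), dag=dag, papermill_params=dict(engine_name='embedded'))


if __name__ == '__main__':
    # force=True runs every task even if it is considered up to date
    dag.build(force=True)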

This requires:

pip install ploomber

Papermill using ploomber-engine

51.256 total

import papermill as pm
from glob import glob

for nb in glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,
        engine_name='embedded',
    )

This requires:

pip install ploomber-engine

Ploomber (serial)

59.324 total

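from ploomber import DAG
from ploomber.products import File
from ploomber.tasks import NotebookRunner

from pathlib import Path
from glob import iglob

# No executor argument: the tasks run one after another
dag = DAG()

# One task per notebook; the product is the same path, so each
# executed notebook overwrites its source file
for path in iglob('*.ipynb'):
    NotebookRunner(Path(path), File(path), dag=dag, papermill_params=dict(engine_name='embedded'))


if __name__ == '__main__':
    # force=True runs every task even if it is considered up to date
    dag.build(force=True)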

This requires:

pip install ploomber

Slowest: papermill

1:58.79 total

import papermill as pm
from glob import glob

for nb in glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,
    )

This requires:

pip install papermill

Note: I did not evaluate nbclient since its performance is similar to that of papermill.
