Is there a way to run all Jupyter Notebooks inside a directory?
I have a lot of Jupyter Notebooks inside a directory and I want to run them all to see their output.
Currently I have to open each one, click "Restart kernel and re-run the whole notebook?", wait a few minutes, and then move on to the next one.
I'd like a way to just "press a button" (it can be a script, a command, or anything else), go out for a walk, and come back to read the output.
Thanks in advance!
You can achieve this with nbconvert or papermill. See also this answer.
Here is an example with papermill:
Installation with Anaconda:
conda install -c conda-forge papermill
Create a new notebook that runs all the notebooks in a specific directory:
import papermill as pm
from pathlib import Path

for nb in Path('./run_all').glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,  # path to save the executed notebook
    )
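One caveat if you extend the glob above to cover subdirectories (e.g. with rglob): Jupyter keeps autosave copies under `.ipynb_checkpoints` folders, and a recursive search would execute those too. A minimal stdlib sketch of a filtered collector (the helper name `find_notebooks` is my own, not part of papermill):

```python
from pathlib import Path

def find_notebooks(root):
    """Collect .ipynb files under root, skipping Jupyter's autosave copies."""
    return sorted(
        p for p in Path(root).rglob('*.ipynb')
        if '.ipynb_checkpoints' not in p.parts
    )

if __name__ == '__main__':
    for nb in find_notebooks('.'):
        print(nb)  # feed these paths to pm.execute_notebook instead
```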
You can convert every notebook that you want to run into a .py file and then create a single notebook that imports them all as modules. Something like this:
script1.py:
print('This is the first script.')
script2.py:
print('This is the second script.')
script3.py:
print('...and this is the last one!')
Now you import them all in a single script (you can create it in Jupyter):
import script1
import script2
import script3
# This is the first script.
# This is the second script.
# ...and this is the last one!
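If the number of scripts grows, the three imports can be generalized with importlib from the standard library. A sketch under the same layout as above (the helper name `run_scripts` is mine; note that importing a module runs its top-level code only the first time):

```python
import importlib
import sys
import tempfile
from pathlib import Path

def run_scripts(directory):
    """Import every .py file in `directory`, executing each module's top-level code."""
    directory = Path(directory)
    sys.path.insert(0, str(directory))  # make the scripts importable by name
    try:
        for path in sorted(directory.glob('*.py')):
            importlib.import_module(path.stem)  # the import runs the script body
    finally:
        sys.path.remove(str(directory))

# Demo against a throwaway directory holding two scripts:
demo = Path(tempfile.mkdtemp())
(demo / 'script1.py').write_text("print('This is the first script.')\n")
(demo / 'script2.py').write_text("print('This is the second script.')\n")
run_scripts(demo)
```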
papermill and nbclient have an overhead since they create a new process to execute the code; papermill和nbclient有开销,因为它们创建了一个新进程来执行代码; plus, they cannot execute notebooks in parallel.
另外,他们不能并行执行笔记本。
I ran some benchmarks, and I'm showing a few options from the fastest to the slowest one (I used these notebooks for benchmarking), and I used the time
command to time the execution.我运行了一些基准测试,并展示了从最快到最慢的几个选项(我使用这些笔记本进行基准测试),并且我使用
time
命令来计时执行。
25.440 total (Ploomber, parallel executor):
from ploomber import DAG
from ploomber.products import File
from ploomber.tasks import NotebookRunner
from ploomber.executors import Parallel
from pathlib import Path
from glob import iglob

dag = DAG(executor=Parallel())

for path in iglob('*.ipynb'):
    NotebookRunner(Path(path), File(path), dag=dag,
                   papermill_params=dict(engine_name='embedded'))

if __name__ == '__main__':
    dag.build(force=True)
This requires:
pip install ploomber
51.256 total (papermill with the embedded engine):
import papermill as pm
from glob import glob

for nb in glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,
        engine_name='embedded',
    )
This requires:
pip install ploomber-engine
59.324 total (Ploomber, serial executor):
from ploomber import DAG
from ploomber.products import File
from ploomber.tasks import NotebookRunner
from ploomber.executors import Parallel
from pathlib import Path
from glob import iglob

dag = DAG()

for path in iglob('*.ipynb'):
    NotebookRunner(Path(path), File(path), dag=dag,
                   papermill_params=dict(engine_name='embedded'))

if __name__ == '__main__':
    dag.build(force=True)
This requires:
pip install ploomber
1:58.79 total (plain papermill):
import papermill as pm
from glob import glob

for nb in glob('*.ipynb'):
    pm.execute_notebook(
        input_path=nb,
        output_path=nb,
    )
This requires:
pip install papermill
Note: I did not evaluate nbclient, since its performance is similar to papermill's.
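A side note on the parallelism point above: since each notebook runs in its own kernel process, plain threads are enough to overlap runs. A hedged sketch using only the standard library (the worker here is a stand-in; replace its body with the commented `pm.execute_notebook` call):

```python
from concurrent.futures import ThreadPoolExecutor
from glob import glob

def run_one(path):
    """Stand-in worker: swap the body for pm.execute_notebook(path, path)."""
    # pm.execute_notebook(input_path=path, output_path=path)
    return path

def run_all(paths, workers=4):
    """Execute notebooks concurrently; threads suffice because each
    notebook is executed by its own kernel process anyway."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, paths))

if __name__ == '__main__':
    done = run_all(glob('*.ipynb'))
    print(f'Executed {len(done)} notebooks')
```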