
Finding output cells causing large file size in jupyter notebook

I have a Jupyter notebook with ~400 cells. The total file size is 8 MB, so I'd like to suppress the largest output cells to reduce the overall file size.

There are quite a few output cells that could be causing this (mainly matplotlib and seaborn plots), so to avoid spending time on trial and error, is there a way of finding the size of each output cell? I'd like to keep as many output plots as possible, as I'll be pushing the work to GitHub for others to see.

Here is my idea using nbformat, spelled out so it can be run in a cell of a Jupyter notebook. It lists the code cell numbers ordered from largest to smallest output (it first fetches an example notebook so there is something to try it on):

############### Get test notebook ########################################
import os
notebook_example = "matplotlib3d-scatter-plots.ipynb"
if not os.path.isfile(notebook_example):
    # IPython shell escape: this line only works inside a notebook/IPython session
    !curl -OL https://raw.githubusercontent.com/fomightez/3Dscatter_plot-binder/master/matplotlib3d-scatter-plots.ipynb
### Use nbformat to get estimate of output size from code cells. #########
import nbformat as nbf
ntbk = nbf.read(notebook_example, nbf.NO_CONVERT)
# Map each code cell's execution count to the length of its outputs as a string.
# Note: cells that were never run all have execution_count None, so they share one key.
size_estimate_dict = {}
for cell in ntbk.cells:
    if cell.cell_type == 'code':
        size_estimate_dict[cell.execution_count] = len(str(cell.outputs))
# Execution counts ordered from largest to smallest estimated output size
out_size_info = [k for k, v in sorted(size_estimate_dict.items(), key=lambda item: item[1], reverse=True)]
out_size_info

(To have a place to easily run that code, go here and click on the launch binder button. When the session spins up, open a new notebook, paste in the code, and run it. A static form of the notebook is here.)

The example I tried didn't include Plotly, but the code seemed to behave similarly on a notebook containing only Plotly plots. I don't know how it will handle a mix, though. It may not sort perfectly if the outputs are of different kinds.
Hopefully this gives you an idea of how to do what you asked. The code example could be further expanded to use the retrieved size estimates to have nbformat make a copy of the input notebook without the output showing for, say, the ten largest code cells.
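As a rough sketch of that expansion (not something I have run against a 400-cell notebook): the functions below rank code cells by the approximate serialized size of their outputs and return a copy with the outputs of the largest ones cleared. The names `output_sizes`, `strip_largest_outputs`, and `demo_nb` are made up for illustration. It assumes the notebook is in nbformat 4 layout (a top-level "cells" list) and, to stay dependency-free, it works on the plain dict you get from `json.load`; the object returned by `nbformat.read` is dict-like, so it should be usable the same way. Cells are identified by their position in the notebook, so cells that were never run are not lumped together.

```python
import copy
import json


def output_sizes(nb):
    """Return (cell_index, execution_count, approx_bytes) per code cell, largest first."""
    sizes = []
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        # Approximate size: the outputs re-serialized as JSON, encoded as UTF-8.
        # The real file uses different whitespace, so treat this as an estimate.
        n_bytes = len(json.dumps(cell.get("outputs", [])).encode("utf-8"))
        sizes.append((index, cell.get("execution_count"), n_bytes))
    return sorted(sizes, key=lambda item: item[2], reverse=True)


def strip_largest_outputs(nb, top_n=10):
    """Return a copy of nb with outputs cleared for the top_n largest code cells."""
    stripped = copy.deepcopy(nb)  # leave the input notebook untouched
    for index, _count, _n_bytes in output_sizes(nb)[:top_n]:
        stripped["cells"][index]["outputs"] = []
    return stripped


# Tiny in-memory notebook (hypothetical content) to try the functions on.
demo_nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": "print('hi')",
         "outputs": [{"output_type": "stream", "name": "stdout", "text": "hi\n"}]},
        {"cell_type": "code", "metadata": {}, "execution_count": 2,
         "source": "plot()",
         "outputs": [{"output_type": "display_data", "metadata": {},
                      "data": {"image/png": "A" * 5000}}]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "source": "x = 1", "outputs": []},
    ],
}

for index, count, n_bytes in output_sizes(demo_nb):
    print(f"cell {index} (In [{count}]): ~{n_bytes} bytes of output")

slim_nb = strip_largest_outputs(demo_nb, top_n=1)
```

For a real file, load it with `json.load(open(path, encoding="utf-8"))`, pass the result to `strip_largest_outputs`, and write the returned dict to a new filename with `json.dump(..., indent=1)` so the original notebook is kept. Printing the sizes first lets you pick `top_n` by looking at where the sizes drop off instead of guessing.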
