
Export to HTML of Jupyter notebook with table of contents does not embed plots

I have built a notebook for visualization purposes.

First, I exported it to HTML using the nbconvert command-line interface, adding the --no-input flag to hide the input cells:

jupyter nbconvert myNotebook.ipynb --no-input

As expected, the output is a single HTML file with the plots embedded.

Second, in order to add a table of contents to my report, I installed the jupyter_contrib_nbextensions package.

I use the Table of Contents (2) extension. However, when I export the notebook to HTML with

jupyter nbconvert myNotebook.ipynb --no-input --to html_toc

the plots are saved in a separate folder instead of being embedded.

I would like to have them embedded in the HTML file. Any ideas how to fix this?

Here is a simple notebook that can be used to illustrate the problem:

#%%

import numpy as np
import matplotlib.pyplot as plt

#%% md

# Linear plot with noise

#%%

x = np.linspace(0, 10, 100)
noise = np.random.randn(len(x))
y1 = 2 * x + noise
plt.plot(x, y1);


#%% md

# Sine plot with noise

#%%

y2 = 5 * np.sin(x) + noise
plt.plot(x, y2);

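The html_toc exporter provided by toc2 runs an additional preprocessor, ExtractOutputPreprocessor, which extracts cell outputs such as plots so that they are written to separate files.

To keep the plots embedded, create an nbconvert_config.py file with the following content: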

c = get_config()

c.NbConvertApp.notebooks = ['myNotebook.ipynb']
c.NbConvertApp.export_format = 'html_toc'
c.Exporter.preprocessors = [
    'nbconvert.preprocessors.TagRemovePreprocessor',
    'nbconvert.preprocessors.RegexRemovePreprocessor',
    'nbconvert.preprocessors.coalesce_streams',
    'nbconvert.preprocessors.CSSHTMLHeaderPreprocessor',
    'nbconvert.preprocessors.HighlightMagicsPreprocessor',
    # 'nbconvert.preprocessors.ExtractOutputPreprocessor',  # left out so that plots stay embedded
]

and call jupyter nbconvert --config nbconvert_config.py. The key is the last entry in the list, which must be commented out so that ExtractOutputPreprocessor does not run.
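A shorter variant of the same idea is sketched below. It assumes that the html_toc exporter honours the standard enabled flag that nbconvert preprocessors expose, which is the same setting the programmatic answer in this thread passes as a dict. The file is only meaningful when nbconvert loads it, because get_config() is supplied by the config loader.

```python
# nbconvert_config.py (sketch): switch the preprocessor off instead of
# re-listing every other preprocessor by hand.
c = get_config()  # injected by the config loader when nbconvert reads this file

c.NbConvertApp.notebooks = ['myNotebook.ipynb']
c.NbConvertApp.export_format = 'html_toc'

# Assumption: html_toc enables ExtractOutputPreprocessor by default, so
# disabling it here should keep the plots inline in the HTML.
c.ExtractOutputPreprocessor.enabled = False
```

Run it the same way, with jupyter nbconvert --config nbconvert_config.py.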

After a few iterations I came up with the following code:

main_name  = 'your_notebook_name' # or use ipynbname to get it automatically
nb_fname   = main_name + '.ipynb' # Input  file
html_fname = main_name + '.html'  # Output file

# Disable the ExtractOutput preprocessor. This prevents images embedded in the notebook (i.e. plots) from being written to external files and linked
config = {'ExtractOutputPreprocessor': {'enabled': False}}

# Find the HTML (with TOC) exporter
# (nbconvert.exporters.exporter_locator is the older, deprecated location of get_exporter)
from nbconvert.exporters import get_exporter
HTMLTOCExporter = get_exporter('html_toc')
exporter = HTMLTOCExporter(config)

# Add a preprocessor to the exporter to remove the notebook cells with the tag 'remove_cell'
from nbconvert.preprocessors import TagRemovePreprocessor
cell_remover = TagRemovePreprocessor(remove_cell_tags={'remove_cell'})
exporter.register_preprocessor(cell_remover, True)

# Generate HTML and write it to a file
html, resources = exporter.from_filename(nb_fname)
with open(html_fname, 'w', encoding='utf-8') as f:
    f.write(html)

Bonus points: embed images that were inserted in Markdown cells using ![text](path_to_file)

You'll need to define a custom Preprocessor and register it before calling exporter.from_filename(...).

Custom Preprocessor

import re
import base64
import os
from nbconvert.preprocessors import Preprocessor

class EmbedExternalImagesPreprocessor(Preprocessor):
    """Replace Markdown image links to local files with base64-encoded <img> tags."""
    def preprocess_cell(self, cell, resources, cell_index):
        if cell.get('cell_type', '') == 'markdown':
            # Find Markdown image pattern: ![alt_text](file)
            embedded_images = re.findall(r'!\[(.*?)\]\((.*?)\)', cell['source'])

            for alt_text, file in embedded_images:
                # Read each image file and encode it in base64
                with open(file, 'rb') as f:
                    b64_image = base64.b64encode(f.read()).decode()

                # Derive the image type from the extension without its leading dot ('.png' -> 'png')
                _, file_extension = os.path.splitext(file)
                image_type = file_extension.lstrip('.').lower()
                if image_type == 'jpg':
                    image_type = 'jpeg'  # the MIME type is image/jpeg, not image/jpg

                # Generate the HTML tag
                base64html = f'<img src="data:image/{image_type};base64,{b64_image}" alt="{alt_text}">'

                # Replace Markdown pattern with HTML tag
                cell['source'] = cell['source'].replace(f'![{alt_text}]({file})', base64html)
        return cell, resources
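The core of that preprocessor is a plain string transformation, so it can be tried without nbconvert. The sketch below is one possible standalone version, not part of the original answer: the function name embed_markdown_images and its base_dir parameter are made up for illustration, and it uses the standard mimetypes module to pick the MIME type. Remote URLs, data URIs and missing files are deliberately left untouched.

```python
import base64
import html
import mimetypes
import os
import re

# Matches ![alt_text](path). Deliberately simple: it ignores optional
# link titles and nested brackets.
MD_IMAGE = re.compile(r'!\[(.*?)\]\((.*?)\)')


def embed_markdown_images(source, base_dir='.'):
    """Return `source` with local Markdown images replaced by base64 <img> tags."""

    def to_img_tag(match):
        alt_text, path = match.group(1), match.group(2)

        # Leave remote URLs and already-embedded data URIs as they are.
        if '://' in path or path.startswith('data:'):
            return match.group(0)

        full_path = os.path.join(base_dir, path)
        if not os.path.isfile(full_path):
            return match.group(0)

        # Guess the MIME type from the file name, e.g. 'image/png'.
        mime_type, _ = mimetypes.guess_type(full_path)
        if mime_type is None or not mime_type.startswith('image/'):
            return match.group(0)

        with open(full_path, 'rb') as f:
            b64_image = base64.b64encode(f.read()).decode('ascii')

        # Escape the alt text so quotes cannot break the attribute.
        alt = html.escape(alt_text, quote=True)
        return f'<img src="data:{mime_type};base64,{b64_image}" alt="{alt}">'

    return MD_IMAGE.sub(to_img_tag, source)
```

Inside preprocess_cell this could be used as cell['source'] = embed_markdown_images(cell['source']), assuming the image paths are relative to the working directory.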

Registering new Preprocessor

main_name  = 'your_notebook_name' # or use ipynbname to get it automatically
nb_fname   = main_name + '.ipynb' # Input  file
html_fname = main_name + '.html'  # Output file

# Disable the ExtractOutput preprocessor. This prevents images embedded in the notebook (i.e. plots) from being written to external files and linked
config = {'ExtractOutputPreprocessor': {'enabled': False}}

# Find the HTML (with TOC) exporter
# (nbconvert.exporters.exporter_locator is the older, deprecated location of get_exporter)
from nbconvert.exporters import get_exporter
HTMLTOCExporter = get_exporter('html_toc')
exporter = HTMLTOCExporter(config)

# Add a preprocessor to the exporter to remove the notebook cells with the tag 'remove_cell'
from nbconvert.preprocessors import TagRemovePreprocessor
cell_remover = TagRemovePreprocessor(remove_cell_tags={'remove_cell'})
exporter.register_preprocessor(cell_remover, True)
exporter.register_preprocessor(EmbedExternalImagesPreprocessor(), True)

# Generate HTML and write it to a file
html, resources = exporter.from_filename(nb_fname)
with open(html_fname, 'w', encoding='utf-8') as f:
    f.write(html)
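To confirm that the exported file is really self-contained, the img tags in the HTML can be inspected. The following is a minimal sketch using only the standard library; the names ImageSourceCollector and summarize_images are made up for this example. It counts images embedded as data URIs and lists any that still point to external files.

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.embedded = 0    # images stored inline as data: URIs
        self.external = []   # src values that point to separate files or URLs

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        src = dict(attrs).get('src') or ''
        if src.startswith('data:'):
            self.embedded += 1
        else:
            self.external.append(src)


def summarize_images(html_text):
    """Return (number of embedded images, list of external image sources)."""
    collector = ImageSourceCollector()
    collector.feed(html_text)
    return collector.embedded, collector.external
```

For example, after the export one could read html_fname back in and call summarize_images on its contents; an empty second element means that no image is linked externally.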

