简体   繁体   中英

Execute a Jupyter notebook including inline markdown with nbconvert

I have a Jupyter notebook that includes python variables in markdown cells like this:

code cell:

x = 10

markdown cell:

The value of x is {{x}}.

The IPython-notebook-extension Python Markdown allows me to dynamically display these variables if I execute the markdown cell with shift-enter in the notebook.

markdown cell:

The value of x is 10.

I would like to programmatically execute all cells in the notebook and save them to a new notebook using something like this:

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

with open('report.ipynb') as f:
    nb = nbformat.read(f, as_version=4)
        ep = ExecutePreprocessor(timeout=600, kernel_name='python3')
        ep.preprocess(nb, {})
with open('report_executed.ipynb', 'wt') as f:
    nbformat.write(nb, f)

This will execute the code cells but not the markdown cells. They still look like this:

The value of x is {{x}}.

I think the issue is that the notebook is not trusted. Is there a way to tell ExecutePreprocessor to trust the notebook? Is there another way to programmatically execute a notebook including python variables in the markdown cells?

The ExecutePreprocessor only looks at code cells , so your markdown cells are completely untouched. To do markdown processing, you need the Python Markdown preprocessor, as you have stated.

Unfortunately, the Python Markdown preprocessor system only executes the code in a live notebook, which it does by modifying the javascript involved with rendering cells . The modification stores the results of executing the code snippets in the cell metadata.

The PyMarkdownPreprocessor class (in pre_pymarkdown.py ) was designed to be used with nbconvert operating on notebooks that had been rendered first in a live notebook setting. It processes markdown cells, replacing {{}} patterns with the values stored in the metadata.

In your situation, however, you don't have the live notebook metadata. I had a similar problem, and I solved it by writing my own execution preprocessor that also included logic to handle the markdown cells:

from nbconvert.preprocessors import ExecutePreprocessor, Preprocessor
import nbformat, nbconvert
from textwrap import dedent

class ExecuteCodeMarkdownPreprocessor(ExecutePreprocessor):

    def __init__(self, **kw):
        self.sections = {'default': True} # maps section ID to true or false
        self.EmptyCell = nbformat.v4.nbbase.new_raw_cell("")

        return super().__init__(**kw)

    def preprocess_cell(self, cell, resources, cell_index):
        """
        Executes a single code cell. See base.py for details.
        To execute all cells see :meth:`preprocess`.
        """

        if cell.cell_type not in ['code','markdown']:
            return cell, resources

        if cell.cell_type == 'code':
            # Do code stuff
            return self.preprocess_code_cell(cell, resources, cell_index)

        elif cell.cell_type == 'markdown':
            # Do markdown stuff
            return self.preprocess_markdown_cell(cell, resources, cell_index)
        else:
            # Don't do anything
            return cell, resources

    def preprocess_code_cell(self, cell, resources, cell_index):
        ''' Process code cell.
        '''
        outputs = self.run_cell(cell)
        cell.outputs = outputs

        if not self.allow_errors:
            for out in outputs:
                if out.output_type == 'error':
                    pattern = u"""\
                        An error occurred while executing the following cell:
                        ------------------
                        {cell.source}
                        ------------------
                        {out.ename}: {out.evalue}
                        """
                    msg = dedent(pattern).format(out=out, cell=cell)
                    raise nbconvert.preprocessors.execute.CellExecutionError(msg)

        return cell, resources

    def preprocess_markdown_cell(self, cell, resources, cell_index):
        # Find and execute snippets of code
        cell['metadata']['variables'] = {}
        for m in re.finditer("{{(.*?)}}", cell.source):
            # Execute code
            fakecell = nbformat.v4.nbbase.new_code_cell(m.group(1))
            fakecell, resources = self.preprocess_code_cell(fakecell, resources, cell_index)

            # Output found in cell.outputs
            # Put output in cell['metadata']['variables']
            for output in fakecell.outputs:
                html = self.convert_output_to_html(output)
                if html is not None:
                    cell['metadata']['variables'][fakecell.source] = html
                    break
        return cell, resources

    def convert_output_to_html(self, output):
        '''Convert IOpub output to HTML

        See https://github.com/ipython-contrib/IPython-notebook-extensions/blob/master/nbextensions/usability/python-markdown/main.js
        '''
        if output['output_type'] == 'error':
            text = '**' + output.ename + '**: ' + output.evalue; 
            return text
        elif output.output_type == 'execute_result' or output.output_type == 'display_data':
            data = output.data
            if 'text/latex' in data:
                html = data['text/latex']
                return html
            elif 'image/svg+xml' in data:
                # Not supported
                #var svg = ul['image/svg+xml'];
                #/* embed SVG in an <img> tag, still get eaten by sanitizer... */
                #svg = btoa(svg);
                #html = '<img src="data:image/svg+xml;base64,' + svg + '"/>';
                return None
            elif 'image/jpeg' in data:
                jpeg = data['image/jpeg']
                html = '<img src="data:image/jpeg;base64,' + jpeg + '"/>'
                return html
            elif 'image/png' in data:
                png = data['image/png']
                html = '<img src="data:image/png;base64,' + png + '"/>'
                return html
            elif 'text/markdown' in data:
                text = data['text/markdown']
                return text
            elif 'text/html' in data:
                html = data['text/html']
                return html
            elif 'text/plain' in data:
                text = data['text/plain']
                # Strip <p> and </p> tags
                # Strip quotes
                # html.match(/<p>([\s\S]*?)<\/p>/)[1]
                text = re.sub(r'<p>([\s\S]*?)<\/p>', r'\1', text)
                text = re.sub(r"'([\s\S]*?)'",r'\1', text)
                return text
            else:
            # Some tag we don't support
                return None
        else:
            return None

You can then process you notebook with logic similar to your posted code:

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
import ExecuteCodeMarkdownPreprocessor # from wherever you put it
import PyMarkdownPreprocessor # from pre_pymarkdown.py

with open('report.ipynb') as f:
    nb = nbformat.read(f, as_version=4)
    ep = ExecuteCodeMarkdownPreprocessor(timeout=600, kernel_name='python3')
    ep.preprocess(nb, {})
    pymk = PyMarkdownPreprocessor()
    pymk.preprocess(nb, {})

with open('report_executed.ipynb', 'wt') as f:
    nbformat.write(nb, f)

Note that by including the Python Markdown preprocessing, your resultant notebook file will no longer have the {{}} syntax in the markdown cells - the markdown will have static content. If the recipient of the resultant notebook changes the code and executes again, the markdown will not be updated. However, if you are exporting to a different format (such as HTML), then you do want to replace the {{}} syntax with static content.

Update 2021.06.29

This again needs to be updated due to changes in nbconvert to using nbclient call ( https://github.com/jupyter/nbconvert/commit/e7bf8350435a66cc50faf29ff12df492be5d7f57#diff-bee04d71b1dfc0202a0239b1513fd81d983edc339a9734ca4f4813276feed032 ). Since run_cell is no longer available, both the code and markdown cell processing needs to be modified. This works:

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
from nbconvert.preprocessors import execute
import re

# Taken from:
# https://stackoverflow.com/questions/35805121/execute-a-jupyter-notebook-including-inline-markdown-with-nbconvert, modified to avoid using superseeded run_cell calls.

class ExecuteCodeMarkdownPreprocessor(ExecutePreprocessor):

    def __init__(self, **kw):
        self.sections = {'default': True} # maps section ID to true or false
        self.EmptyCell = nbformat.v4.nbbase.new_raw_cell("")

        super().__init__(**kw)

    def preprocess_cell(self, cell, resources, cell_index, store_history=True):
        """
        Executes a single code cell. See base.py for details.
        To execute all cells see :meth:`preprocess`.
        """

        if cell.cell_type not in ['code', 'markdown']:
            return cell, resources

        if cell.cell_type == 'code':
            # Do code stuff
            return self.preprocess_code_cell(cell, resources, cell_index, store_history)

        elif cell.cell_type == 'markdown':
            # Do markdown stuff
            cell, resources = self.preprocess_markdown_cell(cell, resources, cell_index, store_history)
            return cell, resources
        else:
            # Don't do anything
            return cell, resources

    def preprocess_code_cell(self, cell, resources, cell_index, store_history):
        """ Process code cell. Follow preprocess_cell from ExecutePreprocessor
        """
        self._check_assign_resources(resources)
        cell = self.execute_cell(cell, cell_index, store_history=True)
        return cell, self.resources

    def preprocess_markdown_cell(self, cell, resources, cell_index, store_history):
        # Find and execute snippets of code
        cell['metadata']['variables'] = {}
        for m in re.finditer("{{(.*?)}}", cell.source):
            # Execute code
            self.nb.cells.append(nbformat.v4.nbbase.new_code_cell(m.group(1)))
            fakecell, resources = self.preprocess_code_cell(self.nb.cells[-1], resources, len(self.nb.cells)-1, store_history)
            self.nb.cells.pop()
            # Output found in cell.outputs
            # Put output in cell['metadata']['variables']
            for output in fakecell.outputs:
                html = self.convert_output_to_html(output)
                if html is not None:
                    cell['metadata']['variables'][fakecell.source] = html
                    break
        return cell, resources

    def convert_output_to_html(self, output):
        """Convert IOpub output to HTML

        See https://github.com/ipython-contrib/IPython-notebook-extensions/blob/master/nbextensions/usability/python-markdown/main.js
        """
        if output['output_type'] == 'error':
            text = '**' + output.ename + '**: ' + output.evalue
            return text
        elif output.output_type == 'execute_result' or output.output_type == 'display_data':
            data = output.data
            if 'text/latex' in data:
                html = data['text/latex']
                return html
            elif 'image/svg+xml' in data:
                # Not supported
                #var svg = ul['image/svg+xml'];
                #/* embed SVG in an <img> tag, still get eaten by sanitizer... */
                #svg = btoa(svg);
                #html = '<img src="data:image/svg+xml;base64,' + svg + '"/>';
                return None
            elif 'image/jpeg' in data:
                jpeg = data['image/jpeg']
                html = '<img src="data:image/jpeg;base64,' + jpeg + '"/>'
                return html
            elif 'image/png' in data:
                png = data['image/png']
                html = '<img src="data:image/png;base64,' + png + '"/>'
                return html
            elif 'text/markdown' in data:
                text = data['text/markdown']
                return text
            elif 'text/html' in data:
                html = data['text/html']
                return html
            elif 'text/plain' in data:
                text = data['text/plain']
                # Strip <p> and </p> tags
                # Strip quotes
                # html.match(/<p>([\s\S]*?)<\/p>/)[1]
                text = re.sub(r'<p>([\s\S]*?)<\/p>', r'\1', text)
                text = re.sub(r"'([\s\S]*?)'",r'\1', text)
                return text
            else:
            # Some tag we don't support
                return None
        else:
            return None

Usage remains the same.

Update 2020-07-08

The answer provided by @gordon-bean was a life-saver for me. In my final round of googling before giving up I found this answer, so before I continue I just want to say thanks!

However, slightly over 4 years after the original answer jupyter / nbconvert went through some changes and the provided code needs to be updated. So here it is:

from nbconvert.preprocessors import ExecutePreprocessor
from nbconvert.preprocessors import execute
import nbformat
import re


# Taken from:
# https://stackoverflow.com/questions/35805121/execute-a-jupyter-notebook-including-inline-markdown-with-nbconvert
class ExecuteCodeMarkdownPreprocessor(ExecutePreprocessor):

    def __init__(self, **kw):
        self.sections = {'default': True} # maps section ID to true or false
        self.EmptyCell = nbformat.v4.nbbase.new_raw_cell("")

        super().__init__(**kw)

    def preprocess_cell(self, cell, resources, cell_index, store_history=True):
        """
        Executes a single code cell. See base.py for details.
        To execute all cells see :meth:`preprocess`.
        """

        if cell.cell_type not in ['code', 'markdown']:
            return cell, resources

        if cell.cell_type == 'code':
            # Do code stuff
            return self.preprocess_code_cell(cell, resources, cell_index, store_history)

        elif cell.cell_type == 'markdown':
            # Do markdown stuff
            return self.preprocess_markdown_cell(cell, resources, cell_index, store_history)
        else:
            # Don't do anything
            return cell, resources

    def preprocess_code_cell(self, cell, resources, cell_index, store_history):
        """ Process code cell.
        """
        # outputs = self.run_cell(cell)
        reply, outputs = self.run_cell(cell, cell_index, store_history)

        cell.outputs = outputs

        cell_allows_errors = (self.allow_errors or "raises-exception"
                              in cell.metadata.get("tags", []))

        if self.force_raise_errors or not cell_allows_errors:
            for out in cell.outputs:
                if out.output_type == 'error':
                    raise execute.CellExecutionError.from_cell_and_msg(cell, out)
            if (reply is not None) and reply['content']['status'] == 'error':
                raise execute.CellExecutionError.from_cell_and_msg(cell, reply['content'])

        return cell, resources

    def preprocess_markdown_cell(self, cell, resources, cell_index, store_history):
        # Find and execute snippets of code
        cell['metadata']['variables'] = {}
        for m in re.finditer("{{(.*?)}}", cell.source):
            # Execute code
            fakecell = nbformat.v4.nbbase.new_code_cell(m.group(1))
            fakecell, resources = self.preprocess_code_cell(fakecell, resources, cell_index, store_history)

            # Output found in cell.outputs
            # Put output in cell['metadata']['variables']
            for output in fakecell.outputs:
                html = self.convert_output_to_html(output)
                if html is not None:
                    cell['metadata']['variables'][fakecell.source] = html
                    break
        return cell, resources

    def convert_output_to_html(self, output):
        """Convert IOpub output to HTML

        See https://github.com/ipython-contrib/IPython-notebook-extensions/blob/master/nbextensions/usability/python-markdown/main.js
        """
        if output['output_type'] == 'error':
            text = '**' + output.ename + '**: ' + output.evalue
            return text
        elif output.output_type == 'execute_result' or output.output_type == 'display_data':
            data = output.data
            if 'text/latex' in data:
                html = data['text/latex']
                return html
            elif 'image/svg+xml' in data:
                # Not supported
                #var svg = ul['image/svg+xml'];
                #/* embed SVG in an <img> tag, still get eaten by sanitizer... */
                #svg = btoa(svg);
                #html = '<img src="data:image/svg+xml;base64,' + svg + '"/>';
                return None
            elif 'image/jpeg' in data:
                jpeg = data['image/jpeg']
                html = '<img src="data:image/jpeg;base64,' + jpeg + '"/>'
                return html
            elif 'image/png' in data:
                png = data['image/png']
                html = '<img src="data:image/png;base64,' + png + '"/>'
                return html
            elif 'text/markdown' in data:
                text = data['text/markdown']
                return text
            elif 'text/html' in data:
                html = data['text/html']
                return html
            elif 'text/plain' in data:
                text = data['text/plain']
                # Strip <p> and </p> tags
                # Strip quotes
                # html.match(/<p>([\s\S]*?)<\/p>/)[1]
                text = re.sub(r'<p>([\s\S]*?)<\/p>', r'\1', text)
                text = re.sub(r"'([\s\S]*?)'",r'\1', text)
                return text
            else:
            # Some tag we don't support
                return None
        else:
            return None

Usage of this code remains exactly the same as reported by Gordon Bean .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM