
How to follow DRY principles in jupyter python notebook

Jupyter is a notebook, a web application in which people can write documents and execute code in many languages. For the purpose of this question, let's stick to just Python.

I often find myself duplicating code across many Jupyter notebooks, mostly evaluation code for prediction models in data science projects. How can I avoid repeating myself in Jupyter Python notebooks?

Create a module with the common code and import it into the notebooks that need it. Jupyter can export a Python notebook as a Python module via File > Download as > Python (.py).

Put that file on your kernel's path and then you can import it.

With a Python kernel, you can see your path with

import sys
sys.path

If you put the module in any of those directories, you can import it. You can also append a new path string, either using sys.path.append(...) (just for that session) or by updating the relevant environment variable in your operating system (usually PYTHONPATH).
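
As a rough sketch of the whole flow, the snippet below writes a small shared module to a temporary directory, appends that directory to sys.path, and imports it. The module name eval_utils and the accuracy function are hypothetical stand-ins for whatever evaluation code you keep duplicating; in practice the file would live in a directory you maintain rather than a temporary one.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical shared module. In a real project this would be a file you
# maintain yourself (for example one exported from a notebook as .py).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "eval_utils.py").write_text(
    "def accuracy(y_true, y_pred):\n"
    "    # Fraction of predictions that match the labels.\n"
    "    matches = sum(t == p for t, p in zip(y_true, y_pred))\n"
    "    return matches / len(y_true)\n",
    encoding="utf-8",
)

# Make the directory importable for this session only.
sys.path.append(str(module_dir))
importlib.invalidate_caches()

import eval_utils  # resolved through the path entry added above

score = eval_utils.accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(score)
```

Every notebook that needs the helper can then run the same two steps (extend the path, import the module) instead of carrying its own copy of the function.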


It is possible to import an .ipynb, but it's not as straightforward as converting it first. IPython cells can use extended syntax like the %-magics etc. IPython includes all the tools necessary to load an .ipynb file programmatically (See nbformat.read() and IPython.core.interactiveshell.InteractiveShell ), then it's just a matter of using the standard library import hooks.

That's a little too involved to reproduce here, but the Jupyter documentation explains how to do it.
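
For a feel of what such a loader does, here is a simplified sketch that is not the documented approach: it relies on the fact that an .ipynb file is JSON, reads it with the standard library instead of nbformat.read(), and runs the code cells inside a fresh module object. It does not register an import hook and does not translate IPython syntax, so cells containing %-magics would fail. The function name load_notebook_as_module and the demo notebook are assumptions made for illustration.

```python
import json
import tempfile
import types
from pathlib import Path


def load_notebook_as_module(path, name="notebook_module"):
    """Run every code cell of an .ipynb file inside a fresh module object."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    mod = types.ModuleType(name)
    mod.__file__ = str(path)
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        # The notebook format stores source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        # Plain exec: IPython-only syntax such as %-magics is not handled.
        exec(source, mod.__dict__)
    return mod


# Tiny hypothetical notebook written to disk so the sketch is runnable.
demo_nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Helpers"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": [
                "def mean(values):\n",
                "    return sum(values) / len(values)\n",
            ],
        },
    ],
}
nb_path = Path(tempfile.mkdtemp()) / "helpers.ipynb"
nb_path.write_text(json.dumps(demo_nb), encoding="utf-8")

helpers = load_notebook_as_module(nb_path)
print(helpers.mean([1, 2, 3]))
```

If your notebooks use magics or shell escapes, the documented route through IPython's input transformer is the safer choice, since it rewrites that syntax into ordinary Python before executing it.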


Is there a way to import only certain cells from another notebook? I can't seem to find it.

Yes, it should be possible to import individual cells, though beware that a cell may not work right if it assumes a cell that you did not import has already run.

The relevant snippet from the linked docs is

    for cell in nb.cells:
        if cell.cell_type == 'code':
            # transform the input to executable Python
            code = self.shell.input_transformer_manager.transform_cell(cell.source)
            # run the code in the module
            exec(code, mod.__dict__)

Notice how this just runs each code-type cell in turn using a loop. You could run a particular cell (or cells) here instead of all of them. Then the imported module would have only run that code.

The tricky part might be identifying the cell you want. The obvious approach is to count, but you could also mark the cell somehow, e.g. in the loop, only run a cell if its cell.source starts with a certain comment.
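
The marker idea could look roughly like the sketch below. It works on a notebook that has already been parsed into plain dictionaries (the JSON form of an .ipynb) rather than on the nbformat object used in the docs snippet, and it skips IPython's input transformation. The "# export" marker, the run_marked_cells function, and the sample cells are all hypothetical.

```python
import types

EXPORT_MARKER = "# export"  # hypothetical convention, pick any comment you like


def cell_source(cell):
    """Return a cell's source as one string (it may be stored as a list)."""
    source = cell["source"]
    return "".join(source) if isinstance(source, list) else source


def run_marked_cells(nb, marker=EXPORT_MARKER, name="selected_cells"):
    """Execute only the code cells whose source starts with the marker."""
    mod = types.ModuleType(name)
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell_source(cell)
        if source.lstrip().startswith(marker):
            exec(source, mod.__dict__)
    return mod


# Hypothetical notebook content: one marked cell, one unmarked scratch cell.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "source": "# export\n"
            "def rmse(errors):\n"
            "    return (sum(e * e for e in errors) / len(errors)) ** 0.5\n",
        },
        {
            "cell_type": "code",
            "source": "scratch = 1 / 0  # would raise, but it is never run\n",
        },
    ]
}

mod = run_marked_cells(nb)
print(mod.rmse([3, 3]))
```

A comment marker survives reordering of cells, which counting by position does not, but the earlier caveat still applies: a marked cell that depends on names defined in an unmarked cell will fail when run on its own.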
