Jupyter (IPython) notebook: Convert an HTML notebook to ipynb
I have converted a Jupyter/IPython notebook to HTML format and subsequently lost the original ipynb file.
Is there a simple way to generate the original notebook file from the converted HTML file?
I recently used BeautifulSoup and JSON to convert an HTML notebook back to ipynb. The trick is to look at the JSON schema of a notebook and emulate it. The code selects only input code cells and Markdown cells; cell outputs are not recovered.
Here is my code:
from bs4 import BeautifulSoup
import json
import urllib.request

url = 'http://nbviewer.jupyter.org/url/jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb'
response = urllib.request.urlopen(url)
# for a local HTML file instead:
# response = open("/Users/note/jupyter/notebook.html", encoding="utf-8")
text = response.read()

soup = BeautifulSoup(text, 'lxml')
# see some of the HTML
print(soup.div)

# minimal nbformat v4 skeleton
dictionary = {'nbformat': 4, 'nbformat_minor': 1, 'cells': [], 'metadata': {}}

for d in soup.find_all("div"):
    classes = d.get("class", [])
    if "input_area" in classes:
        # code cell: keep the source text only, outputs are left empty
        cell = {
            'cell_type': 'code',
            'metadata': {},
            'outputs': [],
            'source': [d.get_text()],
            'execution_count': None,
        }
        dictionary['cells'].append(cell)
    elif "text_cell_render" in classes:
        # Markdown cell: the rendered HTML is kept as the source
        # (Jupyter displays raw HTML inside Markdown cells)
        cell = {
            'cell_type': 'markdown',
            'metadata': {},
            'source': [d.decode_contents()],
        }
        dictionary['cells'].append(cell)

# use a context manager so the file is flushed and closed
with open('notebook.ipynb', 'w', encoding='utf-8') as f:
    json.dump(dictionary, f)
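The JSON skeleton that the code emulates can be factored into a few small helpers. This is a minimal sketch assuming nbformat v4; `make_code_cell`, `make_markdown_cell` and `make_notebook` are hypothetical names (the nbformat package itself offers `nbformat.v4.new_code_cell`, `new_markdown_cell` and `new_notebook` for the same job). It stores each cell's source as a list of lines with the newlines kept, which is how Jupyter writes .ipynb files; a single string, as in the code above, is also valid.

```python
import json


def split_source(text):
    """Split cell text into the list-of-lines form used in .ipynb files."""
    # keepends=True keeps the trailing "\n" on every line except the last
    return text.splitlines(keepends=True)


def make_code_cell(source):
    # Outputs cannot be recovered from the HTML, so the cell starts unexecuted
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": split_source(source),
    }


def make_markdown_cell(source):
    return {
        "cell_type": "markdown",
        "metadata": {},
        "source": split_source(source),
    }


def make_notebook(cells):
    # The four top-level keys required by the nbformat v4 schema
    return {"nbformat": 4, "nbformat_minor": 1, "metadata": {}, "cells": cells}


nb = make_notebook([
    make_markdown_cell("# Title\nSome text"),
    make_code_cell("x = 1\nprint(x)"),
])
print(json.dumps(nb, indent=1))
```

Splitting into lines is optional, but it makes the resulting file diff-friendly and matches what Jupyter itself produces on save.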
Here is part of the print(soup.div) output:
<div class="container">
<div class="navbar-header">
<button class="navbar-toggle collapsed" data-target=".navbar-collapse" data-toggle="collapse" type="button">
<span class="sr-only">Toggle navigation</span>
<i class="fa fa-bars"></i>
</button>
<a class="navbar-brand" href="/">
<img src="/static/img/nav_logo.svg?v=479cefe8d932fb14a67b93911b97d70f" width="159"/>
</a>
</div>
<div class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li>
<a class="active" href="http://jupyter.org">JUPYTER</a>
</li>
<li>
<a href="/faq" title="FAQ">
<span>FAQ</span>
[Screenshot: the resulting ipynb file, loaded in my local Jupyter after running all the cells]
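If BeautifulSoup and lxml are not available, the same idea can be sketched with only the standard library's html.parser. This is a swapped-in technique, not the answer's code. It assumes the classic nbconvert/nbviewer class names the answer relies on (input_area for code, text_cell_render for rendered Markdown); HTML from other templates (for example nbconvert's newer lab template) may use different class names. `NotebookHTMLParser` and `html_to_notebook` are hypothetical names.

```python
import json
from html import escape
from html.parser import HTMLParser


class NotebookHTMLParser(HTMLParser):
    """Collect (kind, text) pairs from classic-template notebook HTML."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.cells = []
        self.kind = None  # "code" or "markdown" while inside a captured div
        self.depth = 0    # nesting depth of <div>s inside the captured div
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if self.kind is None:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and "input_area" in classes:
                self.kind, self.depth, self.buf = "code", 0, []
            elif tag == "div" and "text_cell_render" in classes:
                self.kind, self.depth, self.buf = "markdown", 0, []
            return
        if tag == "div":
            self.depth += 1
        if self.kind == "markdown":
            # Markdown cells keep their inner HTML (like decode_contents())
            self.buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> only matter for Markdown cells
        if self.kind == "markdown":
            self.buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.kind is None:
            return
        if tag == "div":
            if self.depth == 0:
                # The captured div itself is closing: emit the cell
                self.cells.append((self.kind, "".join(self.buf)))
                self.kind = None
                return
            self.depth -= 1
        if self.kind == "markdown":
            self.buf.append(f"</{tag}>")

    def handle_data(self, data):
        if self.kind == "code":
            self.buf.append(data)  # plain text only, like get_text()
        elif self.kind == "markdown":
            self.buf.append(escape(data, quote=False))  # re-escape for HTML


def html_to_notebook(html_text):
    parser = NotebookHTMLParser()
    parser.feed(html_text)
    parser.close()
    cells = []
    for kind, text in parser.cells:
        cell = {"cell_type": kind, "metadata": {},
                "source": text.splitlines(keepends=True)}
        if kind == "code":
            cell.update(execution_count=None, outputs=[])
        cells.append(cell)
    return {"nbformat": 4, "nbformat_minor": 1, "metadata": {}, "cells": cells}


sample = """
<div class="text_cell_render"><h1>Demo</h1><p>a &lt; b</p></div>
<div class="input_area"><div class="highlight"><pre><span>x</span> = 1
print(x)</pre></div></div>
"""
nb = html_to_notebook(sample)
print(json.dumps(nb, indent=1))
```

The depth counter is what lets the parser find the closing tag of the captured div even though syntax-highlighted code is wrapped in nested divs and spans.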
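Here's a trick: save the HTML file as a .txt file, open it in your code editor, and then rename the file extension to .ipynb. This only works if the file's content is actually notebook JSON (for example, a raw .ipynb that was saved with an .html extension); a real HTML export from nbconvert is not valid notebook JSON, and Jupyter will fail to open it.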
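Whether the rename trick can work depends on what the file contains. A minimal check, assuming the file has already been read into a string; `looks_like_ipynb` is a hypothetical helper that only tests for JSON with a top-level nbformat key (present in both v3 and v4 notebooks), not full schema validity.

```python
import json


def looks_like_ipynb(text):
    """Return True if text parses as a JSON object with an "nbformat" key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # An nbconvert HTML export starts with markup, so it fails here
        return False
    return isinstance(data, dict) and "nbformat" in data


html_export = "<!DOCTYPE html>\n<html><body>...</body></html>"
raw_notebook = '{"nbformat": 4, "nbformat_minor": 1, "metadata": {}, "cells": []}'
print(looks_like_ipynb(html_export))
print(looks_like_ipynb(raw_notebook))
```

For a file on disk, pass it `open("notebook.html", encoding="utf-8").read()`; if the result is False, renaming will not help and the content has to be reconstructed, as in the first answer.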
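Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.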