
How to render pd.DataFrame table in pdf with nbconvert+pandoc

I am generating a PDF from a set of Jupyter notebooks. For each .ipynb file, I'm running

$ jupyter-nbconvert --to markdown Untitled1.ipynb

and then merging them together with:

$ pandoc Untitled1.md [Untitled2.md...] -f gfm --pdf-engine=pdflatex -o all_notebooks.pdf

(I am mostly following the example here.) One thing I noticed is that pandas DataFrames, e.g.

import pandas as pd
df = pd.DataFrame({'a':[1,2,3]})
df.head()

are rendered in the PDF as

[image: the DataFrame as rendered in the PDF]

rather than

[image]

Any idea how to fix this issue, please? I am using jupyter-nbconvert 5.6.1 and pandoc 2.9.2.1 (as reported by $ jupyter-nbconvert --version and $ pandoc --version). In the md file the table turns into the HTML block below. I suspect pandoc does not interpret it correctly. I tried the from-markdown-strict option suggested here, without any luck.

Thank you!

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>a</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
    </tr>
  </tbody>
</table>
</div>

The issue here is that nbconvert writes the DataFrames out as HTML (plus the styling, which you're seeing in the output; see the issue here), and that HTML gets ignored by pandoc's Markdown converter.

One way around this is to change pandas' behavior so that it does not write out DataFrames as HTML in notebooks. You can do this by setting the following option at the top of each notebook:

pd.set_option("display.notebook_repr_html", False)
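To see what this option changes, here is a minimal sketch that can be run outside a notebook. It assumes pandas is installed and calls `_repr_html_`, the hook Jupyter uses to ask an object for its HTML form, directly. With the option off, the hook returns nothing, so only the plain-text repr is left for nbconvert to write out.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Default setting: the HTML hook returns a <table> block like the one in the question.
html_default = df._repr_html_()

# With the option switched off (here only temporarily, via option_context),
# the HTML hook returns None and the front end falls back to plain text.
with pd.option_context("display.notebook_repr_html", False):
    html_disabled = df._repr_html_()

# The plain-text repr is what remains as the cell output.
text_repr = repr(df)
print(text_repr)
```

The table then shows up in the PDF as monospaced text rather than a typeset table, which may or may not be what you want.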

Another option is to use the HTML representation as the intermediate step rather than Markdown:

$ jupyter-nbconvert --to html Untitled1.ipynb
$ pandoc Untitled1.html -t latex --pdf-engine=pdflatex -o all_notebooks.pdf
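If you want to repeat these two steps from a script, they can be wrapped roughly as follows. This is only a sketch: the helper name `html_route_commands` is made up for illustration, and it assumes that `jupyter-nbconvert` and `pandoc` are on your PATH and that nbconvert writes the HTML file next to the notebook under the same base name.

```python
import subprocess
import sys
from pathlib import Path


def html_route_commands(notebook, output_pdf):
    """Return the two command lines (as argv lists) for the HTML route."""
    # Assumption: nbconvert names its output after the notebook, e.g. Untitled1.html.
    html_file = Path(notebook).with_suffix(".html")
    return [
        ["jupyter-nbconvert", "--to", "html", str(notebook)],
        ["pandoc", str(html_file), "-t", "latex",
         "--pdf-engine=pdflatex", "-o", str(output_pdf)],
    ]


# Only run the tools when called as: python script.py NOTEBOOK OUTPUT_PDF
if __name__ == "__main__" and len(sys.argv) == 3:
    for cmd in html_route_commands(sys.argv[1], sys.argv[2]):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Building the commands as lists keeps file names with spaces intact, and `check=True` stops the script if nbconvert fails, so pandoc is not run on a stale HTML file.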

And of course, if you don't need to do any other formatting, you can just save your notebooks directly as PDFs:

jupyter-nbconvert --to pdf Untitled1.ipynb

(To combine multiple notebooks, see the discussion here.)

The problem seems to be in the connection between Jupyter and Pandoc. Jupyter doesn't output the table as formatted Markdown, and hence pandoc doesn't format it in the PDF.

For me the best way is using ipypublish ( https://ipypublish.readthedocs.io/en/latest/ )

Install

conda install -c conda-forge ipypublish

Set up pandas

from ipypublish import nb_setup
pd = nb_setup.setup_pandas(escape_latex = False)
...
pd.DataFrame(mydata)

Profit

jupyter nbconvert notebook.ipynb --no-input --no-prompt --to pdf

Make sure you run the notebook again before converting it, so that all the tables are rendered with ipypublish. Then they look good in the notebook as well as in the PDF.
