Pandoc filter via Panflute not working as expected

Question

Problem

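For a Markdown document, I want to filter out all sections whose header title is not in the list to_keep. A section consists of a header and its body, up to the next section or the end of the document. For simplicity, assume the document only has level-1 headers.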

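When I make a simple case distinction on whether the current element belongs to a section listed in to_keep, and return either None or [], I get an error. That is, running pandoc --filter filter.py -o output.pdf input.md gives TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list" (the code, an example file, and the full error message are at the end).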

I am using Python 3.7.4, panflute 1.12.5, and pandoc 2.2.3.2.

Question

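If I make a more fine-grained distinction about when to return [], it works (function action_working). My question is: why is this more fine-grained distinction needed? My solution seems to work, but it may well be by accident... How can I get this to work properly?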

Files

Error

Traceback (most recent call last):
  File "filter.py", line 42, in <module>
    main()
  File "filter.py", line 39, in main
    return run_filter(action_not_working, doc=doc)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
    return run_filters([action], *args, **kwargs)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
    dump(doc, output_stream=output_stream)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
    raise TypeError(msg)
TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
Error running filter filter.py:
Filter returned error status 1

input.md

# English 
Some cool english text this is!

# Deutsch 
Dies ist die deutsche Übersetzung!

# Sources
Some source.

# Priority
**Medium** *[Low | Medium | High]*

# Status
**Open for Discussion** *\[Draft | Open for Discussion | Final\]*

# Interested Persons (mailing list)
- Franz, Heinz, Karl

filter.py

from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False

def action_not_working(elem, doc):
    '''For every element we check if it occurs in a section we wish to keep. 
    If it is, we keep it and return None (indicating to keep the element unchanged).
    Otherwise we remove the element (return []).'''
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        return []

def action_working(elem, doc):
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        if isinstance(elem, Header):
            return []
        elif isinstance(elem, Para):
            return []
        elif isinstance(elem, BulletList):
            return []

def update_keep(elem):
    '''If the element is a header, update keep_current.'''
    global keep_current
    if isinstance(elem, Header):
        # Keep the section if its title is in to_keep
        keep_current = stringify(elem) in to_keep


def main(doc=None):
    return run_filter(action_not_working, doc=doc) 

if __name__ == '__main__':
    main()

Answer 1

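I think what happens is that panflute calls the action on every element, including the Doc root element. If keep_current is False when the Doc element is visited, the Doc is replaced with a list. This leads to the error message you are seeing, because panflute expects the root node to always be there.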
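
To make the mechanism concrete, here is a minimal sketch of such a traversal using only the standard library. The Node class and the walk function are hypothetical stand-ins, not panflute's real classes; the sketch assumes children are visited before their parent and the root is visited last, with a returned list spliced in place of the element.

```python
# Simplified stand-in for an element tree; NOT panflute's real classes.
class Node:
    def __init__(self, tag, *children):
        self.tag = tag
        self.children = list(children)


def walk(node, action):
    """Visit the children first, then the node itself."""
    new_children = []
    for child in node.children:
        result = walk(child, action)
        # A list result is spliced in place, so [] deletes the child.
        new_children.extend(result if isinstance(result, list) else [result])
    node.children = new_children
    altered = action(node)
    return node if altered is None else altered


def drop_everything(node):
    # Applied to every node, including the root.
    return []


def drop_blocks_only(node):
    # Only block-like nodes are removed; the root is left unchanged.
    return [] if node.tag in ("Header", "Para") else None


def make_tree():
    return Node("Doc", Node("Header"), Node("Para"))


bad = walk(make_tree(), drop_everything)
good = walk(make_tree(), drop_blocks_only)
print(type(bad).__name__)  # list
print(good.tag)            # Doc
```

In the first call the root itself comes back as an empty list, which mirrors the "received one of type list" error; in the second call the root survives with its unwanted children removed.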

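The updated filter only acts on Header, Para, and BulletList elements, so the Doc root node is left untouched. You will probably want to use something more general, such as isinstance(elem, Block).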

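Another approach is to use panflute's load and dump functions directly: load the document into a Doc element, manually iterate over all top-level blocks (the blocks passed to Doc are stored in doc.content) and drop everything unwanted, then dump the resulting document back to the output stream.

from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False

doc = load()  # reads the pandoc JSON AST from stdin
kept = []
for top_level_block in doc.content:
    # A header starts a new section; decide whether to keep that section
    if isinstance(top_level_block, Header):
        keep_current = stringify(top_level_block) in to_keep
    if keep_current:
        kept.append(top_level_block)
doc.content = kept  # replace the blocks with the ones we kept

dump(doc)  # writes the modified AST to stdout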

Solution 1
0 Accepted 2020-07-17 06:52:15
