
Pandoc Filter via Panflute not Working as Expected

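Problem

For a Markdown document, I want to filter out all sections whose header titles are not in the list to_keep. A section consists of a header and the body up to the next header or the end of the document. For simplicity, let's assume the document only has level 1 headers.

When I make a simple case distinction on whether the current element was preceded by a header in to_keep, and either return None or return [], I get an error. That is, for pandoc --filter filter.py -o output.pdf input.md I get TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list" (code, example file, and the complete error message are at the end).

I use Python 3.7.4, panflute 1.12.5, and pandoc 2.2.3.2.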

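Question

If I make a more fine-grained distinction about when to return [], it works (function action_working). My question is: why is this more fine-grained distinction necessary? My solution seems to work, but that may well be accidental. How can I get this to work properly?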

Files

Error

Traceback (most recent call last):
  File "filter.py", line 42, in <module>
    main()
  File "filter.py", line 39, in main
    return run_filter(action_not_working, doc=doc)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
    return run_filters([action], *args, **kwargs)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
    dump(doc, output_stream=output_stream)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
    raise TypeError(msg)
TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
Error running filter filter.py:
Filter returned error status 1

input.md

# English 
Some cool english text this is!

# Deutsch 
Dies ist die deutsche Übersetzung!

# Sources
Some source.

# Priority
**Medium** *[Low | Medium | High]*

# Status
**Open for Discussion** *\[Draft | Open for Discussion | Final\]*

# Interested Persons (mailing list)
- Franz, Heinz, Karl

filter.py

from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False

def action_not_working(elem, doc):
    '''For every element we check if it occurs in a section we wish to keep. 
    If it is, we keep it and return None (indicating to keep the element unchanged).
    Otherwise we remove the element (return []).'''
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        return []

def action_working(elem, doc):
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        if isinstance(elem, Header):
            return []
        elif isinstance(elem, Para):
            return []
        elif isinstance(elem, BulletList):
            return []

def update_keep(elem):
    '''If the element is a header, update keep_current.'''
    global keep_current
    if isinstance(elem, Header):
        # Keep the section if its title is in to_keep
        keep_current = stringify(elem) in to_keep


def main(doc=None):
    return run_filter(action_not_working, doc=doc) 

if __name__ == '__main__':
    main()

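I think what happens is that panflute calls the action on all elements, including the Doc root element. If keep_current is False when the walk reaches the Doc element, the Doc is replaced by a list. This leads to the error message you are seeing, because panflute expects the root node to always be there.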
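To make the mechanism concrete, here is a toy model of such a traversal. It is only a sketch and does not use panflute: the Node class and both action functions are hypothetical, and it assumes (as panflute's walk appears to do) that children are visited before their parent, so the root is the last element handed to the action.

```python
class Node:
    """Hypothetical stand-in for a document element (not a panflute class)."""

    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)

    def walk(self, action):
        # Children first: each result replaces the child, a list is spliced in
        new_children = []
        for child in self.children:
            result = child.walk(action)
            new_children.extend(result if isinstance(result, list) else [result])
        self.children = new_children
        # The node itself comes last, so the root is the final call
        altered = action(self)
        return self if altered is None else altered


def drop_everything(node):
    """Return [] for every element, without any type check."""
    return []


def drop_all_but_root(node):
    """Return [] for every element except the root."""
    return None if node.tag == 'Doc' else []


root = Node('Doc', [Node('Header'), Node('Para')])
print(type(root.walk(drop_everything)).__name__)    # list

root = Node('Doc', [Node('Header'), Node('Para')])
print(type(root.walk(drop_all_but_root)).__name__)  # Node
```

In the first call the root itself is swapped for an empty list, which corresponds to the situation the TypeError complains about. In the second call the root survives and only its children are removed.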

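The updated filter only acts on Header, Para, and BulletList elements, so the Doc root node is left untouched. You'll probably want to use something more generic instead, such as isinstance(elem, Block).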


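An alternative approach is to use panflute's load and dump functions directly: load the document into a Doc element, manually iterate over all top-level blocks of the document and remove the unwanted ones, then dump the resulting document back to the output stream.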

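from panflute import Header, dump, load, stringify

to_keep = ['Deutsch', 'Status']
keep_current = False

doc = load()  # read the pandoc JSON AST from stdin
kept = []
for top_level_block in doc.content:
    # A header decides whether the section that follows is kept
    if isinstance(top_level_block, Header):
        keep_current = stringify(top_level_block).strip() in to_keep
    if keep_current:
        kept.append(top_level_block)
doc.content = kept  # the Doc root itself is never replaced

dump(doc)  # write the filtered AST back to stdout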

