Pandoc Filter via Panflute not Working as Expected
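For a Markdown document, I want to filter out all sections whose header title is not in the list to_keep. A section consists of a header and its body, up to the next section or the end of the document. For simplicity, let's assume the document only has level 1 headers.
When I make a simple case distinction on whether the current element belongs to a section listed in to_keep, and either return None or return [], I get an error. That is, for pandoc --filter filter.py -o output.pdf input.md I get TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list" (the code, an example file and the full error message are below).
I use Python 3.7.4, panflute 1.12.5 and pandoc 2.2.3.2.
If I make a more fine-grained distinction about when to return [], it works (function action_working). My question is: why is this more fine-grained distinction needed? My solution seems to work, but that may well be by accident. How can I get this to work properly?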
Full error message:
Traceback (most recent call last):
  File "filter.py", line 42, in <module>
    main()
  File "filter.py", line 39, in main
    return run_filter(action_not_working, doc=doc)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
    return run_filters([action], *args, **kwargs)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
    dump(doc, output_stream=output_stream)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
    raise TypeError(msg)
TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
Error running filter filter.py:
Filter returned error status 1
Example file (input.md):
# English
Some cool english text this is!
# Deutsch
Dies ist die deutsche Übersetzung!
# Sources
Some source.
# Priority
**Medium** *[Low | Medium | High]*
# Status
**Open for Discussion** *\[Draft | Open for Discussion | Final\]*
# Interested Persons (mailing list)
- Franz, Heinz, Karl
Filter (filter.py):
from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False


def action_not_working(elem, doc):
    '''For every element we check if it occurs in a section we wish to keep.
    If it does, we keep it and return None (indicating to keep the element unchanged).
    Otherwise we remove the element (return []).'''
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        return []


def action_working(elem, doc):
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        # Only these block types are removed; everything else is left unchanged
        if isinstance(elem, Header):
            return []
        elif isinstance(elem, Para):
            return []
        elif isinstance(elem, BulletList):
            return []


def update_keep(elem):
    '''If the element is a header, we update keep_current.'''
    global to_keep, keep_current
    if isinstance(elem, Header):
        # Keep the section if its title is in to_keep
        keep_current = stringify(elem) in to_keep


def main(doc=None):
    return run_filter(action_not_working, doc=doc)


if __name__ == '__main__':
    main()
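I think what happens is that panflute calls the action on all elements, including the Doc root element. If keep_current is False when the Doc element is visited, the Doc is replaced by a list. Because panflute applies the action to an element's children before the element itself, Doc is visited last, so keep_current still holds the value set by the last header (here False, since "Interested Persons (mailing list)" is not in to_keep). This leads to the error message you are seeing, as panflute expects the root node to always be there.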
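To make the mechanism concrete, here is a deliberately simplified model of such a tree walk. It is a sketch under assumptions, not panflute's actual code: the Node class and both action functions are hypothetical stand-ins. It only illustrates that an action which returns [] for every element ends up replacing the root as well.

```python
class Node:
    """Tiny stand-in for a document element (hypothetical, not a panflute class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def walk(self, action):
        # Visit the children first and splice their results back in ...
        new_children = []
        for child in self.children:
            result = child.walk(action)
            if isinstance(result, list):
                new_children.extend(result)  # [] deletes, [a, b] expands
            else:
                new_children.append(result)
        self.children = new_children
        # ... then apply the action to this node itself.
        altered = action(self)
        return self if altered is None else altered


def delete_everything(node):
    # Mirrors action_not_working while keep_current is False.
    return []


def delete_paragraphs_only(node):
    # Mirrors a filter that only removes selected element types.
    return [] if node.name == "Para" else None


root = Node("Doc", [Node("Header"), Node("Para")])
bad = root.walk(delete_everything)
print(type(bad).__name__)  # list

root = Node("Doc", [Node("Header"), Node("Para")])
good = root.walk(delete_paragraphs_only)
print(type(good).__name__, [c.name for c in good.children])  # Node ['Header']
```

In the first call the walk hands back a plain list instead of the root node, which is the situation dump complains about. In the second call the root survives because the action never returns a replacement for it.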
The updated filter only acts on Header, Para and BulletList elements, so the Doc root node is left untouched. You might want to use something more general instead, such as isinstance(elem, Block).
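Another approach is to use panflute's load and dump functions directly: load the document into a Doc element, manually iterate over all top-level blocks in doc.content and remove anything that is unwanted, then dump the resulting document back to the output stream.
from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False
kept = []

doc = load()
for top_level_block in doc.content:
    # A header starts a new section and decides whether it is kept
    if isinstance(top_level_block, Header):
        keep_current = stringify(top_level_block) in to_keep
    if keep_current:
        kept.append(top_level_block)
doc.content = kept
dump(doc)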