I'm trying to create a pandoc filter that will help me summarize data. I've seen some filters that create a table of contents, but I'd like to organize the index based on content found within headers.
For instance, below I'd like to provide a summary of content based on tagged dates in headers (some headers will not contain dates):
[nwatkins@sapporo foo]$ cat test.md
# 1 May 2018
some info
# not a date
some data
# 2 May 2018
some more info
I started off by trying to look at the content of the headers. The intention was to just apply a simple regex for different date/time patterns.
[nwatkins@sapporo foo]$ cat test.lua
function Header(el)
  return pandoc.walk_block(el, {
    Str = function(el)
      print(el.text)
    end })
end
Unfortunately, this seems to apply the print statement to each space-separated string, rather than to a concatenation that would let me analyze a header's entire content:
[nwatkins@sapporo foo]$ pandoc --lua-filter test.lua test.md
1
May
2018
not
...
Is there a canonical way to do this in filters? I have yet to see any helper function in the Lua filters documentation.
Update: the dev version now provides the new functions pandoc.utils.stringify and pandoc.utils.normalize_date. They will become part of the next pandoc release (probably 2.0.6). With these, you can test whether a header contains a date with the following code:
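function Header (el)
  -- flatten the header's inline elements into a single string
  local content_str = pandoc.utils.stringify(el.content)
  -- normalize_date returns nil if the string is not a recognized date
  if pandoc.utils.normalize_date(content_str) ~= nil then
    print 'header contains a date'
  else
    print 'not a date'
  end
end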
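To get from the date test above to the summary the question asks for, one possible approach is to remember the dated headers while the filter runs and prepend them as a bullet list at the end. This is only a sketch: the names dated_headers and the list layout are made up for illustration, and it assumes that pandoc calls the Header function for all headers before it calls the Pandoc function of the same filter.

```lua
-- Hypothetical accumulator: one entry per header that parses as a date.
local dated_headers = {}

function Header (el)
  local text = pandoc.utils.stringify(el.content)
  local date = pandoc.utils.normalize_date(text)
  if date ~= nil then
    dated_headers[#dated_headers + 1] = {date = date, content = el.content}
  end
  -- returning nothing leaves the header itself unchanged
end

function Pandoc (doc)
  -- build one bullet item per dated header, in document order
  local items = {}
  for i, h in ipairs(dated_headers) do
    items[i] = {pandoc.Plain(h.content)}
  end
  if #items > 0 then
    table.insert(doc.blocks, 1, pandoc.BulletList(items))
  end
  return doc
end
```

The normalized date is stored alongside each entry so the list could later be sorted or grouped by date if that is needed.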
There is no helper function yet, but we have plans to provide a pandoc.utils.tostring function in the very near future.
In the meantime, the following snippet (taken from this discussion) should help you to get what you need:
--- convert a list of Inline elements to a string.
function inlines_tostring (inlines)
  local strs = {}
  for i = 1, #inlines do
    strs[i] = tostring(inlines[i])
  end
  return table.concat(strs)
end

-- Add a `__tostring` method to all Inline elements. Linebreaks
-- are converted to spaces.
for k, v in pairs(pandoc.Inline.constructor) do
  v.__tostring = function (inln)
    return ((inln.content and inlines_tostring(inln.content))
      or (inln.caption and inlines_tostring(inln.caption))
      or (inln.text and inln.text)
      or " ")
  end
end

function Header (el)
  header_text = inlines_tostring(el.content)
end
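Once the header text is available as a single string, the "simple regex" idea from the question can be applied with Lua patterns. The following is a minimal sketch, not part of pandoc: parse_day_month_year and MONTHS are hypothetical names, and only the "1 May 2018" form from the example file is handled.

```lua
-- Hypothetical lookup table: English month names to month numbers.
local MONTHS = {
  january = 1, february = 2, march = 3, april = 4,
  may = 5, june = 6, july = 7, august = 8,
  september = 9, october = 10, november = 11, december = 12,
}

--- Return an ISO date string for text like "1 May 2018", or nil.
-- The day is not range-checked; this only recognizes the shape.
function parse_day_month_year (text)
  local day, month_name, year =
    text:match("^%s*(%d%d?)%s+(%a+)%s+(%d%d%d%d)%s*$")
  if not day then
    return nil
  end
  local month = MONTHS[month_name:lower()]
  if not month then
    return nil
  end
  return string.format("%04d-%02d-%02d",
                       tonumber(year), month, tonumber(day))
end
```

Inside the Header function, calling parse_day_month_year(header_text) would then give a date string for dated headers and nil for the others.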