
Concatenating string fragments in Pandoc lua filters

I'm trying to create a pandoc filter that will help me summarize data. I've seen some filters that create a table of contents, but I'd like to organize the index based on content found within headers.

For instance, given the file below, I'd like to produce a summary based on the dates found in headers (some headers will not contain dates):

[nwatkins@sapporo foo]$ cat test.md
# 1 May 2018
some info

# not a date
some data

# 2 May 2018
some more info

I started off by trying to look at the content of the headers. The intention was to just apply a simple regex for different date/time patterns.

[nwatkins@sapporo foo]$ cat test.lua
function Header(el)
  return pandoc.walk_block(el, {
    Str = function(el)
      print(el.text)
    end })
end

Unfortunately, this seems to apply the print statement to each space-separated string, rather than to a concatenation that would let me analyze the entire header content:

[nwatkins@sapporo foo]$ pandoc --lua-filter test.lua test.md
1
May
2018
not
...

Is there a canonical way to do this in filters? I have yet to see any helper function in the Lua filters documentation.

Update: the dev version now provides the new functions pandoc.utils.stringify and pandoc.utils.normalize_date. They will become part of the next pandoc release (probably 2.0.6). With these, you can test whether a header contains a date with the following code:

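function Header (el)
  -- flatten the header's inline elements into one plain string
  local content_str = pandoc.utils.stringify(el.content)
  -- normalize_date returns nil if the string cannot be parsed as a date
  if pandoc.utils.normalize_date(content_str) ~= nil then
    print 'header contains a date'
  else
    print 'not a date'
  end
end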
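Building on the check above, the summary the question asks for could be sketched roughly as follows. This is only an illustration, not a tested recipe: it assumes a pandoc version that ships pandoc.utils.stringify and pandoc.utils.normalize_date, it looks only at top-level headers, and the choice of a bullet list at the top of the document is just one possible layout.

```lua
-- Sketch: prepend a bullet list naming every top-level header whose
-- text parses as a date. Assumes pandoc.utils.stringify and
-- pandoc.utils.normalize_date are available in the running pandoc.
function Pandoc (doc)
  local items = {}
  for _, block in ipairs(doc.blocks) do
    if block.t == 'Header' then
      local text = pandoc.utils.stringify(block.content)
      if pandoc.utils.normalize_date(text) ~= nil then
        -- each list item is itself a list of blocks
        items[#items + 1] = { pandoc.Plain({ pandoc.Str(text) }) }
      end
    end
  end
  if #items > 0 then
    -- put the summary list in front of the existing content
    table.insert(doc.blocks, 1, pandoc.BulletList(items))
  end
  return doc
end
```

Saved under a name such as summary.lua (a hypothetical file name), it would be run the same way as the filter in the question: pandoc --lua-filter summary.lua test.md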

There is no helper function yet, but we have plans to provide a pandoc.utils.tostring function in the very near future.

In the meantime, the following snippet (taken from this discussion) should help you to get what you need:

--- convert a list of Inline elements to a string.
function inlines_tostring (inlines)
  local strs = {}
  for i = 1, #inlines do
    strs[i] = tostring(inlines[i])
  end
  return table.concat(strs)
end

-- Add a `__tostring` method to all Inline elements. Elements without
-- content, caption, or text (spaces, line breaks) become a single space.
for k, v in pairs(pandoc.Inline.constructor) do
  v.__tostring = function (inln)
    return ((inln.content and inlines_tostring(inln.content))
        or (inln.caption and inlines_tostring(inln.caption))
        or inln.text
        or " ")
  end
end

function Header (el)
  -- header_text holds the whole header as a single string
  local header_text = inlines_tostring(el.content)
end
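Once the header is available as a single string, the date check itself can be done with Lua's built-in patterns (standard Lua has no regular expression library). The sketch below is one possible approach; find_date is a hypothetical helper name, and it only recognises dates written as "D Month YYYY", without checking that the month name is a real month.

```lua
-- Hypothetical helper: look for a date written as "D Month YYYY"
-- (for example "1 May 2018") anywhere in `text`.
-- Returns day (number), month name (string), year (number), or nil.
function find_date (text)
  local day, month, year = text:match("(%d%d?)%s+(%a+)%s+(%d%d%d%d)")
  if not day then
    return nil
  end
  return tonumber(day), month, tonumber(year)
end
```

Inside the Header function, the result of find_date(header_text) could then decide whether the header goes into the summary.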
