
Execute command defined by backreference in sed

I am creating a primitive experimental templating engine completely based on sed (merely for my private enjoyment). One thing I have been trying to achieve for several hours now is to replace certain text patterns with the output of a command they contain.

To clarify, if an input line looks like this

Lorem {{echo ipsum}}

I would like the sed output to look like this:

Lorem ipsum

The closest I have come is this:

echo 'Lorem {{echo ipsum}}' | sed 's/{{\(.*\)}}/'"$(\\1)"'/g'

which does not work.

However,

echo 'Lorem {{echo ipsum}}' | sed 's/{{\(.*\)}}/'"$(echo \\1)"'/g'

gives me

Lorem echo ipsum

I don't quite understand what is happening here. Why can I give the backreference to the echo command, but cannot evaluate the entire backreference in $()? When is \\1 getting evaluated? Is the thing I am trying to achieve even possible with pure sed?

Keep in mind that it is entirely clear to me that what I am trying to achieve is easily possible with other tools. However, I am highly interested in whether this is possible with pure sed.

Thanks!

The reason your attempt doesn't work is that $() is expanded by the shell before sed is even called. For this reason it can't use the backreferences sed is eventually going to capture.
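To make the order of evaluation visible, here is a small sketch that builds the two sed scripts from the question in shell variables and prints them, so you can see the text sed actually receives. The variable names are made up for this illustration, and printf stands in for the question's echo because some shells' echo builtins interpret backslashes.

```shell
# Second attempt: the shell runs the command substitution first, so sed
# receives a plain \1 in the replacement and simply drops the braces.
script='s/{{\(.*\)}}/'"$(printf '%s' '\1')"'/g'
printf '%s\n' "$script"     # s/{{\(.*\)}}/\1/g

# First attempt: the shell tries to run a command literally named \1.
# That fails, so the substitution expands to nothing. The error message is
# discarded and "|| true" keeps the failure from aborting a strict script.
broken='s/{{\(.*\)}}/'"$(\\1 2>/dev/null || true)"'/g'
printf '%s\n' "$broken"     # s/{{\(.*\)}}//g

printf '%s\n' 'Lorem {{echo ipsum}}' | sed "$script"   # Lorem echo ipsum
```

In both cases sed never sees $() at all; it only sees the finished string.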

It is possible to do this sort of thing with GNU sed (not with POSIX sed). The main trick is that GNU sed has an e flag for the s command: after a successful substitution, the whole pattern space (not just the matched part) is executed as a shell command and then replaced with that command's output. What this means is that

echo 'echo foo' | sed 's/f/g/e'

prints goo.
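Because the whole pattern space is executed, a naive use of the e flag on a template line loses everything around the command. A minimal sketch, assuming GNU sed:

```shell
# The substitution reduces the pattern space to "echo ipsum", which is then
# executed. The surrounding "Lorem " is thrown away in the process.
naive=$(printf '%s\n' 'Lorem {{echo ipsum}}' | sed 's/.*{{\(.*\)}}.*/\1/e')
printf '%s\n' "$naive"      # ipsum
```

So the surrounding text has to be set aside before the command runs and put back afterwards, which is what the hold buffer is good for.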

This can be used for your use case as follows:

echo 'Lorem {{echo ipsum}}' | sed ':a /\(.*\){{\(.*\)}}\(.*\)/ { h; s//\1\n\3/; x; s//\2/e; G; s/\(.*\)\n\(.*\)\n\(.*\)/\2\1\3/; ba }'

The sed code works as follows:

:a                                    # jump label for looping, in case there are
                                      # several {{}} expressions in a line
/\(.*\){{\(.*\)}}\(.*\)/ {            # if there is a {{}} expression,
  h                                   # make a copy of the line
  s//\1\n\3/                          # isolate the surrounding parts
  x                                   # swap the original back in
  s//\2/e                             # isolate the command, execute, get output
  G                                   # get the outer parts we put into the hold
                                      # buffer
  s/\(.*\)\n\(.*\)\n\(.*\)/\2\1\3/    # rearrange the parts to put the command
                                      # output into the right place
  ba                                  # rinse, repeat until all {{}} are covered
}
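As a quick check of the loop, here is a sketch that wraps the same script in a shell function and feeds it a line with two expressions. The function name expand_templates is made up for this illustration, the script is written one command per line rather than as a one-liner, and GNU sed is assumed.

```shell
# Hypothetical wrapper around the script explained above. Requires GNU sed:
# the e flag and \n in the replacement are GNU extensions.
expand_templates() {
  sed ':a
/\(.*\){{\(.*\)}}\(.*\)/ {
  h
  s//\1\n\3/
  x
  s//\2/e
  G
  s/\(.*\)\n\(.*\)\n\(.*\)/\2\1\3/
  ba
}'
}

# The greedy first group means {{echo 2}} is expanded first, then {{echo 1}}.
printf '%s\n' 'a {{echo 1}} b {{echo 2}} c' | expand_templates
# prints: a 1 b 2 c
```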

This makes use of sed's greedy matching in the regexes to always capture the last {{}} expression in a line. Note that it will have difficulties if there are several commands in a line and one of the later ones has multi-line output, because the final s command relies on newlines to separate the three parts. Handling this case requires a marker that neither the templates nor the output of the embedded commands are allowed to contain. I would suggest something like {{{}}}, which would lead to

sed ':a /\(.*\){{\(.*\)}}\(.*\)/ { h; s//{{{}}}\1{{{}}}\3/; x; s//\2/e; G; s/\(.*\)\n{{{}}}\(.*\){{{}}}\(.*\)/\2\1\3/; ba }'
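A sketch of the marker variant in action, again wrapped in a shell function with a made-up name (expand_templates_marked) and assuming GNU sed. The second command in the sample line prints two lines, which is the case the newline-separated version trips over.

```shell
# Hypothetical wrapper around the {{{}}} marker variant. Requires GNU sed.
expand_templates_marked() {
  sed ':a
/\(.*\){{\(.*\)}}\(.*\)/ {
  h
  s//{{{}}}\1{{{}}}\3/
  x
  s//\2/e
  G
  s/\(.*\)\n{{{}}}\(.*\){{{}}}\(.*\)/\2\1\3/
  ba
}'
}

# "echo x; echo y" produces two lines; only the final trailing newline of
# the command output is dropped by sed.
printf '%s\n' 'a {{echo 1}} b {{echo x; echo y}} c' | expand_templates_marked
# prints two lines:
#   a 1 b x
#   y c
```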

The reasoning behind this is that the template engine would run into trouble anyway if the embedded commands printed further {{}} terms. This convention is impossible to enforce, but then any code you pass into this template engine had better come from a trusted source, anyway.

Mind you, I am not sure that this whole thing is a sane idea [1]. You're not planning to use it in any sort of production code, are you?

[1] I am, however, quite sure whether it is a sane idea.

