How to replace all dollar signs before all variables inside a double-quoted string with sed?

I have problems replacing the variables that are inside strings in bash. For example, I want to replace

"test$FOO1=$FOO2" $BAR

with:

"test" .. FOO1 .. "=" .. FOO2 .. "" $BAR

I tried:

sed 's/\$\([A-Z0-9_]\+\)\b/" .. \1 .. "/g'

But I don't want variables outside of double-quoted strings to be replaced the same way. For example:

if [ $VARIABLE = 1 ]; then

has to be replaced by just

if VARIABLE then

Is there a way to replace only inside of double-quotes?

Background:
I want to convert a bash script into a Lua script.

I am aware that not every possible shell script can be converted this way, but what I want to achieve is to replace all basic language constructs with Lua commands and to replace all variables and conditionals. Automating this will save a lot of work compared with translating bash into Lua by hand.

This, using GNU awk for multi-char RS, RT, and gensub(), shows one way to separate and then manipulate quoted (in RT) and unquoted (in $0) strings as a starting point:

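$ cat tst.awk
# Every double-quoted string acts as a record separator, so it ends up in RT;
# the unquoted text before it is the record itself, $0.
BEGIN { RS="\"[^\"]*\""; ORS="" }
{
    # unquoted text: "[ $VAR = value ];" becomes "VAR"
    $0 = gensub(/\[\s+[$]([[:alnum:]_]+)\s+=\s+\S+\s+];/,"\\1","g",$0)
    # quoted text: a variable directly before the closing quote ends the string
    RT = gensub(/[$]([[:alnum:]_]+)"/,"\" .. \\1","g",RT)
    # quoted text: any other variable splits the string around it
    RT = gensub(/[$]([[:alnum:]_]+)/,"\" .. \\1 .. \"","g",RT)
    print $0 RT
}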

$ awk -f tst.awk file
"count: " .. FOO .. " times " .. BAR
if VARIABLE then

The above was run on this input file:

$ cat file
"count: $FOO times $BAR"
if [ $VARIABLE = 1 ]; then

NOTE: this approach of matching strings with regexps will only ever be a best effort based on the samples provided; you'd need a shell language parser to do the job robustly.

This might work for you (GNU sed):

sed -E ':a;s/^([^"]*("[^"$]*"[^"]*)*"[^"$]*)\$([^" ]*) /\1" .. \3 .. " /;ta;s/^([^"]*("[^"$]*"[^"]*)*"[^"$]*)\$([^"]*)"/\1" .. \3/;ta' file

When changing things within double quotes, we must first sail past any double-quoted strings that do not need changing. This means anchoring the regexp to the start of the line with the ^ metacharacter and repeating the substitution until no more cases remain.

First, skip over zero or more characters from the start of the line that are not double quotes.

Second, skip over, zero or more times, a double-quoted string that does not contain the character of interest (TCOI), i.e. $, followed by zero or more characters that are not double quotes.

Third, skip over a double quote followed by zero or more characters that are neither double quotes nor TCOI, i.e. $.

The following character (if it exists) must be TCOI. Everything matched before it is grouped so that it can be put back with the back reference \1.

Following TCOI, one or more conditions may be grouped. In the above example the first condition is when a variable (beginning with TCOI) is followed by a space. The second condition is when the variable is followed directly by ". Hence this entails two substitution commands; the ta command branches to the label a whenever a substitution was successful.
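To make the looping visible, here is a minimal sketch that applies one anchored substitution per pass from the shell and prints the line after each pass. The helper names one_pass and trace_passes are hypothetical, and the variable-name pattern [[:alnum:]_]+ is an assumption that replaces the two separate conditions used above.

```shell
# one_pass (hypothetical helper): apply the anchored substitution exactly once.
# Assumption: variable names consist of [[:alnum:]_]+ only.
one_pass() {
  printf '%s\n' "$1" |
    sed -E 's/^([^"]*("[^"$]*"[^"]*)*"[^"$]*)\$([[:alnum:]_]+)/\1" .. \3 .. "/'
}

# trace_passes (hypothetical helper): print the line after every pass until
# nothing changes, which mimics what the :a ... ta loop does inside sed.
trace_passes() {
  line=$1
  while :; do
    next=$(one_pass "$line")
    [ "$next" = "$line" ] && break
    line=$next
    printf '%s\n' "$line"
  done
}

trace_passes '"test$FOO1=$FOO2" $BAR'
# prints:
# "test" .. FOO1 .. "=$FOO2" $BAR
# "test" .. FOO1 .. "=" .. FOO2 .. "" $BAR
```

Each pass rewrites only the first remaining $ inside a quoted string, and $BAR outside the quotes is never touched because the pattern can only reach a $ after an opening quote.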

NB The if [ $VARIABLE = 1 ]; then situation can be treated in the same vein; here the [ plays the role of the opening double quote and the ] that of the closing double quote.

PS TCOI was $, which is also a regexp metacharacter representing the end of a line; it must therefore be quoted, e.g. \$

PPS Don't forget to quote the ['s and ]'s too. If quoting is not your thing, enclose the character in [x], where x is the character to be quoted.
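As an illustration of the bracket style of quoting, the sketch below handles the if [ $VARIABLE = 1 ]; then line with [[], []] and [$] instead of backslashes. It is a simplified, non-looping substitution rather than the full anchored pattern, the helper name strip_test_brackets is hypothetical, and it only covers the single shape "[ $NAME = value ];" from the question.

```shell
# strip_test_brackets (hypothetical helper): replace "[ $NAME = value ];"
# with just NAME. Each special character is quoted by putting it in brackets:
#   [[]  matches a literal [
#   []]  matches a literal ]
#   [$]  matches a literal $
strip_test_brackets() {
  sed -E 's/[[] +[$]([[:alnum:]_]+) += +[^] ]+ +[]];/\1/'
}

printf '%s\n' 'if [ $VARIABLE = 1 ]; then' | strip_test_brackets
# prints: if VARIABLE then
```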

EDIT:

Since the original example has been replaced by the OP, here is a solution based on the new example:

sed -E ':a;s/^([^"]*("[^"$]*"[^"]*)*"[^"$]*)\$([[:alnum:]]*)/\1" .. \3 .. "/;ta' file

bash lexer for shell!?

I'm sorry: I'm only posting this answer to warn you that this is the wrong way!

Reading a language is a job for a consistent lexer, not for sed or any other regex-based tool!

See GNU Bison, Berkeley Yacc (byacc).

You could have a look at bash's sources in order to see how scripts are read!

Persisting down this path will quickly lead to a big script, and then to unsolvable problems.
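To give a feel for the difference between tracking state and matching patterns, here is a minimal sketch of a character-by-character scanner written in awk and wrapped in a shell function. It is nowhere near a real lexer: it only knows about double quotes, backslash escapes and $NAME, and it ignores single quotes, ${...}, $(...) and everything else in the shell grammar. The function name convert_quoted is hypothetical.

```shell
# convert_quoted (hypothetical helper): rewrite $NAME as " .. NAME .. "
# only while the scanner is inside a double-quoted string.
convert_quoted() {
  awk '
  {
    out = ""; inq = 0; n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\\" && i < n) {
        # keep a backslash and the character after it verbatim
        out = out c substr($0, i + 1, 1)
        i++
      } else if (c == "\"") {
        # an unescaped double quote toggles the in-string state
        inq = !inq
        out = out c
      } else if (inq && c == "$" && match(substr($0, i + 1), /^[A-Za-z_][A-Za-z0-9_]*/)) {
        # close the string, emit the variable name, reopen the string
        out = out "\" .. " substr($0, i + 1, RLENGTH) " .. \""
        i += RLENGTH
      } else {
        out = out c
      }
    }
    print out
  }'
}

printf '%s\n' '"test$FOO1=$FOO2" $BAR' | convert_quoted
# prints: "test" .. FOO1 .. "=" .. FOO2 .. "" $BAR
```

Because the quote state is carried along explicitly, escaped quotes inside a string need no extra pass, but every further shell construct would need its own branch, which is the job a generated lexer and parser take over.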

Using groups and a loop

sed -e ':a' -e 's/^\(\([^"]*\("[^"]*"\)*\)*\)\("[^$"]*\)[$]\([A-Z0-9_]\{1,\}\)/\1\4" .. \5 .. "/;t a'
  1. isolate everything before the string of interest with ^\(\([^"]*\("[^"]*"\)*\)*\) in group 1
  2. select the variable inside the isolated string with \("[^$"]*\)[$]\([A-Z0-9_]\{1,\}\) in group 4 (prefix) and group 5 (variable name)
  3. change it as you want, here with \1\4" .. \5 .. " so that the string is closed before the variable and reopened after it
  4. repeat this operation while a change is occurring, with :a and t a

With GNU sed you can reduce the command to (no -e needed to target the label a):

sed ':a;s/^\(\([^"]*\("[^"]*"\)*\)*\)\("[^$"]*\)[$]\([A-Z0-9_]\{1,\}\)/\1\4" .. \5 .. "/;t a'

This assumes there are no escaped quotes (\") inside the strings. If there are, a first pass is needed to replace them with something else, and they have to be put back after the main modification.
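A minimal sketch of that extra pass, under stated assumptions: the placeholder @@q@@ is an arbitrary choice that must not occur in the input and must not contain characters allowed in a variable name, the function name convert_with_escapes is hypothetical, and the replacement used here closes and reopens the string around the variable to match the output asked for in the question.

```shell
# convert_with_escapes (hypothetical helper):
#   1. hide every escaped quote \" behind the placeholder @@q@@
#   2. run the looped substitution on the now escape-free line
#   3. put the escaped quotes back
convert_with_escapes() {
  sed -e 's/\\"/@@q@@/g' \
      -e ':a' \
      -e 's/^\(\([^"]*\("[^"]*"\)*\)*\)\("[^$"]*\)[$]\([A-Z0-9_]\{1,\}\)/\1\4" .. \5 .. "/' \
      -e 't a' \
      -e 's/@@q@@/\\"/g'
}

printf '%s\n' '"say \"$NAME\"" $X' | convert_with_escapes
# prints: "say \"" .. NAME .. "\"" $X
```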

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.