Translating a sed one-liner into awk

I am parsing files containing lines of "key=value" pairs. For example:

Normal line
Another normal line
[PREFIX] 1=Something 5=SomethingElse 26=42
Normal line again

I'd like to leave lines without key=value pairs untouched and transform every line that contains key=value pairs as follows:

Normal line
Another normal line
[PREFIX]
  AAA=Something
  EEE=SomethingElse
  ZZZ=42
Normal line again

Assume I have a valid dictionary for the translation.

At the moment I pass the input to sed, which turns spaces into newlines on lines that match '^\['.

The output is then piped into this awk script:

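BEGIN {
    # Translation table: numeric key -> symbolic name
    dict[1] = "AAA"
    dict[5] = "EEE"
    dict[26] = "ZZZ"

    FS="="
}
{
    if (match($0, "[0-9]+=.+")) {
        # key=value line: look up the key, leave it empty if unknown
        key = ""
        if ($1 in dict) {
            key = dict[$1]
        }
        printf("%7s = %s\n", key, $2)
    }
    else {
        # anything else is printed unchanged
        print
    }
}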

The overall command line then becomes:

cat input | sed '/^\(\[.*\)/s/ /\n/g' | awk -f script.awk

My question is: is there any way to fold the sed step into the awk script, so as to get rid of that additional step?

$ cat tst.awk
BEGIN {
    split("1 AAA 5 EEE 26 ZZZ",tmp)
    for (i=1; i in tmp; i+=2) {
        dict[tmp[i]] = tmp[i+1]
    }
    FS="[ =]"
    OFS="="
}
$1 == "[PREFIX]" {
    print $1
    for (i=2; i<NF; i+=2) {
        print "  " ($i in dict ? dict[$i] : $i), $(i+1)
    }
    next
}
{ print }

$ awk -f tst.awk file
Normal line
Another normal line
[PREFIX]
  AAA=Something
  EEE=SomethingElse
  ZZZ=42
Normal line again
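The script above keys on the literal `[PREFIX]`, while the sed command in the question matched any line starting with `[`. A minimal sketch of that variation follows. It assumes, as in the sample, that the bracketed tag contains no spaces or `=` and that values contain no spaces; the function name `expand_pairs` is made up for illustration.

```shell
# Hypothetical helper: expand "key=value" pairs on any line starting with "[".
expand_pairs() {
    awk '
    BEGIN {
        # Same dictionary as above, built from a flat "key name" list.
        n = split("1 AAA 5 EEE 26 ZZZ", tmp, " ")
        for (i = 1; i < n; i += 2) dict[tmp[i]] = tmp[i + 1]
        FS = "[ =]"   # split on spaces and "=" so keys and values alternate
        OFS = "="
    }
    /^\[/ {
        print $1                       # the bracketed tag on its own line
        for (i = 2; i < NF; i += 2)    # $i is a key, $(i+1) its value
            print "  " ($i in dict ? dict[$i] : $i), $(i + 1)
        next
    }
    { print }                          # every other line passes through
    ' "$@"
}

printf '%s\n' 'Normal line' '[OTHER] 1=x 7=y' | expand_pairs
# Normal line
# [OTHER]
#   AAA=x
#   7=y
```

Keys that are missing from the dictionary (7 here) are printed as they are, which mirrors the behaviour of tst.awk.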

In fact, I could not force awk to read the file twice (once for the sed command and once for your algorithm), so I had to modify your algorithm:

BEGIN {
    dict[1] = "AAA"
    dict[5] = "EEE"
    dict[26] = "ZZZ"
    # FS is left at its default (whitespace); pairs are split on "=" below
}
$0 !~/[0-9]+=.+/ { print }
/[0-9]+=.+/ {
   # split the line on whitespace, then each piece on "="
   nb = split($0, arr1)
   for (i = 1; i <= nb; i++) {
      nbb = split(arr1[i], keyVal, "=")
      if ((nbb == 2) && (keyVal[1] in dict)) {
         printf("%7s = %s\n", dict[keyVal[1]], keyVal[2])
      }
      else
         print arr1[i]
   }
}

When you have a lot to convert, you can first turn your dict file into a sed script file. If your dict file has a fixed format, you can convert it on the fly.

Suppose your dict file looks like this:

1=AAA
5=EEE
26=ZZZ

And your input file is:

Normal line
Another normal line
[PREFIX] 1=Something 5=SomethingElse 26=42
Normal line again

You want to do something like:

cat input | sed '/^\[/ s/ /\n/g' | sed 's/^1=/  AAA=/'
# Or eliminating the extra step with cat
sed '/^\[/ s/ /\n/g' input | sed 's/^1=/  AAA=/'

So your next step is converting your dict file into sed commands:

sed 's#\([^=]*\)=\(.*\)#s/^\1=/  \2=/#' dictfile
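To make the generated script concrete, the sketch below runs that conversion on the sample dict file. It uses a two-space indent to match the desired output, and the temporary path is only for illustration. Keys or names containing `/` or regex metacharacters would need escaping first, which this sketch does not attempt.

```shell
# Write the sample dict file to a temporary location (illustrative path).
dictfile=$(mktemp)
printf '%s\n' '1=AAA' '5=EEE' '26=ZZZ' > "$dictfile"

# Turn each "key=name" line into a sed substitution command.
generated=$(sed 's#\([^=]*\)=\(.*\)#s/^\1=/  \2=/#' "$dictfile")
printf '%s\n' "$generated"
# s/^1=/  AAA=/
# s/^5=/  EEE=/
# s/^26=/  ZZZ=/

rm -f "$dictfile"
```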

Now you can combine these with process substitution (a bash feature):

sed '/^\[/ s/ /\n/g' input | sed -f <(
   sed 's#\([^=]*\)=\(.*\)#s/^\1=/  \2=/#' dictfile
)
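For comparison, awk can also read the dict file directly, which avoids generating a sed script and keeps everything in one process. This is a sketch rather than part of the answers above: it assumes the same dictfile and input layouts, a non-empty dict file, and no spaces inside values. It relies on the common `NR == FNR` idiom to tell the first file from the second.

```shell
# Recreate the sample files in a scratch directory (paths are illustrative).
workdir=$(mktemp -d)
printf '%s\n' '1=AAA' '5=EEE' '26=ZZZ' > "$workdir/dictfile"
printf '%s\n' 'Normal line' 'Another normal line' \
    '[PREFIX] 1=Something 5=SomethingElse 26=42' \
    'Normal line again' > "$workdir/input"

result=$(awk '
    # First file: load "key=name" lines into dict.
    NR == FNR { split($0, kv, "="); dict[kv[1]] = kv[2]; next }
    # Second file: expand lines that start with "[".
    /^\[/ {
        print $1
        for (i = 2; i <= NF; i++) {
            eq = index($i, "=")            # position of the first "="
            key = substr($i, 1, eq - 1)
            val = substr($i, eq + 1)
            if (eq > 0 && (key in dict)) print "  " dict[key] "=" val
            else                         print "  " $i
        }
        next
    }
    { print }
' "$workdir/dictfile" "$workdir/input")
printf '%s\n' "$result"

rm -rf "$workdir"
```

With the sample files this prints the [PREFIX] tag on its own line followed by the three translated pairs, each indented by two spaces, and leaves the normal lines unchanged.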
