
Extracting specific string in a line with sed

In a line like this:

"<testcase_name> +config=<main_cfg> +cfg1=<cfg1> +cfg2=<cfg2> +testlength=<string> +print_cfg=<string> +print_match=<string> +quit_count=<string>"

I would like to extract only the strings that follow +cfg1= and +cfg2=, excluding the values of these specific configurations: +config=, +testlength=, +print_match=, +print_cfg=, and +quit_count=.

I would like to store the result in a variable and be able to view it as:

echo "other_cfg = $other_cfg"
% other_cfg = cfg1.cfg2

Notice that a . separates the cfg1 and cfg2 strings. Is there a single line (if possible in sed) that can do this?

More conditions:

  1. +cfg1 and +cfg2 could be any string, and there could be more of them. So the key here is simply to exclude these known configs: testlength, config, print_match, print_cfg, and quit_count.
  2. The configs are not always in the order shown in the example above, except for +config=, which is always first.
  3. Any of the known configs mentioned in (1) may be absent from the line.

Examples:

Input 1:

testA +config=reg +input=walk1s +print_match=1 +testlength=short

Expected output 1:

% other_cfg = walk1s

Input 2:

testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long

Expected output 2:

% other_cfg = reverse.rand

sed is for simple substitutions on individual lines, and that is all. This problem needs per-field filtering rather than a simple substitution, so it is a job for awk, not sed.

$ cat file
"<testcase_name> +config=<main_cfg> +cfg1=<cfg1> +cfg2=<cfg2> +testlength=<string> +print_cfg=<string> +print_match=<string> +quit_count=<string>"
testA +config=reg +input=walk1s +print_match=1 +testlength=short
testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long

$ cat tst.awk
BEGIN{
    FS="[ =]+"
    split("config testlength print_match print_cfg quit_count",tmp)
    for (i in tmp) {
        skip_cfgs["+"tmp[i]]
    }
}
{
    other_cfg = ""
    for (i=2;i<=NF;i+=2) {
        if ( !($i in skip_cfgs) ) {
            other_cfg = (other_cfg=="" ? "" : other_cfg ".") $(i+1)
        }
    }
    print "% other_cfg =", other_cfg
}

$ awk -f tst.awk file
% other_cfg = <cfg1>.<cfg2>
% other_cfg = walk1s
% other_cfg = reverse.rand
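
The question also asks for the result to end up in a shell variable, which the awk script above only prints. A minimal sketch of one way to do that, assuming a POSIX shell: the function name other_cfg_of is hypothetical, and the awk body is adapted from tst.awk (it prints only the joined value and stops the loop before the last field) rather than copied from it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): print the
# dot-joined values of every "+key=value" option on one line, skipping
# the keys named in the skip list.
other_cfg_of() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            FS = "[ =]+"
            n = split("config testlength print_match print_cfg quit_count", tmp)
            # Referencing an element is enough to create it as a set member.
            for (i = 1; i <= n; i++) skip["+" tmp[i]]
        }
        {
            out = ""
            # Field 1 is the test name; keys sit in even fields, values follow.
            for (i = 2; i < NF; i += 2)
                if (!($i in skip))
                    out = (out == "" ? "" : out ".") $(i + 1)
            print out
        }'
}

# Command substitution captures the printed value in a variable.
other_cfg=$(other_cfg_of 'testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long')
echo "other_cfg = $other_cfg"    # prints: other_cfg = reverse.rand
```

Like the script above, this assumes that neither keys nor values contain spaces or = signs, since both are field separators here.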

If +cfg1 and +cfg2 are guaranteed to always be in that order in the line, then yes; just substitute out everything else, leaving a . between the two values (the patterns match the literal angle brackets of the template line):

sed 's/.*+cfg1=<//;s/>.*+cfg2=</./;s/>.*//'

Otherwise, you'll want at least awk.

This might work for you (GNU sed):

sed -r 's/^\S+\s+(.*)$/other_cfg = \1\n+config=+testlength=+print_match=+print_cfg=+quit_count=/;:a;s/ (\+\S+=)\S+(.*\n.*)\1/\2/;ta;s/ \+[^=]*=/./g;s/\./ /;P;d' file

This replaces the first string (the test name) with the required label and appends a lookup table to the pattern space containing all the config strings that are not needed. The unwanted config strings are then iteratively removed, and the remaining configs are morphed into the required output.
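
To make those steps easier to follow, here is the same one-liner split into one command per line with comments. This is only a sketch: it assumes GNU sed (for \S, \s, and \n in the replacement), uses -E as a synonym for -r, and wraps the script in a hypothetical function named other_cfg_sed so it can be fed a single line.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner above; requires GNU sed.
other_cfg_sed() {
    printf '%s\n' "$1" | sed -E '
        # 1. Swap the test name for the label, then append a newline and
        #    the lookup table of unwanted keys.
        s/^\S+\s+(.*)$/other_cfg = \1\n+config=+testlength=+print_match=+print_cfg=+quit_count=/
        # 2. Loop: delete one " +key=value" whose key also appears in the
        #    table after the newline (that table entry is consumed too).
        :a
        s/ (\+\S+=)\S+(.*\n.*)\1/\2/
        ta
        # 3. Turn every remaining " +key=" into a "." separator.
        s/ \+[^=]*=/./g
        # 4. The first "." now sits right after "other_cfg ="; make it a space.
        s/\./ /
        # 5. Print up to the newline (dropping the table) and end the cycle.
        P
        d
    '
}

other_cfg_sed 'testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long'
# prints: other_cfg = reverse.rand
```

Because each matched key is removed from the table along with the option, a known key that appears twice on the same line would only be filtered once.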
