
Extracting specific strings from a line with sed

Given a line like this:

"<testcase_name> +config=<main_cfg> +cfg1=<cfg1> +cfg2=<cfg2> +testlength=<string> +print_cfg=<string> +print_match=<string> +quit_count=<string>"

I would like to extract only the strings that follow +cfg1= and +cfg2=, and exclude the values of these specific configurations: +config=, +testlength=, +print_match=, +print_cfg=, and +quit_count=.

So I would like to store the result in a variable and be able to view it as:

echo "other_cfg = $other_cfg"
% other_cfg = cfg1.cfg2

Note that a . separates the cfg1 and cfg2 strings. Is there a one-liner (in sed, if possible) that can do this?

More conditions:

  1. +cfg1 and +cfg2 could be any strings, and there could be more of them. So the key here is simply to exclude these known configs: testlength, config, print_match, print_cfg, and quit_count.
  2. The configs are not always in the same order as in the example above, except for +config=, which is always first.
  3. Any of the known configs mentioned in (1) may be absent from the line.

Examples:

Input 1:

testA +config=reg +input=walk1s +print_match=1 +testlength=short

Expected output 1:

% other_cfg = walk1s

Input 2:

testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long

Expected output 2:

% other_cfg = reverse.rand

sed is for simple substitutions on individual lines, and that is all. This problem is not a simple substitution, so it is a job for awk rather than sed.

$ cat file
"<testcase_name> +config=<main_cfg> +cfg1=<cfg1> +cfg2=<cfg2> +testlength=<string> +print_cfg=<string> +print_match=<string> +quit_count=<string>"
testA +config=reg +input=walk1s +print_match=1 +testlength=short
testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long

$ cat tst.awk
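BEGIN{
    # Split on runs of spaces and "=", so each "+name=value" yields two fields.
    FS="[ =]+"
    # Known configs whose values must not appear in the output.
    split("config testlength print_match print_cfg quit_count",tmp)
    for (i in tmp) {
        skip_cfgs["+"tmp[i]]    # referencing the element creates the key
    }
}
{
    other_cfg = ""
    # $1 is the test name; config names are in even fields, values follow them.
    for (i=2;i<=NF;i+=2) {
        if ( !($i in skip_cfgs) ) {
            other_cfg = (other_cfg=="" ? "" : other_cfg ".") $(i+1)
        }
    }
    print "% other_cfg =", other_cfg
}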

$ awk -f tst.awk file
% other_cfg = <cfg1>.<cfg2>
% other_cfg = walk1s
% other_cfg = reverse.rand
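
The question also asks for the result in a variable. A minimal sketch of that, assuming a POSIX shell and one input line at a time; the function name get_other_cfg and the variable line are made up for illustration and are not part of the original answer:

```shell
# Hypothetical helper: prints the dot-joined values of every config
# whose name is not in the skip list.
get_other_cfg() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            FS = "[ =]+"
            n = split("config testlength print_match print_cfg quit_count", tmp)
            for (i = 1; i <= n; i++) skip["+" tmp[i]]
        }
        {
            out = ""
            for (i = 2; i <= NF; i += 2)
                if (!($i in skip))
                    out = (out == "" ? "" : out ".") $(i + 1)
            print out
        }'
}

line='testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long'
other_cfg=$(get_other_cfg "$line")
echo "other_cfg = $other_cfg"    # prints: other_cfg = reverse.rand
```

Command substitution strips the trailing newline, so other_cfg holds just the joined string. In csh or tcsh the assignment syntax would differ (set other_cfg = ...), but the awk part stays the same.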

If +cfg1 and +cfg2 are guaranteed to always be in that order in the line, then yes; just substitute out everything else:

sed 's/.*+cfg1=<//;s/>.*+cfg2=</./;s/>.*//'

Otherwise, you'll want at least awk.

This might work for you (GNU sed):

sed -r 's/^\S+\s+(.*)$/other_cfg = \1\n+config=+testlength=+print_match=+print_cfg=+quit_count=/;:a;s/ (\+\S+=)\S+(.*\n.*)\1/\2/;ta;s/ \+[^=]*=/./g;s/\./ /;P;d' file

This replaces the first word (the test name) with the required prefix and appends to the pattern space a lookup table listing all the config keys that are not wanted. The unwanted configs are then removed one at a time in a loop, and the remaining configs are turned into the required output.
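
For readability, the same commands can be laid out one per line with comments. This is only a sketch: it assumes GNU sed (for -E, \S, \s, and \n in the replacement), and the wrapper name other_cfg_sed is hypothetical.

```shell
# Hypothetical wrapper around the one-liner; reads lines on stdin. GNU sed only.
other_cfg_sed() {
    sed -E '
        # Replace the test name with the prefix and append, after a newline,
        # a lookup table of the keys that must be dropped.
        s/^\S+\s+(.*)$/other_cfg = \1\n+config=+testlength=+print_match=+print_cfg=+quit_count=/
        :a
        # Remove one " +key=value" whose key also occurs in the table
        # (that table entry is consumed too), then try again.
        s/ (\+\S+=)\S+(.*\n.*)\1/\2/
        ta
        # Turn every remaining " +key=" into a "." separator.
        s/ \+[^=]*=/./g
        # The first "." directly follows "other_cfg ="; make it a space again.
        s/\./ /
        # Print up to the newline, which drops the table, then start the next line.
        P
        d
    '
}

printf '%s\n' 'testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long' | other_cfg_sed
```

Because the s/\./ / step rewrites the first "." it finds, a line with no remaining configs but a "." elsewhere could be affected, so this is worth checking against real input before relying on it.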
