Extracting specific string in a line with sed
In a line of string like this:
"<testcase_name> +config=<main_cfg> +cfg1=<cfg1> +cfg2=<cfg2> +testlength=<string> +print_cfg=<string> +print_match=<string> +quit_count=<string>"
I would like to extract only the strings that come after +cfg1= and +cfg2=, and not the values of these specific configurations: +config=, +testlength=, +print_match=, +print_cfg=, and +quit_count=.
So I would like to store the result in a variable and be able to view it as:
echo "other_cfg = $other_cfg"
% other_cfg = cfg1.cfg2
Notice that a . separates the cfg1 and cfg2 strings. Is there a single line (if possible in sed) that can do this?
More conditions:
+cfg1 and +cfg2 could be any string, and there could be more of them. So the key here is simply to exclude these known configs: testlength, config, print_match, print_cfg, and quit_count. The configs also do not always appear in the order shown in the example above, except for +config=, which is always the first one.
Examples:
Input 1:
testA +config=reg +input=walk1s +print_match=1 +testlength=short
Expected output 1:
% other_cfg = walk1s
Input 2:
testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long
Expected output 2:
% other_cfg = reverse.rand
sed is for simple substitutions on individual lines, that is all. This problem is not a simple substitution, so it is a job for awk rather than sed:
$ cat file
"<testcase_name> +config=<main_cfg> +cfg1=<cfg1> +cfg2=<cfg2> +testlength=<string> +print_cfg=<string> +print_match=<string> +quit_count=<string>"
testA +config=reg +input=walk1s +print_match=1 +testlength=short
testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long
$ cat tst.awk
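BEGIN {
    # Split fields on runs of spaces and "=" so names and values alternate.
    FS="[ =]+"
    # Known configs to skip, stored as array keys with the leading "+".
    split("config testlength print_match print_cfg quit_count",tmp)
    for (i in tmp) {
        skip_cfgs["+"tmp[i]]
    }
}
{
    other_cfg = ""
    # $1 is the test name; names sit in even fields, each value in the next field.
    for (i=2;i<=NF;i+=2) {
        if ( !($i in skip_cfgs) ) {
            other_cfg = (other_cfg=="" ? "" : other_cfg ".") $(i+1)
        }
    }
    print "% other_cfg =", other_cfg
}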
$ awk -f tst.awk file
% other_cfg = <cfg1>.<cfg2>
% other_cfg = walk1s
% other_cfg = reverse.rand
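The question also asks for the result to be stored in a shell variable. A minimal sketch of how the awk logic above might be wrapped for that purpose follows; the function name extract_other_cfg is hypothetical, and the sketch assumes a POSIX shell and that each line has the form "name +key=value ...".

```shell
# Hypothetical helper (not from the original answer): print the dot-joined
# values of every +name=value option whose name is not in the skip list.
extract_other_cfg() {
    printf '%s\n' "$1" | awk '
        BEGIN {
            FS = "[ =]+"
            n = split("config testlength print_match print_cfg quit_count", tmp)
            for (i = 1; i <= n; i++) skip_cfgs["+" tmp[i]]
        }
        {
            out = ""
            # Names are in even fields; each value is in the following field.
            for (i = 2; i < NF; i += 2)
                if (!($i in skip_cfgs))
                    out = (out == "" ? "" : out ".") $(i + 1)
            print out
        }'
}

# Capture the result with command substitution, as the question requests.
other_cfg=$(extract_other_cfg 'testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long')
echo "other_cfg = $other_cfg"
```

With the second example input this should print "other_cfg = reverse.rand".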
If +cfg1 and +cfg2 are guaranteed to always be in that order in the line, then yes; just substitute out everything else:
sed 's/.*+cfg1=<//;s/>.*+cfg2=</./;s/>.*//'
Otherwise, you'll want at least awk.
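The one-liner above keys on the literal < and > placeholders of the template line. For lines with real values, a variant using capture groups might look like the sketch below. It assumes the option names are literally +cfg1= and +cfg2=, that they appear in that order, and that values contain no spaces; the function name fixed_order_cfg is hypothetical.

```shell
# Hypothetical helper: capture the values of +cfg1= and +cfg2= (in that order)
# and join them with a dot. Uses only POSIX extended regular expressions.
fixed_order_cfg() {
    printf '%s\n' "$1" | sed -E 's/.*\+cfg1=([^ ]+).*\+cfg2=([^ ]+).*/\1.\2/'
}

fixed_order_cfg 'testX +config=main +cfg1=foo +cfg2=bar +testlength=long'
```

Because the pattern names the options explicitly, this does not cover the "any name, any count" condition from the question; that case still calls for the awk approach.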
This might work for you (GNU sed):
sed -r 's/^\S+\s+(.*)$/other_cfg = \1\n+config=+testlength=+print_match=+print_cfg=+quit_count=/;:a;s/ (\+\S+=)\S+(.*\n.*)\1/\2/;ta;s/ \+[^=]*=/./g;s/\./ /;P;d' file
This replaces the first string with the required one and appends a lookup table to the pattern space that lists all the config strings that are not needed. The unwanted config strings are then removed iteratively, and the remaining configs are morphed into the required output.
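To make the steps described above easier to follow, here is the same script spread over several lines with one comment per command. This is a sketch that assumes GNU sed (it relies on \S, \s, and \n in the replacement); the wrapper function extract_with_sed is hypothetical and reads the line from its argument rather than from a file.

```shell
# Hypothetical wrapper around the one-liner, one sed command per line.
extract_with_sed() {
    printf '%s\n' "$1" | sed -r '
        # Drop the test name, add the label, and append the lookup table
        # of unwanted option names after a newline.
        s/^\S+\s+(.*)$/other_cfg = \1\n+config=+testlength=+print_match=+print_cfg=+quit_count=/
        :a
        # Delete one option whose name also occurs in the lookup table
        # (the matching table entry is consumed too), then loop.
        s/ (\+\S+=)\S+(.*\n.*)\1/\2/
        ta
        # Turn every remaining " +name=" into a "." separator.
        s/ \+[^=]*=/./g
        # Replace the first "." with a space to get "other_cfg = value".
        s/\./ /
        # Print only the first line of the pattern space, leaving out the table.
        P
        d'
}

extract_with_sed 'testA +config=mem +quit_count=50 +order=reverse +input=rand +testlength=long'
```

For the second example input this should print "other_cfg = reverse.rand". The result can be captured with command substitution, for example line=$(extract_with_sed "$input").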