
Extracting multiple parts of a string using bash

I have a caret-delimited (key=value) input and would like to extract multiple tokens of interest from it.

For example, given the following input:

$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"
1=A00^35=D^150=1^33=1
1=B000^35=D^150=2^33=2

I would like the following output:

35=D^150=1^
35=D^150=2^

I have tried the following:

$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"|egrep -o "35=[^/^]*\^|150=[^/^]*\^"
35=D^
150=1^
35=D^
150=2^

My problem is that egrep returns each match on a separate line. Is it possible to get one line of output for each line of input? Please note that due to the constraints of the larger script, I cannot simply do a blind replace of all the \n characters in the output.

Thank you for any suggestions. This script is for bash 3.2.25. Any egrep alternatives are welcome. Please note that the tokens of interest (35 and 150) may change, and I am already generating the egrep pattern in the script. Hence a one-liner (if possible) would be great.

You have two options. Option 1 is to change the field separator (IFS) so that the shell splits on ^ and then use set --:

line='1=A00^35=D^150=1^33=1'
OFS=$IFS
IFS="^"
set -- $line  # No quotes here!! Splitting only happens on an unquoted expansion
IFS="$OFS"

Now you have your fields in $1, $2, etc.
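As a rough sketch of how the positional parameters could then be filtered down to the tokens of interest, the function below wraps the IFS/set -- idea. The name extract_tokens and its argument order are made up for illustration; it assumes keys contain no spaces and that IFS has its default value when the function is called.

```shell
# Hypothetical helper: print the requested keys from one caret-delimited record.
# Usage: extract_tokens RECORD KEY [KEY...]
extract_tokens() {
    local record=$1 out="" field key
    shift
    local wanted=" $* "      # space-padded key list, e.g. " 35 150 "
    local old_ifs=$IFS
    IFS='^'
    set -f                   # no filename globbing while the record is split
    set -- $record           # unquoted on purpose: split the record on '^'
    set +f
    IFS=$old_ifs
    for field in "$@"; do
        key=${field%%=*}     # text before the first '='
        case $wanted in
            *" $key "*) out="$out$field^" ;;
        esac
    done
    printf '%s\n' "$out"
}

# One output line per input line:
printf '%s\n' '1=A00^35=D^150=1^33=1' '1=B000^35=D^150=2^33=2' |
while IFS= read -r line; do
    extract_tokens "$line" 35 150
done
```

Because set -- runs inside the function, only the function's own positional parameters are overwritten. Tokens come out in the order they appear in the record, not in the order the keys were requested.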

Or you can use an array (this relies on GNU sed for \+, and eval executes the transformed text, so only use it on trusted input):

tmp=$(echo "1=A00^35=D^150=1^33=1" | sed -e 's:\([0-9]\+\)=: [\1]=:g' -e 's:\^ : :g')
eval value=($tmp)
echo "35=${value[35]}^150=${value[150]}"

To get rid of the newline, you can just echo it again:

$ echo $(echo "1=A00^35=D^150=1^33=1"|egrep -o "35=[^/^]*\^|150=[^/^]*\^")
35=D^ 150=1^
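A per-line variant of the same idea, as a minimal sketch: run the generated pattern once for each input line and delete only the newlines between that line's matches. The function name join_matches is hypothetical, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical helper: apply an ERE to each input line separately and join
# that line's matches, so every input line yields exactly one output line.
# Usage: join_matches PATTERN < input
join_matches() {
    local pattern=$1 line
    while IFS= read -r line; do
        # grep -o prints one match per line; tr removes those newlines only
        printf '%s\n' "$line" | grep -E -o "$pattern" | tr -d '\n'
        echo    # terminate the output line for this input line
    done
}

printf '%s\n' '1=A00^35=D^150=1^33=1' '1=B000^35=D^150=2^33=2' |
    join_matches '35=[^/^]*\^|150=[^/^]*\^'
```

This starts a grep and a tr process for every input line, so it is only reasonable for small inputs.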

If the echo approach is not satisfactory (the unquoted command substitution is word-split, so a multi-line input file collapses into a single line), you can use awk:

pax> echo '
1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLIST=35,150 -F^ ' {
        sep = "";
        split (LIST, srch, ",");
        for (i = 1; i <= NF; i++) {
            for (idx in srch) {
                split ($i, arr, "=");
                if (arr[1] == srch[idx]) {
                    printf "%s%s=%s", sep, arr[1], arr[2];  # data is never used as the format string
                    sep = "^";
                }
            }
        }
        if (sep != "") {
            print sep;
        }
    }'
35=D^150=1^
35=d^

pax> echo '
1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLIST=1,33 -F^ ' {
        sep = "";
        split (LIST, srch, ",");
        for (i = 1; i <= NF; i++) {
            for (idx in srch) {
                split ($i, arr, "=");
                if (arr[1] == srch[idx]) {
                    printf "%s%s=%s", sep, arr[1], arr[2];  # data is never used as the format string
                    sep = "^";
                }
            }
        }
        if (sep != "") {
            print sep;
        }
    }'
1=A00^33=1^
1=a00^33=11^

This one allows you to use a single awk script; all you need to do is provide a comma-separated list of keys to print out.
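The same comma-separated-list interface can also be sketched with a lookup table built once in BEGIN, which avoids the inner loop over the key list for every field. This is a variation, not the script above: the wrapper name extract_with_awk is made up, and it prints each matching field whole, so a value that itself contains "=" is kept intact.

```shell
# Hypothetical wrapper: print the fields whose key is in a comma-separated list.
# Usage: extract_with_awk 35,150 < input
extract_with_awk() {
    awk -v LIST="$1" -F'^' '
        BEGIN {
            # Turn "35,150" into a set: want["35"], want["150"]
            n = split(LIST, keys, ",")
            for (i = 1; i <= n; i++) want[keys[i]] = 1
        }
        {
            out = ""
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")               # kv[1] is the key
                if (kv[1] in want) out = out $i "^"
            }
            if (out != "") print out             # skip lines with no match
        }'
}

printf '%s\n' '1=A00^35=D^150=1^33=1' '1=a00^35=d^157=11^33=11' |
    extract_with_awk 35,150
```

Since the question already generates the egrep pattern in the script, generating the key list instead should be a small change.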


And here's the one-liner version :-)

echo '1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11' | awk -vLST=1,33 -F^ '{s="";split(LST,k,",");for(i=1;i<=NF;i++){for(j in k){split($i,arr,"=");if(arr[1]==k[j]){printf "%s%s=%s",s,arr[1],arr[2];s="^";}}}if(s!=""){print s;}}'

Given a file 'in' containing your strings (this relies on the tokens of interest always being the second and third fields):

$ for i in $(cut -d^ -f2,3 < in);do echo $i^;done
35=D^150=1^
35=D^150=2^
