split string (e.g. with bash) but skip part of it

How can I split the following string with bash (awk, sed, whatever)?

in:

a,b,[c, d],e

output:

a
b
[c, d]
e

try 1)

$ IFS=',' read -a tokens <<< "a,b,[c, d], e"; echo ${tokens[@]}
a b [c d] e

try 2)

$ IFS=',' 
$ line="a,b,[c, d], e"
$ eval x=($line)
$ echo ${x[1]}
b
$ echo ${x[0]}
a
$ echo ${x[2]}
[c  d]

But in both attempts the ',' inside the brackets is lost!

This is just a specific instance of the general CSV problem: telling commas inside quotes apart from commas outside quotes, so that either kind can be replaced with some other character (e.g. ;). The idiomatic awk solution to that (besides using FPAT in GNU awk) is:

Replace inside the quotes:

$ echo 'a,b,"c, d",e' | awk 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/,/,";",$i)}1'
a,b,"c; d",e

Replace outside the quotes:

$ echo 'a,b,"c, d",e' | awk 'BEGIN{FS=OFS="\""} {for (i=1;i<=NF;i+=2) gsub(/,/,";",$i)}1'
a;b;"c, d";e
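To see why stepping through every second field works, it can help to print how awk splits the line when the quote character is the field separator: odd-numbered fields lie outside the quotes, even-numbered ones inside. A minimal sketch (the "outside"/"inside" labels and the variable name fields are only for illustration):

```shell
# Split on the double quote and label each field by its position.
fields=$(echo 'a,b,"c, d",e' |
  awk -F'"' '{for (i=1;i<=NF;i++) printf "%d %s: %s\n", i, (i%2 ? "outside" : "inside"), $i}')
echo "$fields"
# 1 outside: a,b,
# 2 inside: c, d
# 3 outside: ,e
```

This assumes the quotes are balanced and that there are no escaped quotes inside a quoted field.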

In your case the delimiters are [...] instead of "..." and the replacement character is a newline instead of a semi-colon but it's essentially the same problem:

Replace outside the "quotes" (square brackets):

$ echo 'a,b,[c, d],e' | awk 'BEGIN{FS="[][]"; OFS=""} {for (i=1;i<=NF;i+=2) gsub(/,/,"\n",$i)}1'
a
b
c, d
e

Note that the square brackets are gone because I set OFS to the empty string, since there is no single FS character to use. You can get them back with this if you actually do need them:

$ echo 'a,b,[c, d],e' | awk 'BEGIN{FS="[][]"; OFS=""} {for (i=1;i<=NF;i++) if (i%2) gsub(/,/,"\n",$i); else $i="["$i"]"}1'
a
b
[c, d]
e

but chances are you don't, since their purpose was to group text that contained commas, and that is now handled by newlines being the field separators instead of commas.
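Since the question tried to end up with a bash array, here is a rough sketch of reading that newline-separated output into one. It assumes bash 4 or later (for mapfile) and that no token contains a newline; the function name split_brackets is made up for this example:

```shell
# Hypothetical helper: wraps the bracket-preserving awk command shown above.
split_brackets() {
  awk 'BEGIN{FS="[][]"; OFS=""}
       {for (i=1;i<=NF;i++) if (i%2) gsub(/,/,"\n",$i); else $i="["$i"]"}1' <<< "$1"
}

# One array element per output line; -t strips the trailing newline.
mapfile -t tokens < <(split_brackets 'a,b,[c, d],e')

printf '<%s>\n' "${tokens[@]}"
# <a>
# <b>
# <[c, d]>
# <e>
```

Quoting "${tokens[@]}" matters here: unquoted, the element [c, d] would be word-split again on the space.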

You can for example use this grep:

grep -Po '([a-z]|\[[a-z], [a-z]\])'
           ^^^^^ ^^^^^^^^^^^^^^^^ 

See:

$ echo "a,b,[c, d],e" | grep -Po '([a-z]|\[[a-z], [a-z]\])'
a
b
[c, d]
e

That is, grep prints only the matched text (hence the -o) and matches either a single letter [a-z] or a block of the form [ + [a-z], [a-z] + ].

Or you can also make the opening [ and the closing , [a-z]] part optional:

$ echo "a,b,[c, d],e" | grep -Po '(\[)?[a-z](, [a-z]\])?'
a
b
[c, d]
e

Match everything that starts with [ and ends with ] : \[[^][]*\] . Then match anything that's not a comma: [^,]\+ :

echo 'a,b,[c, d],e' | grep -o -e '\[[^][]*\]' -e '[^,]\+'

Output:

a
b
[c, d]
e

echo "a,b,[c, d],e" | grep -o '\[.*\]\|[^,]*'

Output:

a
b
[c, d]
e
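One caveat with a pattern of the form \[.*\] : the .* is greedy, so if a line holds more than one bracketed group, the match runs from the first [ to the last ]. Swapping .* for [^]]* keeps each group separate. A small sketch, assuming GNU grep (for \| in a basic regex); the sample line and variable names are invented for the comparison:

```shell
line='a,[b, c],d,[e, f],g'

# Greedy: everything from the first [ to the last ] comes out as one token.
greedy=$(echo "$line" | grep -o '\[.*\]\|[^,]*')

# Bounded: [^]]* cannot cross a closing bracket, so each group stays separate.
bounded=$(echo "$line" | grep -o '\[[^]]*\]\|[^,]*')

echo "greedy:";  echo "$greedy"
echo "bounded:"; echo "$bounded"
```

With the greedy pattern the second token comes out as [b, c],d,[e, f]; with the bounded one the line splits into five tokens.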
