Split a string (e.g. with bash) but skip part of it
How can I split the following string with bash (awk, sed, whatever)?
Input:
a,b,[c, d],e
Output:
a
b
[c, d]
e
Try 1:
$ IFS=',' read -a tokens <<< "a,b,[c, d], e"; echo ${tokens[@]}
a b [c d] e
Try 2:
$ IFS=','
$ line="a,b,[c, d], e"
$ eval x=($line)
$ echo ${x[1]}
b
$ echo ${x[0]}
a
$ echo ${x[2]}
[c d]
But the ',' inside the brackets is lost!
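For comparison with the attempts above, here is a minimal pure-bash sketch (not taken from any of the answers below) of the split being asked for: it walks the string one character at a time and treats a comma as a separator only when it is not inside [...]. The function name split_outside_brackets is hypothetical, and the sketch assumes the brackets are balanced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each comma-separated field of "$1" on its own
# line, but keep commas that appear inside [...] as part of the field.
# Assumes balanced brackets; a depth counter also copes with nested [...].
split_outside_brackets() {
  local s=$1 field='' ch i depth=0
  for (( i = 0; i < ${#s}; i++ )); do
    ch=${s:i:1}
    case $ch in
      '[')
        depth=$(( depth + 1 ))
        field+=$ch ;;
      ']')
        if (( depth > 0 )); then depth=$(( depth - 1 )); fi
        field+=$ch ;;
      ',')
        if (( depth == 0 )); then
          printf '%s\n' "$field"   # separator comma: emit the finished field
          field=''
        else
          field+=$ch               # comma inside brackets: keep it
        fi ;;
      *)
        field+=$ch ;;
    esac
  done
  printf '%s\n' "$field"           # emit the last field
}

# Read the fields into an array, one element per output line (bash 4+).
mapfile -t tokens < <(split_outside_brackets 'a,b,[c, d],e')
printf '%s\n' "${tokens[@]}"
```

Unlike the IFS-based attempts, this keeps the ", " inside the brackets intact. Because it loops over characters in bash, it is likely to be much slower than the awk or grep answers on long input.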
This is just a specific instance of the general CSV problem of treating commas inside quotes differently from those outside quotes, so that one set or the other can be replaced with some other character (e.g. ;). The idiomatic awk solution to that (besides using FPAT in GNU awk) is:
Replace inside the quotes:
$ echo 'a,b,"c, d",e' | awk 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/,/,";",$i)}1'
a,b,"c; d",e
Replace outside the quotes:
$ echo 'a,b,"c, d",e' | awk 'BEGIN{FS=OFS="\""} {for (i=1;i<=NF;i+=2) gsub(/,/,";",$i)}1'
a;b;"c, d";e
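Why do the loops start at i=2 or i=1 and step by 2? With FS set to the double quote, every " ends one field and starts the next, so (assuming the quotes are balanced) the odd-numbered fields lie outside the quotes and the even-numbered ones inside. A small sketch that labels each field makes this visible; show_quote_fields is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every field produced by splitting on '"',
# labelled by whether its index is odd (outside quotes) or even (inside).
show_quote_fields() {
  printf '%s\n' "$1" |
    awk 'BEGIN{FS="\""} {for (i=1;i<=NF;i++) printf "%d %s: %s\n", i, (i%2 ? "outside" : "inside"), $i}'
}

show_quote_fields 'a,b,"c, d",e'
# 1 outside: a,b,
# 2 inside: c, d
# 3 outside: ,e
```

In the commands above OFS is also set to ", so when gsub modifies a field awk rebuilds the line with the quotes back in place.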
In your case the delimiters are [...] instead of "..." and the replacement character is a newline instead of a semicolon, but it's essentially the same problem:
Replace outside the "quotes" (square brackets):
$ echo 'a,b,[c, d],e' | awk 'BEGIN{FS="[][]"; OFS=""} {for (i=1;i<=NF;i+=2) gsub(/,/,"\n",$i)}1'
a
b
c, d
e
Note that the square brackets are gone because I set OFS to the empty string, since there is no single FS character to use. You can get them back with this if you actually do need them:
$ echo 'a,b,[c, d],e' | awk 'BEGIN{FS="[][]"; OFS=""} {for (i=1;i<=NF;i++) if (i%2) gsub(/,/,"\n",$i); else $i="["$i"]"}1'
a
b
[c, d]
e
but chances are you don't, since their purpose was to group text that contained commas, and that's now handled by making the newlines, rather than commas, the field separators.
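If the goal is a bash array like tokens in the question, one option (a sketch, assuming bash 4+ for mapfile) is to wrap the bracket-restoring awk program above in a function and read its newline-separated output. The function name split_keep_brackets is hypothetical, and printf '%s\n' is used instead of echo so that the input string is passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk program above: split "$1" on commas
# outside [...] and print one field per line, putting the brackets back.
split_keep_brackets() {
  printf '%s\n' "$1" |
    awk 'BEGIN{FS="[][]"; OFS=""}
         {for (i=1;i<=NF;i++) if (i%2) gsub(/,/,"\n",$i); else $i="["$i"]"}1'
}

# One array element per output line.
mapfile -t tokens < <(split_keep_brackets 'a,b,[c, d],e')
printf '<%s>\n' "${tokens[@]}"
```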
You can, for example, use this grep:
grep -Po '([a-z]|\[[a-z], [a-z]\])'
           ^^^^^ ^^^^^^^^^^^^^^^^
See:
$ echo "a,b,[c, d],e" | grep -Po '([a-z]|\[[a-z], [a-z]\])'
a
b
[c, d]
e
That is, use grep to print only the matched parts (hence the -o), where each match is either a single [a-z] letter or [ + [a-z], [a-z] + ]. (-P selects Perl-compatible regular expression syntax.)
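Note that this pattern is written for the sample input: it only accepts single lowercase letters, both as plain fields and inside the brackets. A quick check shows what happens with longer fields. This sketch runs the same alternation with grep -E, on the assumption that it contains nothing Perl-specific and so behaves the same as an ERE; split_letters is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper running the answer's alternation as an ERE:
# only single letters and "[x, y]" groups are matched.
split_letters() {
  printf '%s\n' "$1" | grep -Eo '[a-z]|\[[a-z], [a-z]\]'
}

split_letters 'a,b,[c, d],e'    # a, b, [c, d], e as intended
split_letters 'ab,[cd, ef],g'   # every letter comes out on its own line
```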
Or you can make the opening [ and the closing , [a-z]] block optional:
$ echo "a,b,[c, d],e" | grep -Po '(\[)?[a-z](, [a-z]\])?'
a
b
[c, d]
e
Match everything that starts with [ and ends with ], with no other brackets in between: \[[^][]*\]. Then match any run of characters that aren't commas: [^,]\+:
echo 'a,b,[c, d],e' | grep -o -e '\[[^][]*\]' -e '[^,]\+'
Output:
a
b
[c, d]
e
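[^,]\+ could also match just "[c" at the opening bracket; this presumably works because grep -o reports the longest text any of the -e patterns can match at the leftmost position, and \[[^][]*\] is longer there. The same command should therefore also handle multi-character fields. A sketch, assuming GNU grep (\+ is a GNU extension in basic regular expressions) and a hypothetical wrapper name split_grep:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print bracketed groups whole, otherwise runs of
# non-comma characters, one per line (GNU grep assumed for \+).
split_grep() {
  printf '%s\n' "$1" | grep -o -e '\[[^][]*\]' -e '[^,]\+'
}

split_grep 'foo,bar,[baz, qux],quux'
```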
echo "a,b,[c, d],e" | grep -o '\[.*\]\|[^,]*'
Output:
a
b
[c, d]
e
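One caveat with this version: .* is greedy, so on a line with more than one bracketed group, \[.*\] matches from the first [ to the last ] and swallows everything in between. A quick check on a made-up input, assuming GNU grep (\| is a GNU extension in basic regular expressions) and hypothetical function names, with the bounded \[[^][]*\] form from the previous answer for contrast:

```shell
#!/usr/bin/env bash
# Greedy: \[.*\] runs to the LAST ] on the line.
split_greedy() { printf '%s\n' "$1" | grep -o '\[.*\]\|[^,]*'; }
# Bounded: [^][]* cannot cross a bracket, so each group stops at its own ].
split_bounded() { printf '%s\n' "$1" | grep -o -e '\[[^][]*\]' -e '[^,]\+'; }

split_greedy 'a,[b, c],d,[e, f]'    # a / [b, c],d,[e, f]
split_bounded 'a,[b, c],d,[e, f]'   # a / [b, c] / d / [e, f]
```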