
Bash Script - split string using regex delimiter

I want to split a string such as 'substring1 substring2 ONCE[0,10s] substring3'. The expected result (with the delimiter 'ONCE[0,10s]') is:

substring1 substring2
substring3

The problem is that the number in the delimiter varies: it could be 'ONCE[0,1s]', 'ONCE[0,3m]', 'ONCE[0,10d]', and so on.

How can I do this in a bash script? Any ideas?

Thank you

The example provided in the OP, and the two answers provided by @GlennJackman and @devnull, assume that the actual question could have been:

In bash, how do I replace the match for a regular expression in a string with a newline?

That's not actually the same as "split a string using a regular expression", unless you add the constraint that the string does not contain any newline characters. And even then, it's not actually "splitting" the string; the presumption is that some other process will use the newlines to split the result.

Once the question has been reformulated, the solution is not challenging. You could use any tool which supports regular expressions, such as sed:

sed 's/ *ONCE\[[^]]*] */\n/g' <<<"$variable"

(Remove the g if you only want to replace the first match; you may need to adjust the regular expression, since it wasn't quite clear what the desired constraints are. Note that \n in the replacement is understood by GNU sed but not by every sed implementation.)
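
If the goal is to end up with the pieces in a bash array, the newline-separated output can be read back line by line. The following is only a sketch under the assumption stated above (the input contains no newlines); the array name parts is made up for illustration, and the backslash-newline form of the replacement is used here because it does not depend on GNU sed:

```bash
#!/usr/bin/env bash
# Sketch: replace each delimiter with a newline, then read the lines into an array.
# Assumption: the input string itself contains no newline characters.
variable='substring1 substring2 ONCE[0,10s] substring3'

parts=()   # hypothetical array name, not from the original answer
while IFS= read -r piece; do
  parts+=("$piece")
done < <(sed 's/ *ONCE\[[^]]*] */\
/g' <<<"$variable")

# Print each piece in angle brackets so stray whitespace would be visible.
printf '<%s>\n' "${parts[@]}"
```

The while/read loop is used instead of mapfile so that the sketch also runs on older bash versions.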

bash itself does not provide a "replace all" primitive using regular expressions, although it does have "patterns" and, if the option extglob is set (which is the default on some distributions), the patterns are sufficiently powerful to express this delimiter, so you could use:

echo "${variable//*( )ONCE\[*([^]])]*( )/$'\n'}"

Again, you can make the substitution happen only once by changing // to /, and you may need to change the pattern to meet your precise needs.
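
Since extglob is not enabled everywhere, a script would probably want to turn it on explicitly. A minimal sketch, in which the variable names nl and result are only illustrative:

```bash
#!/usr/bin/env bash
# Enable extended patterns such as *( ) before they are used.
shopt -s extglob

variable='substring1 substring2 ONCE[0,10s] substring3'

# Keeping the newline in a variable (hypothetical name) keeps the quoting simple.
nl=$'\n'

# *( )        zero or more spaces
# ONCE\[      the literal text "ONCE["
# *([^]])     zero or more characters other than "]"
# ]           the closing bracket
result="${variable//*( )ONCE\[*([^]])]*( )/$nl}"

printf '%s\n' "$result"
```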

That leaves open the question of how to actually split a bash variable using a delimiter specified by a regular expression, for some definition of "split". One possible definition is "call a function with the parts of the string as arguments"; that's the one used here:

# Usage:
# call_with_split <pattern> <string> <cmd> <args>...
# Splits string according to regular expression pattern and then invokes
# cmd args string-pieces
call_with_split () { 
  if [[ $2 =~ ($1).* ]]; then
    call_with_split "$1" \
                    "${2:$((${#2} - ${#BASH_REMATCH[0]} + ${#BASH_REMATCH[1]}))}" \
                    "${@:3}" \
                    "${2:0:$((${#2} - ${#BASH_REMATCH[0]}))}"
  else
    "${@:3}" "$2"
  fi
}

Example:

$ var="substring1 substring2 ONCE[0,10s] substring3"
$ call_with_split " ONCE\[[^]]*] " "$var" printf "%s\n"
substring1 substring2
substring3
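
Because the command receives the pieces as ordinary arguments, it can also be a shell function that stores them. The sketch below repeats call_with_split so that it runs on its own; the collect function and the pieces array are hypothetical names added for illustration, and the input uses two delimiters to show the recursion:

```bash
#!/usr/bin/env bash
# Same function as above, repeated so this sketch is self-contained.
call_with_split () {
  if [[ $2 =~ ($1).* ]]; then
    call_with_split "$1" \
                    "${2:$((${#2} - ${#BASH_REMATCH[0]} + ${#BASH_REMATCH[1]}))}" \
                    "${@:3}" \
                    "${2:0:$((${#2} - ${#BASH_REMATCH[0]}))}"
  else
    "${@:3}" "$2"
  fi
}

# Hypothetical helper: store all of its arguments in the global array "pieces".
collect () { pieces=("$@"); }

var='a ONCE[0,1s] b ONCE[0,3m] c'
call_with_split ' ONCE\[[^]]*] ' "$var" collect

printf '<%s>\n' "${pieces[@]}"
```

Note that the pattern must not be able to match the empty string, since the function would then recurse without consuming any input.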

bash:

s='substring1 substring2 ONCE[0,10s] substring3'

if [[ $s =~ (.+)" ONCE["[0-9]+,[0-9]+[smhd]"] "(.+) ]]; then
    echo "${BASH_REMATCH[1]}"
    echo "${BASH_REMATCH[2]}"
else 
    echo no match
fi
substring1 substring2
substring3
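
To check that the same approach copes with the variable delimiters from the question, the regular expression can be kept in a variable and tried against several inputs. This is just a sketch; the names re and results and the sample strings are assumptions made for the example:

```bash
#!/usr/bin/env bash
# Keeping the regex in a variable avoids quoting the brackets inside [[ ]].
re='^(.+) ONCE\[[0-9]+,[0-9]+[smhd]] (.+)$'

results=()   # hypothetical array collecting "left|right" pairs
for s in 'a b ONCE[0,1s] c' 'a b ONCE[0,3m] c' 'a b ONCE[0,10d] c'; do
  if [[ $s =~ $re ]]; then
    results+=("${BASH_REMATCH[1]}|${BASH_REMATCH[2]}")
  else
    results+=("no match: $s")
  fi
done

printf '%s\n' "${results[@]}"
```

This form only splits once per string. As far as I can tell, when a string contains several delimiters the first group takes as much as it can, so the split happens at the last one.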

You could use awk. Specify the field separator as:

'ONCE[[]0,[^]]*[]] *'

For example, using your sample input:

$ awk -F 'ONCE[[]0,[^]]*[]] *' '{for(i=1;i<=NF;i++){print $i}}' <<< "substring1 substring2 ONCE[0,10s] substring3"
substring1 substring2 
substring3
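
The first line of that output keeps the space that preceded the delimiter, because the separator only absorbs spaces after it. A possible variation, sketched here with an assumed separator that also absorbs leading spaces and no longer requires the range to start at 0, reads the fields into a bash array (the names input and fields are illustrative):

```bash
#!/usr/bin/env bash
input='substring1 substring2 ONCE[0,10s] substring3'

fields=()   # hypothetical array name
while IFS= read -r field; do
  fields+=("$field")
done < <(awk -F ' *ONCE[[][^]]*[]] *' '{ for (i = 1; i <= NF; i++) print $i }' <<<"$input")

# Angle brackets make any leftover whitespace visible.
printf '<%s>\n' "${fields[@]}"
```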
