Extract strings between 2 texts enclosed between square brackets

I have strings similar to the ones below:

 1. the quick brown `[fox].[jumps]` [over] the lazy dog
 2. the quick brown fox [jumps] [over] the lazy dog
 3. `[the].[quick]` brown `[fox].[jumps]` [over] the lazy dog

I need to extract the values below:

 1. fox.jumps
 2. <Nothing>
 3. the.quick, fox.jumps

Could you please help me with the regular expressions for this in a shell script?

With GNU awk for multi-char RS, RT, and gensub():

$ awk -v RS='[[][^]]*][.][[][^]]*]' 'RT{print gensub(/[][]/,"","g",RT)}' file
fox.jumps
the.quick
fox.jumps
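
Where GNU awk is not available, roughly the same flat list (one pair per line, with no record of which input line it came from) can be sketched with grep and tr. This assumes a grep that supports the -o option (GNU and BSD grep do, but POSIX does not require it); the function name extract_flat is only an illustrative choice, not part of the answer above.

```shell
# Hypothetical helper: print every [a].[b] pair found on stdin, one per line.
# Assumes grep supports -o (print only the matched text).
extract_flat() {
    # Same bracket-expression regex as the awk RS above, then strip [ and ].
    grep -oE '[[][^]]*][.][[][^]]*]' | tr -d '[]'
}

printf '%s\n' \
    'the quick brown [fox].[jumps] [over] the lazy dog' \
    'the quick brown fox [jumps] [over] the lazy dog' \
    '[the].[quick] brown [fox].[jumps] [over] the lazy dog' |
    extract_flat
```

Like the RS/RT version, this prints fox.jumps, the.quick and fox.jumps on three separate lines and prints nothing for the line without a pair.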

With your shown samples, could you please try the following. Written and tested in GNU awk.

awk '
{
  val=""
  for(i=1;i<=NF;i++){
    if($i~/^\[.*]\.\[.*]$/){
      gsub(/[][]/,"",$i)
      val=(val?val ", ":"")$i
    }
  }
  print (val==""?"<Nothing>":val)
}'  Input_file

Sample output for the shown samples will be as follows.

fox.jumps
<Nothing>
the.quick, fox.jumps

This is another one-liner... from the shell's point of view (you can remove each \ and newline to make it one line).

Make sure \ is always the last character of the line, with no space after it.

gawk '{\
  for(i=1;i<=NF;i++)\
  {\
     if(match($i,/\]\.\[/)>0)\
     {\
         for(k=1;k<=length($i);k++)\
         {\
            c=substr($i,k,1);\
            if(c!="[" && c!="]")\
            printf("%s",c);\
         }\
         printf(" ");\
     }\
  }\
  printf("\n");\
}' example.txt

Anyway, it would be useful to put the gawk code between ' and ' into a file (test.awk; remove all \ in that file) and then call gawk like so, meaning test.awk starts with { and ends with }. It might not be an elegant solution, but you can add a lot more to this, like many variables, a whole program, subroutines, ...

gawk -f test.awk example.txt
Output:
fox.jumps 

the.quick fox.jumps

With sed (one that supports \n in s/ commands).

sed '
    s/$/\n/
    : again
    /\([^\n]*\)\[\([^]]*\)\]\.\[\([^]]*\)\]\([^\n]*\)\n/{
        s//\1\4\n\2.\3\n/
        b again
    }
    s/[^\n]*\n//
    s/\n$//
    s/\n/, /g
'
  • s/$/\n/ - add a newline at the end of the line read. This matters for lines without any match.
  • : again - define a label named again that you can branch to.
  • /../ - match a regex:
    • \([^\n]*\) - match any run of non-newline characters and remember it in \1
    • \[\([^]]*\)\]\.\[\([^]]*\)\] - match [something].[something] and remember the parts in \2 and \3
    • \([^\n]*\) - match any run of non-newline characters and remember it in \4
  • /.../{ - when the regex is matched:
    • s// - reuse the last regex, i.e. the one above
    • /\1\4\n\2.\3\n/ - rearrange the input so that the non-interesting parts come before the newline and the extracted interesting part comes after it
    • b again - branch to again, to match another pattern
  • s/[^\n]*\n// - remove the non-matched part of the line
  • s/\n$// - remove the trailing newline
  • s/\n/, /g - separate the parts with a comma and a space.
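
To see what the loop leaves in the pattern space before the cleanup steps run, the following sketch runs only the first half of the script and then makes the embedded newlines visible as |. It assumes GNU sed (for \n in regexes, bracket expressions and replacements); the function name trace_extract and the | marker are illustrative choices, not part of the answer.

```shell
# Hypothetical helper: run the "append newline + extract loop" part only,
# then show each embedded newline as "|" so the intermediate state is visible.
# Assumes GNU sed.
trace_extract() {
    sed '
        s/$/\n/
        :again
        /\([^\n]*\)\[\([^]]*\)\]\.\[\([^]]*\)\]\([^\n]*\)\n/{
            s//\1\4\n\2.\3\n/
            b again
        }
        s/\n/|/g
    '
}

printf '%s\n' '[the].[quick] brown [fox].[jumps] [over] the lazy dog' | trace_extract
# prints: " brown  [over] the lazy dog|the.quick|fox.jumps|" (without the quotes)
```

Because \([^\n]*\) is greedy, the last pair on the line is moved out first and earlier pairs are then inserted in front of it, so the extracted parts end up in their original order after the first |.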

Example:

$ sed 's/$/\n/; : again; /\([^\n]*\)\[\([^]]*\)\]\.\[\([^]]*\)\]\([^\n]*\)\n/{ s//\1\4\n\2.\3\n/; b again; }; s/[^\n]*\n//; s/\n$//; s/\n/, /g' <<EOF
the quick brown [fox].[jumps] [over] the lazy dog
the quick brown fox [jumps] [over] the lazy dog
[the].[quick] brown [fox].[jumps] [over] the lazy dog
EOF

outputs:

fox.jumps

the.quick, fox.jumps

If you do not want the empty line in between, do not output anything from sed when no patterns were found. Add sed -n and, at the end of the script, print only if the pattern space is not empty with /^$/!p, like so:

sed -n 's/$/\n/; : again; /\([^\n]*\)\[\([^]]*\)\]\.\[\([^]]*\)\]\([^\n]*\)\n/{ s//\1\4\n\2.\3\n/; b again; }; s/[^\n]*\n//; s/\n$//; s/\n/, /g; /^$/!p'
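
If a placeholder is wanted for lines without a pair, as in the question's expected output, one possible variation is to replace an empty result with <Nothing> as a final step. This is only a sketch, assuming GNU sed; the function name extract_or_placeholder is hypothetical.

```shell
# Hypothetical helper: same script as above, but an empty result
# becomes the literal text <Nothing>. Assumes GNU sed.
extract_or_placeholder() {
    sed '
        s/$/\n/
        :again
        /\([^\n]*\)\[\([^]]*\)\]\.\[\([^]]*\)\]\([^\n]*\)\n/{
            s//\1\4\n\2.\3\n/
            b again
        }
        s/[^\n]*\n//
        s/\n$//
        s/\n/, /g
        s/^$/<Nothing>/
    '
}

printf '%s\n' \
    'the quick brown [fox].[jumps] [over] the lazy dog' \
    'the quick brown fox [jumps] [over] the lazy dog' \
    '[the].[quick] brown [fox].[jumps] [over] the lazy dog' |
    extract_or_placeholder
```

For the three sample lines this prints fox.jumps, then <Nothing>, then the.quick, fox.jumps.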

With GNU awk for FPAT (using the regexp and gensub() function from Ed Morton's code):

awk -v OFS=', ' -v FPAT='[[][^]]*][.][[][^]]*]' '{for (i=1; i<=NF; i++) printf "%s%s", gensub(/[][]/,"","g",$i), (i<NF?OFS:ORS)}' file
fox.jumps
the.quick, fox.jumps
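
FPAT and gensub() are GNU awk extensions. For an awk without them, a similar per-line result can be sketched with the standard match() function and its RSTART and RLENGTH variables. The function name extract_pairs and the variable names are illustrative assumptions, not taken from the answers above.

```shell
# Hypothetical helper: for each input line, print its [a].[b] pairs as
# "a.b, c.d"; lines without a pair print nothing (like the FPAT version).
extract_pairs() {
    awk '{
        out = ""
        rest = $0
        # match() finds the leftmost pair in rest and sets RSTART/RLENGTH
        while (match(rest, /[[][^]]*][.][[][^]]*]/)) {
            pair = substr(rest, RSTART, RLENGTH)
            gsub(/[][]/, "", pair)               # drop the brackets
            out = (out == "" ? pair : out ", " pair)
            rest = substr(rest, RSTART + RLENGTH) # continue after this match
        }
        if (out != "") print out
    }' "$@"
}

printf '%s\n' \
    'the quick brown [fox].[jumps] [over] the lazy dog' \
    'the quick brown fox [jumps] [over] the lazy dog' \
    '[the].[quick] brown [fox].[jumps] [over] the lazy dog' |
    extract_pairs
```

With the sample lines this prints fox.jumps and then the.quick, fox.jumps. Passing a file name as an argument (extract_pairs file) reads from that file instead of stdin.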
