Extract strings between 2 texts enclosed in square brackets
I have strings similar to the ones below:
1. the quick brown `[fox].[jumps]` [over] the lazy dog
2. the quick brown fox [jumps] [over] the lazy dog
3. `[the].[quick]` brown `[fox].[jumps]` [over] the lazy dog
I need to extract the following values:
1. fox.jumps
2. <Nothing>
3. the.quick, fox.jumps
Could you please help me with the regular expressions in shell scripts?
With GNU awk for multi-char RS, RT, and gensub():
$ awk -v RS='[[][^]]*][.][[][^]]*]' 'RT{print gensub(/[][]/,"","g",RT)}' file
fox.jumps
the.quick
fox.jumps
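Where GNU awk is not available, the same idea can be approximated with a `match()` loop, which is a different technique from the RS/RT approach above. This is only a sketch: the function name `extract_pairs` is made up for illustration, and it also prints `<Nothing>` for lines without a match, as the question asks.

```shell
#!/bin/sh
# Portable sketch: uses only POSIX awk features (match, substr, gsub),
# so it does not rely on GNU awk's RT or gensub().
# extract_pairs is a hypothetical helper name, not part of any tool.
extract_pairs() {
    awk '
    {
        out = ""
        rest = $0
        # Repeatedly find the next [x].[y] token in what is left of the line.
        while (match(rest, /\[[^]]*\]\.\[[^]]*\]/)) {
            tok = substr(rest, RSTART, RLENGTH)
            gsub(/\[|\]/, "", tok)                  # strip the brackets
            out = (out == "" ? "" : out ", ") tok   # join matches with ", "
            rest = substr(rest, RSTART + RLENGTH)   # continue after the match
        }
        print (out == "" ? "<Nothing>" : out)
    }' "$@"
}

# Reads the named files, or standard input when no file is given.
printf '%s\n' '[the].[quick] brown [fox].[jumps] [over] the lazy dog' | extract_pairs
# prints: the.quick, fox.jumps
```

Unlike the RS/RT version, this prints one output line per input line, so matches from the same line stay together.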
With your shown samples, could you please try the following. Written and tested in GNU awk.
awk '
{
val=""
for(i=1;i<=NF;i++){
if($i~/^\[.*]\.\[.*]$/){
gsub(/[][]/,"",$i)
val=(val?val ", ":"")$i
}
}
print (val==""?"<Nothing>":val)
}' Input_file
Sample output for the shown samples will be as follows.
fox.jumps
<Nothing>
the.quick, fox.jumps
This is another one-liner, at least from the shell's point of view (you can remove each `\` and the newline after it to make it a single line). Make sure `\` is always the last character of the line, with no space after it.
gawk '{\
for(i=1;i<=NF;i++)\
{\
if(match($i,/\]\.\[/)>0)\
{\
for(k=1;k<=length($i);k++)\
{\
c=substr($i,k,1);\
if(c!="[" && c!="]")\
printf("%s",c);\
}\
printf(" ");\
}\
}\
printf("\n");\
}' example.txt
Anyway, it would be useful to put the gawk code between the two `'` into a file (test.awk; in that file remove all the `\`) and then call gawk as shown below, meaning test.awk starts with `{` and ends with `}`. It might not be an elegant solution, but you can add a lot more to it this way: many variables, a whole program, subroutines, ...
gawk -f test.awk example.txt
Output:
fox.jumps

the.quick fox.jumps
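As a rough sketch of the file-based variant described above, the program can be written to a file with a here-document and run with `-f`. The helper name `run_from_file` and the temporary directory are assumptions for illustration. The sketch calls plain `awk`, since nothing in this program appears to need GNU extensions, and it writes the loop bounds as `<=` so the last field and last character are examined too.

```shell
#!/bin/sh
# Sketch: store the awk program in a file (no trailing backslashes needed
# there) and run it with -f. run_from_file is a hypothetical helper name.
run_from_file() {
    workdir=$(mktemp -d) || return 1
    # The quoted 'EOF' keeps the shell from expanding $i inside the program.
    cat > "$workdir/test.awk" <<'EOF'
{
    for (i = 1; i <= NF; i++) {
        if (match($i, /\]\.\[/) > 0) {
            # Print every character of the field except the brackets.
            for (k = 1; k <= length($i); k++) {
                c = substr($i, k, 1)
                if (c != "[" && c != "]")
                    printf("%s", c)
            }
            printf(" ")
        }
    }
    printf("\n")
}
EOF
    awk -f "$workdir/test.awk" "$@"   # reads stdin when no file is given
    rm -r "$workdir"
}

printf '%s\n' '[the].[quick] brown [fox].[jumps] [over] the lazy dog' | run_from_file
# prints: the.quick fox.jumps   (followed by a trailing space)
```

Swap `awk` for `gawk` if that is what is installed; the program file itself stays the same.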
With sed (one that supports `\n` in `s/` commands, such as GNU sed).
sed '
s/$/\n/
: again
/\([^\n]*\)\[\([^]]*\)\]\.\[\([^]]*\)\]\([^\n]*\)\n/{
s//\1\4\n\2.\3\n/
b again
}
s/[^\n]*\n//
s/\n$//
s/\n/, /g
'
`s/$/\n/` - add a newline at the end of the line that was read. Important for lines that contain no match.
`: again` - define a label `again` that you can branch to.
`/../` - match a regex:
    `\([^\n]*\)` - match any non-newline characters and remember them in `\1`
    `\[\([^]]*\)\]\.\[\([^]]*\)\]` - match `[something].[something]` and remember the parts in `\2` and `\3`
    `\([^\n]*\)` - match any non-newline characters and remember them in `\4`
`/.../{` - when the regex is matched:
    `s//` - reuse the last regex, i.e. the one above
    `/\1\4\n\2.\3\n/` - shuffle the input so that the non-interesting parts are placed before the newline and the extracted interesting part after the newline
    `b again` - go to `again`, to match another pattern
`s/[^\n]*\n//` - remove the non-matched part of the line
`s/\n$//` - remove the trailing newline
`s/\n/, /g` - separate the parts with a comma and a space.
Example:
$ sed 's/$/\n/; : again; /\([^\n]*\)\[\([^]]*\)\]\.\[\([^]]*\)\]\([^\n]*\)\n/{ s//\1\4\n\2.\3\n/; b again; }; s/[^\n]*\n//; s/\n$//; s/\n/, /g' <<EOF
the quick brown [fox].[jumps] [over] the lazy dog
the quick brown fox [jumps] [over] the lazy dog
[the].[quick] brown [fox].[jumps] [over] the lazy dog
EOF
outputs:
fox.jumps

the.quick, fox.jumps
If you do not want the empty line in between, then do not output anything from sed when no pattern was found. Add `sed -n` and, at the end of the script, print only if the pattern space is not empty with `/^$/!p`, like so:
sed -n 's/$/\n/; : again; /\([^\n]*\)\[\([^]]*\)\]\.\[\([^]]*\)\]\([^\n]*\)\n/{ s//\1\4\n\2.\3\n/; b again; }; s/[^\n]*\n//; s/\n$//; s/\n/, /g; /^$/!p'
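To see how the loop shuffles the pattern space, one option is to print it after every pass with sed's `l` command. This is only a tracing sketch, assuming GNU sed (it relies on `\n` in `s///` and on `l 0` to disable line wrapping). The function name `trace_extract` is made up for illustration.

```shell
#!/bin/sh
# Sketch: show the pattern space after each pass of the "again" loop.
# Assumes GNU sed. trace_extract is a hypothetical helper name.
trace_extract() {
    sed -n '
s/$/\n/
: again
/\([^\n]*\)\[\([^]]*\)\]\.\[\([^]]*\)\]\([^\n]*\)\n/{
    s//\1\4\n\2.\3\n/
    l 0
    b again
}
'
}

printf '%s\n' '[the].[quick] brown [fox].[jumps] [over] the lazy dog' | trace_extract
# prints (l shows newlines as \n and marks the end with $):
# [the].[quick] brown  [over] the lazy dog\nfox.jumps\n$
#  brown  [over] the lazy dog\nthe.quick\nfox.jumps\n$
```

Because the first `\([^\n]*\)` is greedy, the rightmost `[x].[y]` pair is moved behind the newline first, and each later match is placed in front of the ones already collected, so the extracted parts end up in their original order.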
With GNU awk for FPAT (using the regexp and the gensub function from Ed Morton's code):
awk -v OFS=', ' -v FPAT='[[][^]]*][.][[][^]]*]' '{for (i=1; i<=NF; i++) printf "%s%s", gensub(/[][]/,"","g",$i), (i<NF?OFS:ORS)}' file
fox.jumps
the.quick, fox.jumps