grep regex - how do I match same character pairs?
Say I have the following string:
blah blah blah \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\ blah blah blah \ \\ \\ \\ \ foobar \ a\\b\\c\\ \
and I want to extract the following 3 matches using grep:
\the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\
and
\ \\ \\ \\ \
and
\ a\\b\\c\\ \
To do this I need a way to pair '\\' so that the match ends only at a single closing '\' that isn't part of a pair.
So far I have this:
echo $string | grep -oP '\\((?!\\).)*\\'
Edit: I managed to get it working in the regex101 environment:
\\((?!\\).|(([\\]{2})+))+\\
https://regex101.com/r/wC2cF1/13
but it still gives me the same result with grep in Perl mode (grep -P).
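To make the problem concrete, here is a small sketch of what an unpaired pattern does with a shortened input. It assumes that, for this data, the lookahead `(?!\\).` from the first attempt behaves like the bracket expression `[^\\]`, so plain `grep -o` (no `-P`) is swapped in. The name `naive_spans` and the shortened sample are made up for illustration, and `-o` is assumed to be available (it is not required by POSIX).

```shell
# Hypothetical helper: the unpaired approach, which ends a match at the very next backslash.
naive_spans() {
    # printf is used instead of echo so the backslashes are passed through untouched
    printf '%s\n' "$1" | grep -o '\\[^\\]*\\'
}

# Shortened, made-up sample in the same style as the string above
sample='blah \the rain in sp\\\\ain moves\ blah'

# Show only the first match
naive_spans "$sample" | head -n 1
```

The first match stops at the first backslash of `\\\\`, which is exactly the premature ending that pairing has to prevent.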
Use
text='blah blah blah \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\ blah blah blah \ \\ \\ \\ \ foobar \ a\\b\\c\\ \'
echo "$text" | grep -oE '\\([^\\]|\\\\)+\\'
Output:
\the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\
\ \\ \\ \\ \
\ a\\b\\c\\ \
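One way to read the pattern, sketched as a reusable function: an opening `\`, then one or more items that are each either a single non-backslash character or a `\\` pair, then a closing `\`. The function name `paired_spans` is hypothetical, `printf` is used in place of `echo` on the assumption that some shells' `echo` may interpret backslashes, and `-o` is assumed to be available (it is not required by POSIX).

```shell
# Hypothetical wrapper around the pattern used above.
#   \\             opening backslash
#   ([^\\]|\\\\)+  one or more items: a non-backslash character, or a \\ pair
#   \\             closing backslash
paired_spans() {
    printf '%s\n' "$1" | grep -oE '\\([^\\]|\\\\)+\\'
}

text='blah blah blah \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\ blah blah blah \ \\ \\ \\ \ foobar \ a\\b\\c\\ \'

# Prints one match per line
paired_spans "$text"
```

Because a run of backslashes can only be consumed two at a time, an odd run always leaves one backslash over, and that leftover is what closes the match.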
If you have GNU grep then @RyszardCzech's answer is a good solution; otherwise, using any awk in any shell on every UNIX box:
$ cat tst.awk
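{
    # replace every \\ pair with RS (a newline, which cannot occur inside a record)
    gsub(/\\\\/,RS)
    # what remains are single backslashes, so each span runs from one to the next
    while ( match($0,/\\[^\\]*\\/) ) {
        tgt = substr($0,RSTART,RLENGTH)
        # put the \\ pairs back before printing
        gsub(RS,"\\\\",tgt)
        print tgt
        $0 = substr($0,RSTART+RLENGTH)
    }
}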
$ awk -f tst.awk file
\the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\
\ \\ \\ \\ \
\ a\\b\\c\\ \
Using the core Text::Balanced module to extract the string:
$ perl -MText::Balanced=extract_delimited -nE '$text = extract_delimited($_, q/\\/, qr/^[^\\]*/, q/\\/); say $text' input.txt
\the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\
Note: This solution is simpler and better than the answer below. But beware that its behaviour is different on a string such as \\\xy\, for example.
sed 's/\\\\/\x00/g' file | grep -ao '\\[^\\]*\\' | sed 's/\x00/\\\\/g'
The first sed replaces each double backslash (\\) with a null character (highly unlikely to occur in the original data to be processed).
grep captures and prints the characters between matching single backslashes (\). The GNU-specific -a option allows a binary file to be processed as if it were a text file, since the stream may contain null characters at this point. With the GNU-specific -o option, grep prints only the matching parts of the line, each one on a separate output line.
The last sed restores the double backslashes by replacing each null character with \\.
Please note that these options are highly GNU-specific.
Test:
$ line='blah blah blah \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\ blah blah blah \ \\ \\ \\ \ foobar \ a\\b\\c\\ \'
$ sed 's/\\\\/\x00/g' <<< "$line" | grep -ao '\\[^\\]*\\' | sed 's/\x00/\\\\/g'
\the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\
\ \\ \\ \\ \
\ a\\b\\c\\ \
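The difference on a string like \\\xy\ can be sketched by running the placeholder pipeline next to the single-grep pattern from the earlier answer. The function names are hypothetical, and the second function assumes GNU sed (for the `\x00` escape) and GNU grep (for `-a` and `-o`).

```shell
# Hypothetical helper names; both pipelines are copied from the answers in this thread.
paired_grep() {
    printf '%s\n' "$1" | grep -oE '\\([^\\]|\\\\)+\\'
}

placeholder_sed() {
    # Assumes GNU sed (\x00 escape) and GNU grep (-a, -o)
    printf '%s\n' "$1" | sed 's/\\\\/\x00/g' | grep -ao '\\[^\\]*\\' | sed 's/\x00/\\\\/g'
}

edge='\\\xy\'
paired_grep "$edge"       # first \ opens the span, the next two form a pair
placeholder_sed "$edge"   # first two \ are paired up front, so the span opens at the third
```

The placeholder approach pairs backslashes strictly left to right before any matching happens, while the single-grep pattern is free to treat the first backslash as the opener, so the two disagree whenever a span is preceded by an odd run of backslashes.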
With echo, grep and tail...
string='blah blah blah \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\ blah blah blah \ \\ \\ \\ \ foobar \ a\\b\\c\\ \'
echo ${string} | grep -o -E "([ \]{1,2}[ a-z]{0,2}[ \]{0,2}){1,4}" | tail -n2 | grep -o -E "[abc \]{1,32}"
Puts out...
\ \\ \\ \\ \
\ a\\b\\c\\ \
grep -E means: use an extended regular expression.