
grep regex - how do I match same character pairs?

Say I have the following string:

blah blah blah \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\ blah blah blah \ \\ \\ \\ \ foobar \ a\\b\\c\\ \

and I want to extract the following 3 matches using grep:

\the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\

and

\ \\ \\ \\ \

and

\ a\\b\\c\\ \

To do this I need a way to pair up '\\' so that the match only ends at a single closing '\' that isn't part of a pair.

So far I have this:

echo $string | grep -oP '\\((?!\\).)*\\'

Edit: I managed to get it working in the regex101 environment:

\\((?!\\).|(([\\]{2})+))+\\

https://regex101.com/r/wC2cF1/13

but it still gives me the same result as before when I run it through grep in Perl mode (grep -P).

Use:

text='blah blah blah \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\ blah blah blah \ \\ \\ \\ \ foobar \ a\\b\\c\\ \'
echo "$text" | grep -oE '\\([^\\]|\\\\)+\\'

Output:

\the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\
\ \\ \\ \\ \
\ a\\b\\c\\ \
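Read piece by piece, that pattern is an opening backslash, then one or more units that are each either a non-backslash character or a \\ pair, then a closing backslash. Here is a minimal sketch on a shorter input, assuming a grep that supports -o; the function name extract_spans is made up for illustration and is not part of the answer:

```shell
# extract_spans is a hypothetical helper name used only for this sketch.
extract_spans() {
    # \\             opening single backslash
    # ([^\\]|\\\\)+  one or more of: a non-backslash char, or a \\ pair
    # \\             closing single backslash
    printf '%s\n' "$1" | grep -oE '\\([^\\]|\\\\)+\\'
}

# Prints one span per line: first \a\\b\ and then \c\
extract_spans 'x \a\\b\ y \c\ z'
```

Because a \\ pair is consumed as a single unit, the backslash that ends a span is always one that could not be paired with the character after it.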

If you have GNU grep then @RyszardCzech's answer is a good solution; otherwise, using any awk in any shell on every UNIX box:

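$ cat tst.awk
{
    # Hide every \\ pair behind RS (a newline), which cannot occur inside a record
    gsub(/\\\\/,RS)
    # Each remaining backslash is unpaired, so match from one to the next
    while ( match($0,/\\[^\\]*\\/) ) {
        tgt = substr($0,RSTART,RLENGTH)
        gsub(RS,"\\\\",tgt)    # put the hidden pairs back
        print tgt
        $0 = substr($0,RSTART+RLENGTH)
    }
}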

$ awk -f tst.awk file
\the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\
\ \\ \\ \\ \
\ a\\b\\c\\ \

Using the core Text::Balanced module to extract the string:

$ perl -MText::Balanced=extract_delimited -nE '$text = extract_delimited($_, q/\\/, qr/^[^\\]*/, q/\\/); say $text' input.txt
\the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\

Note: This solution is simpler and better than the answer below. But beware that its behaviour is different on a string such as \\\xy\ .


Using GNU utilities:
 sed 's/\\\\/\x00/g' file | grep -ao '\\[^\\]*\\' | sed 's/\x00/\\\\/g'

  • The first sed replaces each double backslash ( \\ ) with a null character (highly unlikely to occur in the original data to be processed).
  • The grep captures and prints the characters between matching single backslashes ( \ ). The GNU-specific -a option lets grep process a binary file as if it were a text file, since the stream may contain null characters at this point. With the GNU-specific -o option, grep prints only the matching parts of the line, each one on a separate output line.
  • The last sed restores the double backslashes by replacing each null character with a \\ .

Please note that these commands are highly GNU-specific.


Test:

 $ line='blah blah blah \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\ blah blah blah \ \\ \\ \\ \ foobar \ a\\b\\c\\ \'
 $ sed 's/\\\\/\x00/g' <<< "$line" | grep -ao '\\[^\\]*\\' | sed 's/\x00/\\\\/g'
 \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\
 \ \\ \\ \\ \
 \ a\\b\\c\\ \
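The caveat about a string like \\\xy\ can be checked by running the placeholder pipeline next to the single-regex grep -oE command from this thread. This is only a sketch: the helper names with_ere and with_placeholder are invented here, and the second one assumes GNU sed and GNU grep.

```shell
# Hypothetical helper names wrapping the two commands quoted in this thread.
with_ere() {
    printf '%s\n' "$1" | grep -oE '\\([^\\]|\\\\)+\\'
}

with_placeholder() {
    # Assumes GNU sed (\x00 escape) and GNU grep (-a).
    printf '%s\n' "$1" | sed 's/\\\\/\x00/g' | grep -ao '\\[^\\]*\\' | sed 's/\x00/\\\\/g'
}

with_ere '\\\xy\'          # prints \\\xy\
with_placeholder '\\\xy\'  # prints \xy\
```

The two disagree because the single regex takes the first backslash as the opener and reads the next two as a pair, while the placeholder approach pairs the leftmost two backslashes first and treats the third as the opener.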

With echo, grep and tail...

string='blah blah blah \the rain in sp\\\\ain moves mainly\\ on the p\\lain\\\\\ blah blah blah \ \\ \\ \\ \ foobar \ a\\b\\c\\ \'

echo ${string} | grep -o -E "([ \]{1,2}[ a-z]{0,2}[ \]{0,2}){1,4}" | tail -n2 | grep -o -E "[abc \]{1,32}"

This prints:

 \ \\ \\ \\ \ 
 \ a\\b\\c\\ \

grep -E means: use an extended regular expression.
