Matching an expression across multiple lines in a shell script

Question

I wish to match a pattern across multiple lines in a shell script. My input is:

START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n1 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END

START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n2 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END

I am trying to display the output for a specific ID only (e.g. n1 or n2) using a regex. I tried the regex START(.|\n)*ID: n1(.|\n)*END, but it fetches the data of ID: n2 as well. What changes should I make to the regex in order to get the data of only the specific ID?

I am using cat inputfile | grep 'pattern' > outputfile as the command.

The number of lines in each block, as well as the number of lines between START and ID: n1 and between ID: n1 and END, is variable, so using head/tail is not a viable option. Also, when the ID matches, I would like to print the whole block from START to END.

EDIT: I tried an online regex creator, and it could successfully match the regex

START[\s\S][^END]*ID: n1[\s\S][^END]*END

on my input file.

Answer 1

awk in a paragraph-like mode, using two successive newlines as the record separator (a multi-character RS, so this needs GNU awk or Mawk):

awk -v RS='\n\n' '/ID: n1/' file.txt

Replace n1 with n2, n3, ... for the other IDs.

Example:

$ cat file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END


$ awk -v RS='\n\n' '/ID: n1/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END


$ awk -v RS='\n\n' '/ID: n2/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END


$ awk -v RS='\n\n' '/ID: n3/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END

Answer 2

A GNU awk or Mawk solution that can handle any number of lines, including empty ones, between paired START and END occurrences:

awk -v id='n2' -v RS='(^|\n)START |\nEND' '
  $0 ~ ("\nID: " id " ") { print "START " $0 "\nEND" }
' file

This solution uses a multi-character RS value (which is also a regex); the POSIX spec does not support such values. Both GNU awk and Mawk (the default awk on Ubuntu) do support them, however, whereas BSD/macOS awk does not.

-v id='n2' passes the ID value n2 to Awk as variable id.
RS='(^|\n)START |\nEND' breaks the input into records made up of the (line-spanning) text between the token START, at the start of the input or of a line, and the token END after a newline.
$0 ~ ("\nID: " id " ") matches each input record ( $0 ) against a regex ( ~ ) that identifies the specified ID: a newline, followed by ID: , followed by the ID value of interest (stored in variable id ) and a space.
Note how string concatenation in Awk works by simply placing strings and variable references next to each other.
In case of a match, print "START " $0 "\nEND" prints the input record at hand, bookended by the START and END tokens (which, being the input record separators, are not reported as part of $0 ).
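
To see the empty-line handling in action, here is a minimal sketch that wraps the command above in a shell function and runs it on made-up sample data. The function name print_block_by_id and the sample contents are assumptions for illustration only, and the sketch needs GNU awk or Mawk because of the regex RS.

```shell
#!/bin/sh
# Sketch: print the START..END block whose ID line matches a given ID.
# Requires an awk with regex RS support (GNU awk or Mawk); this is not POSIX.
# The function name and the sample data are hypothetical.

print_block_by_id() {
  # $1 = ID value (e.g. n2), $2 = input file
  awk -v id="$1" -v RS='(^|\n)START |\nEND' '
    # The separators are stripped from $0, so add them back when printing.
    $0 ~ ("\nID: " id " ") { print "START " $0 "\nEND" }
  ' "$2"
}

# Sample input with an empty line inside each block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
START first block
alpha

ID: n1 one
END

START second block
beta

ID: n2 two
gamma
END
EOF

print_block_by_id n2 "$sample"
rm -f "$sample"
```

Because the ID is passed in with -v, the same function can be reused for any ID without editing the awk program.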

If the lines between paired START and END occurrences are all nonempty (i.e., contain at least 1 character, even if that character is a space or tab), here's a POSIX-compliant awk solution:

awk -v id='n2' -v RS= '$0 ~ ("\nID: " id " ")' file

Note that -v RS= , i.e., setting the input record separator ( RS ) to the empty string, is an awk idiom that breaks the input into records by paragraphs (runs of nonempty lines).
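
As a rough illustration of paragraph mode, the sketch below runs the POSIX-compliant command on a small made-up file in which the blocks are separated by an empty line and contain no empty lines themselves. The function name print_paragraph_by_id and the sample data are hypothetical.

```shell
#!/bin/sh
# Sketch: paragraph mode (RS set to the empty string) splits the input at
# runs of empty lines, so each block separated that way becomes one record.
# The function name and the sample data are hypothetical.

print_paragraph_by_id() {
  # $1 = ID value (e.g. n1), $2 = input file
  # A pattern without an action prints every matching record.
  awk -v id="$1" -v RS= '$0 ~ ("\nID: " id " ")' "$2"
}

# Sample input: no empty lines inside a block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
START first block
alpha
ID: n1 one
END

START second block
beta
ID: n2 two
END
EOF

print_paragraph_by_id n1 "$sample"
rm -f "$sample"
```

If a block did contain an empty line, paragraph mode would split it into separate records, so the whole block would no longer be printed as one unit.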

Answer 3

In awk you can accumulate the text between your starting pattern and ending pattern and then test that buffer for your match:

cat inputfile | awk  '/^START/        { buf=$0 "\n"; flag=1; next } 
                      flag            { buf=buf $0 "\n" } 
                      /^END/ && flag  { flag=0; if (buf ~ /ID: n1 |ID: n2 /) print buf }'

In Perl you can do:

cat inputfile | perl -0777 -lne 'while (/(^START.*?^ID: (n\d+) .*?^END)/gms){
    if ($2 eq "n1" || $2 eq "n2"){
        print "$1\n\n";
    }
}'

In either case, you may want to run awk '{script}' inputfile or perl '{script}' inputfile rather than using cat.
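
For the awk variant, one possible way to read the file directly and to take the ID from a variable instead of hard-coding it is sketched below. It uses only POSIX awk features; the function name print_buffered_block, the variable names, and the sample data are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: buffer each START..END block and print it only if it contains the
# requested ID. Empty lines inside a block are kept as part of the buffer.
# The function name, variable names and sample data are hypothetical.

print_buffered_block() {
  # $1 = ID value (e.g. n2), $2 = input file
  awk -v id="$1" '
    /^START/ { buf = $0; inblock = 1; next }     # begin a new buffer
    inblock  { buf = buf "\n" $0 }               # append every line, END included
    /^END/ && inblock {
      inblock = 0
      if (buf ~ ("\nID: " id " ")) print buf     # emit only the matching block
    }
  ' "$2"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
START first block
alpha

ID: n1 one
END

START second block
beta

ID: n2 two
gamma
END
EOF

print_buffered_block n2 "$sample"
rm -f "$sample"
```

Passing the file name as an argument avoids the extra cat process, and the trailing space in the regex keeps an ID such as n1 from also matching n10.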

3 answers

Answer 1
1 2017-01-22 05:41:05

Answer 2
1 Accepted 2017-01-22 18:48:10

Answer 3
0 2017-01-22 06:33:33
