
Match expression across multiple lines in shell script

I wish to match a pattern across multiple lines in a shell script. My input looks like this:

START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n1 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END

START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n2 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END

I am trying to display the output for a specific ID only (e.g. n1 or n2) using a regex. I tried the regex START(.|\n)*ID: n1(.|\n)*END, but it fetches the data of ID: n2 as well. What changes should I make to the regex in order to get the data of only the specific ID?

I am using cat inputfile | grep 'pattern' > outputfile as the command.

The number of lines in each block, as well as the number of lines between START and ID: n1 and between ID: n1 and END, can vary, so using head/tail is not a viable option. Also, I would like to print the whole block from START to END when the ID is matched.

EDIT: I tried an online regex creator, and it could successfully match the regex

START[\s\S][^END]*ID: n1[\s\S][^END]*END

on my input file.

awk in paragraph mode, using two successive newlines as the record separator:

awk -v RS='\n\n' '/ID: n1/' file.txt

Replace n1 with n2, n3, ... for the other IDs.

Example:

$ cat file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END


$ awk -v RS='\n\n' '/ID: n1/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END


$ awk -v RS='\n\n' '/ID: n2/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END


$ awk -v RS='\n\n' '/ID: n3/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END

A GNU awk or Mawk solution that can handle any number of lines, including empty ones, between paired START and END occurrences:

awk -v id='n2' -v RS='(^|\n)START |\nEND' '
  $0 ~ ("\nID: " id " ") { print "START " $0 "\nEND" }
' file

This solution uses a multi-character RS value that is interpreted as a regex, which the POSIX spec does not support. Both GNU awk and Mawk (the default awk on Ubuntu) support such values, however, whereas BSD/macOS awk does not.

  • -v id='n2' passes the ID value n2 to Awk as variable id.

  • RS='(^|\n)START |\nEND' breaks the input into records, each consisting of the (line-spanning) text between a START token at the start of the input or of a line and an END token that follows a newline.

  • $0 ~ ("\nID: " id " ") matches each input record ($0) against a regex (~) that matches the specified ID: a newline followed by ID: , followed by the ID value of interest (stored in variable id) and a space.
    Note how string concatenation in Awk works by simply placing strings / variable references next to each other.

  • In case of a match, print "START " $0 "\nEND" prints the input record at hand, bookended by the START and END tokens (which, being the input record separators, are not included in $0).
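
As a small illustration of the concatenated regex described above, the following sketch builds it inside a BEGIN block and tests it against two hard-coded sample strings. The function name demo_id_regex and the sample strings are made up for this example; it uses only POSIX awk features, so it does not depend on the multi-character RS.

```shell
# Hypothetical demo (not part of the original answer): build the ID regex
# by concatenation and test it against two made-up sample blocks.
demo_id_regex() {
  awk -v id="$1" '
    # Helper returning a readable verdict for a dynamic-regex match.
    function verdict(text, re) { return (text ~ re) ? "match" : "no match" }
    BEGIN {
      # Concatenation: string literal, variable, string literal.
      re = "\nID: " id " "
      print "n1 block: "  verdict("START a\nID: n1 data\nEND", re)
      print "n10 block: " verdict("START b\nID: n10 data\nEND", re)
    }'
}

demo_id_regex n1
```

The trailing space in the regex is what keeps n1 from also matching n10. Note that the value of id is interpreted as a regex, so an ID containing regex metacharacters would need escaping.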


If the lines between paired START and END occurrences are all nonempty (i.e., contain at least 1 character, even if that character is a space or tab), here's a POSIX-compliant awk solution:

awk -v id='n2' -v RS= '$0 ~ ("\nID: " id " ")' file

Note that -v RS=, i.e., setting the input record separator (RS) to the empty string, is an awk idiom that breaks the input into records by paragraphs (runs of nonempty lines).
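
To see how paragraph mode splits the input, here is a minimal sketch that prints one summary line per record. The function name count_blocks and the sample input are assumptions made for illustration. In paragraph mode a newline always acts as a field separator, so with -F'\n' each field is one line of the block.

```shell
# Hypothetical helper: show how paragraph mode (RS set to the empty string)
# splits the input. Runs of blank lines separate records; with -F'\n'
# NF is the number of lines in each record.
count_blocks() {
  awk -v RS= -F'\n' '{ print "record " NR ": " NF " line(s), first line: " $1 }' "$@"
}

# Made-up sample: two blocks separated by two blank lines.
printf 'START a\nID: n1 x\nEND\n\n\nSTART b\nmore\nID: n2 y\nEND\n' | count_blocks
```

An empty line inside a block would split that block into two records, which is why this approach requires all lines between START and END to be nonempty.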

In awk you can accumulate the text between your starting pattern and ending pattern and then test that buffer for your match:

cat inputfile | awk  '/^START/        { buf=$0 "\n"; flag=1; next } 
                      flag            { buf=buf $0 "\n" } 
                      /^END/ && flag  { flag=0; if (buf ~ /ID: n1 |ID: n2 /) print buf }'
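
A possible variation on the same accumulate-and-test idea, sketched here as a shell function that takes the ID as a parameter instead of hard-coding it. The name print_block and the variable names are hypothetical; the script uses only POSIX awk features and reads either the named files or standard input.

```shell
# Hypothetical wrapper around the accumulate-and-test approach.
# Usage: print_block ID [FILE...]
print_block() {
  block_id=$1
  shift
  awk -v id="$block_id" '
    /^START/          { buf = $0; inblock = 1; next }  # open a new block
    inblock           { buf = buf "\n" $0 }            # collect every line, blank ones too
    /^END/ && inblock {
      inblock = 0
      # Newline before and space after the ID, so that n1 does not match n10.
      if (buf ~ ("\nID: " id " ")) print buf
    }
  ' "$@"
}

# Made-up sample: the first block contains an empty line.
printf 'START a\n\nID: n1 x\nfoo\nEND\n\nSTART b\nID: n10 y\nEND\n' | print_block n1
```

Because the buffer is built line by line, empty lines inside a block are preserved, and no multi-character RS is needed.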

In Perl you can do:

cat inputfile | perl -0777 -lne 'while (/(^START.*?^ID: (n\d+) .*?^END)/gms){
    if ($2 eq "n1" || $2 eq "n2"){
        print "$1\n\n";
    }
}'

In either case, you may want to run awk '{script}' inputfile or perl '{script}' inputfile rather than using cat.
