Match expression across multiple lines in shell script
I wish to match a pattern across multiple lines in a shell script. My input is:
START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n1 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END
START <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
ID: n2 <some data including white spaces>
<some data including white spaces, can span across multiple lines, number of lines are variable>
END
I am trying to display the output using regex for a specific ID only (e.g. n1 or n2). I tried the regex
START(.|\n)*ID: n1(.|\n)*END
but it fetches the data of ID: n2 as well. What changes should I make to the regex in order to get data of only the specific ID?
I am using
cat inputfile | grep 'pattern' > outputfile
as the command.
The number of lines in each block, as well as the number of lines between START and ID: n1, and between ID: n1 and END, can vary, so using head/tail is not a viable option. Also, I would like to print the whole block from START to END when the ID is matched.
EDIT: I tried using an Online Regex Creator and it could successfully match the regex
START[\s\S][^END]*ID: n1[\s\S][^END]*END
on my input file.
awk in paragraph mode, using two successive newlines (i.e. an empty line between blocks) as the record separator:
awk -v RS='\n\n' '/ID: n1/' file.txt
Replace n1 with n2, n3, ... for others.
Example:
$ cat file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END

START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END
$ awk -v RS='\n\n' '/ID: n1/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n1 <some data including white spaces>
<some data including white spaces>
END
$ awk -v RS='\n\n' '/ID: n2/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n2 <some data including white spaces>
<some data including white spaces>
END
$ awk -v RS='\n\n' '/ID: n3/' file.txt
START <some data including white spaces>
<some data including white spaces>
ID: n3 <some data including white spaces>
<some data including white spaces>
END
A GNU awk or Mawk solution that can handle any number of lines, including empty ones, between paired START and END occurrences:
awk -v id='n2' -v RS='(^|\n)START |\nEND' '
$0 ~ ("\nID: " id " ") { print "START " $0 "\nEND" }
' file
This solution uses a multi-character RS value (that is also a regex), which is not supported in the POSIX spec. Both GNU awk and Mawk (the default awk on Ubuntu) support such values, however, whereas BSD/macOS awk does not.
-v id='n2' passes the ID value n2 as variable id to Awk.
RS='(^|\n)START |\nEND' breaks the input into records made up of the (line-spanning) text between a START token at the start of the input or of a line and an END token after a newline.
$0 ~ ("\nID: " id " ") matches each input record ($0) against a regex (~) that matches the specified ID: a newline followed by ID: and a space, followed by the ID value of interest (stored in variable id) and a space.
Note how string concatenation in Awk works by simply placing strings / variable references next to each other.
In case of a match, print "START " $0 "\nEND" prints the input record at hand, bookended by the START and END tokens (which, being the input record separators, are not reported as part of $0).
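To make the record splitting described above concrete, here is a minimal sketch that wraps the same awk command in a shell function and runs it on made-up sample data. The function name block_by_id, the temporary file and the sample contents are assumptions for illustration only; the awk program itself is the one from this answer, so it still needs GNU awk or Mawk.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
# print every START..END block whose ID line carries the ID given in $1.
# Relies on a regex-valued RS, so it needs GNU awk or Mawk.
block_by_id() {
  awk -v id="$1" -v RS='(^|\n)START |\nEND' '
    $0 ~ ("\nID: " id " ") { print "START " $0 "\nEND" }
  ' "$2"
}

# Made-up sample data; the empty line inside the n2 block is deliberate,
# and there is no empty line between the two blocks.
regex_sample=$(mktemp)
cat > "$regex_sample" <<'EOF'
START alpha
first line
ID: n1 some data
END
START beta

ID: n2 other data
tail line
END
EOF

block_by_id n2 "$regex_sample"
```

With this sample, the call should print only the block that starts with START beta, including the empty line inside it, because the separators are the START and END tokens rather than blank lines.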
If the lines between paired START and END occurrences are all nonempty (i.e., contain at least 1 character, even if that character is a space or tab) and the blocks themselves are separated by empty lines, here's a POSIX-compliant awk solution:
awk -v id='n2' -v RS= '$0 ~ ("\nID: " id " ")' file
Note that -v RS=, i.e., setting the input record separator (RS) to the empty string, is an awk idiom that breaks the input into records by paragraphs (runs of nonempty lines).
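As a rough illustration of paragraph mode, the sketch below runs the same command on made-up data in which each block is one paragraph, i.e. the blocks are separated by an empty line and contain none themselves. The function name paragraph_by_id and the sample contents are assumptions, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
# print the blank-line-separated paragraph that contains the ID given in $1.
# An empty RS switches awk into paragraph mode, which is POSIX behaviour.
paragraph_by_id() {
  awk -v id="$1" -v RS= '$0 ~ ("\nID: " id " ")' "$2"
}

# Made-up sample data: two blocks separated by one empty line.
para_sample=$(mktemp)
cat > "$para_sample" <<'EOF'
START alpha
first line
ID: n1 some data
END

START beta
ID: n2 other data
tail line
END
EOF

paragraph_by_id n1 "$para_sample"
```

Here the START and END lines are simply part of each record, so no tokens need to be added back when printing. An empty line inside a block would split that block into two records, which is why this variant only suits input without such lines.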
In awk you can accumulate the text between your starting pattern and ending pattern and then test that buffer for your match:
cat inputfile | awk '/^START/ { buf=$0 "\n"; flag=1; next }
flag { buf=buf $0 "\n" }
/^END/ && flag { flag=0; if (buf ~ /ID: n1 |ID: n2 /) print buf }'
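The same buffering idea can be sketched with the ID passed in as a variable and the file given as an argument instead of piped through cat. The function name buffered_block_by_id, the variable names and the sample data are assumptions for illustration; the sketch also assumes the ID contains no regex metacharacters, since it is spliced into a dynamic regex.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
# buffer each START..END block and print it only if it holds the ID in $1.
# Uses only POSIX awk features, so no regex-valued RS is needed.
buffered_block_by_id() {
  awk -v id="$1" '
    /^START/ { buf = $0 "\n"; inblock = 1; next }   # open a fresh buffer
    inblock  { buf = buf $0 "\n" }                   # collect lines, END included
    /^END/ && inblock {
      inblock = 0
      # buf already ends with a newline, so printf avoids adding a second one
      if (buf ~ ("\nID: " id " ")) printf "%s", buf
    }
  ' "$2"
}

# Made-up sample data with an empty line inside the second block.
buf_sample=$(mktemp)
cat > "$buf_sample" <<'EOF'
START alpha
first line
ID: n1 some data
END
START beta

ID: n2 other data
tail line
END
EOF

buffered_block_by_id n2 "$buf_sample"
```

Because the block boundaries are detected line by line, this variant copes with empty lines inside a block and does not depend on empty lines between blocks.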
In Perl you can do:
cat inputfile | perl -0777 -lne 'while (/(^START.*?^ID: (n\d+) .*?^END)/gms){
if ($2 eq "n1" || $2 eq "n2"){
print "$1\n\n";
}
}'
In either case, you may want to run awk '{script}' inputfile or perl '{script}' inputfile rather than using cat.