How can I match only the first N instances of a pattern, then print lines following each pattern until a blank line?
I have a log file summarising calculation results that I need to prepare for analysis. Each result is given a heading of the form:
Excited State 1: Triplet-A 3.1118 eV 398.43 nm f=0.0000
Followed by an unknown number of data lines of the form:
76 -> 81 0.36917
(an integer, an arrow, another integer, then a float). Each result is separated from the next by a blank line.
I want to extract the first two results (heading plus data lines) whose heading contains the pattern "Triplet".
Later, I need to do the same for the "Singlet" pattern, so I can't just delete those.
Unfortunately, for later analysis the results must stay separated from each other in some way, because I will need to sort each result's data lines in decreasing order of magnitude (by the float column).
I have been able to use sed to return all Triplet headings and their following data lines (up to the blank line), as follows:
sed -n '/Triplet/,/^ *$/p' test.txt
But I don't know how to get only the first two instances.
Ideally, if the input file looks like this:
Excited State 1: Triplet-A 3.1118 eV 398.43 nm f=0.0000
76 -> 81 0.36917
76 ->101 0.11911
...
Excited State 2: Singlet-A 3.3656 eV 379.43 nm f=0.0029
76 -> 81 0.38068
76 ->101 0.10777
...
Excited State 3: Triplet-A 3.1118 eV 398.43 nm f=0.0000
76 -> 81 0.36917
76 ->101 0.11911
...
...
I'd like to get:
Excited State 1: Triplet-A 3.1118 eV 398.43 nm f=0.0000
76 -> 81 0.36917
76 ->101 0.11911
...
Excited State 3: Triplet-A 3.1118 eV 398.43 nm f=0.0000
76 -> 81 0.36917
76 ->101 0.11911
...
And while in this case I could just remove the second data set, that won't generalise.
$ awk '/Triplet/ { n += 1 } n <= 2 && /Triplet/,/^ *$/' input.txt
Excited State 1: Triplet-A 3.1118 eV 398.43 nm f=0.0000
76 -> 81 0.36917
76 ->101 0.11911
...
Excited State 3: Triplet-A 3.1118 eV 398.43 nm f=0.0000
76 -> 81 0.36917
76 ->101 0.11911
...
...
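The question also needs the same extraction for "Singlet" later, so it can help to pass the pattern and the count in as variables instead of hard-coding them. The sketch below is a hypothetical wrapper (the name first_n_blocks is made up for illustration) around the same range-pattern awk command; like that command, it assumes each block ends at a blank or all-space line.

```shell
# first_n_blocks PATTERN N [FILE...]   (hypothetical helper)
# Print the first N blocks whose heading matches PATTERN (an awk regex),
# each block running from the heading up to and including the next blank line.
first_n_blocks() {
  pat=$1; max=$2; shift 2
  awk -v pat="$pat" -v max="$max" '
    $0 ~ pat { n++ }                  # count every matching heading first
    n <= max && $0 ~ pat, /^ *$/      # then print heading .. blank line
  ' "$@"
}
```

For example, `first_n_blocks Singlet 2 input.txt` would select the first two Singlet results. Because `pat` is used as a dynamic regex, characters such as `.` or `(` in the pattern keep their regex meaning.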
A GNU awk version (GNU because RS has multiple characters):
awk -v RS='Excited State' '/Triplet/ {if (n++<2) printf "%s",RS$0}' file
Excited State 1: Triplet-A 3.1118 eV 398.43 nm f=0.0000
76 -> 81 0.36917
76 ->101 0.11911
...
Excited State 3: Triplet-A 3.1118 eV 398.43 nm f=0.0000
76 -> 81 0.36917
76 ->101 0.11911
...
...
RS='Excited State'
sets the record separator to Excited State, so awk works in block mode (one record per result)
/Triplet/
tests whether the record contains Triplet; if so:
if (n++<2)
tests whether the counter, starting from zero, is less than two, so that only two blocks are printed; then:
printf "%s",RS$0
prints the record separator followed by the block (awk removes the separator text from each record, so it has to be added back)
PS: this works even if the blank line between blocks is missing.
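The GNU awk answer relies on a multi-character RS, which POSIX leaves unspecified. As a hedged alternative for other awks, the sketch below swaps in a different technique: instead of changing RS, it starts a new block at every heading line and decides once per block whether to keep it. The function name first_n_headed and the heading regex `^ *Excited State` are assumptions for illustration. Like the RS version, it does not need blank lines between blocks, and a kept block runs until the next heading (or end of file).

```shell
# first_n_headed PATTERN N [FILE...]   (hypothetical helper)
# A block starts at every heading line (assumed to begin with Excited State)
# and lasts until the next heading, so blank lines are not required.
first_n_headed() {
  pat=$1; max=$2; shift 2
  awk -v pat="$pat" -v max="$max" '
    /^ *Excited State/ {
      # decide once per block whether to keep it; n only counts matches
      keep = ($0 ~ pat && ++n <= max)
    }
    keep                              # print every line of a kept block
  ' "$@"
}
```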
This might work for you (GNU sed):
sed -E '/Triplet/{x;s/^/x/;/^x{1,2}$/{x;:a;n;/\S/ba;p;x};x};d' file
Focus on lines containing Triplet: after incrementing a counter in the hold space, decide whether to print that line and the following lines up to and including an empty one.
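Spread over several lines with comments, the GNU sed one-liner reads as below. It is meant to be the same command sequence, wrapped in a shell function whose name, first_two_triplets, is hypothetical; it still relies on GNU sed (for example for `\S`). The number of printed blocks is controlled by the interval in `/^x{1,2}$/`, so `/^x{1,3}$/` would keep three.

```shell
# first_two_triplets [FILE...]   (hypothetical helper, needs GNU sed)
first_two_triplets() {
  sed -E '
    /Triplet/{
      # swap: counter into pattern space, heading into hold space
      x
      # add one x per Triplet heading seen
      s/^/x/
      # only the first and second headings (x or xx) are printed
      /^x{1,2}$/{
        # swap back: heading in pattern space, counter in hold space
        x
        :a
        # print the current line and read the next one
        n
        # loop while the line contains a non-space character
        /\S/ba
        # print the blank line that ends the block
        p
        # park the blank line in hold space, counter in pattern space
        x
      }
      # restore: current line in pattern space, counter in hold space
      x
    }
    # delete everything else (d also skips the automatic print)
    d
  ' "$@"
}
```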
If there is a blank line between every pair of records, you can simply do:
$ awk 'BEGIN{RS="";FS=OFS="\n";n=2}($1~/Triplet/ && n-->0);(n==0){exit}' file
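The question also mentions a later step: sorting each result's data lines in decreasing order of magnitude by the float column while keeping the results apart. Paragraph mode (`RS=""`) makes that manageable in awk. The sketch below assumes "magnitude" means absolute value, that the float is the last field on each data line, and that results are separated by truly empty lines (paragraph mode does not treat all-space lines as separators); the function name sort_blocks is hypothetical.

```shell
# sort_blocks [FILE...]   (hypothetical helper)
# For each blank-line-separated block: print the heading unchanged, then the
# data lines sorted by the absolute value of their last field, largest first.
sort_blocks() {
  awk '
    BEGIN { RS = ""; FS = "\n" }          # one record per block, one field per line
    {
      if (NR > 1) print ""                # keep a blank line between blocks
      print $1                            # heading line
      m = 0
      for (i = 2; i <= NF; i++) {
        k = split($i, w, " ")             # default whitespace splitting
        v = w[k] + 0; if (v < 0) v = -v   # magnitude of the float column
        j = ++m
        # insertion sort, largest magnitude first
        while (j > 1 && key[j-1] < v) { line[j] = line[j-1]; key[j] = key[j-1]; j-- }
        line[j] = $i; key[j] = v
      }
      for (i = 1; i <= m; i++) print line[i]
    }
  ' "$@"
}
```

It can be fed the output of the range-pattern awk command from the first answer, since that output keeps the blank line after each block (provided those lines are truly empty). An insertion sort is used because POSIX awk has no built-in sort (gawk's `asort` is an extension); the blocks are small, so the quadratic cost should not matter.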