How can I match only the first N instances of a pattern, then print lines following each pattern until a blank line?

I have a log file summarising calculation results that I need to prepare for analysis. Each result is given a heading, of the form:

"Excited State   1:      Triplet-A      3.1118 eV  398.43 nm  f=0.0000"

This is followed by an unknown number of data lines of the form:

"76 -> 81  0.36917" 

(an integer, an arrow, another integer, then a float). Each result is separated from the next result by a blank line. I want to be able to take the first two sets of results (including the data lines) where the heading contains the pattern "Triplet". Later, I need to be able to do the same for the "Singlet" pattern, so I can't just delete those.

Unfortunately, it is important for later analysis that the data lines be kept separated in some way, as I will need to sort the data lines in decreasing order of magnitude (by the float column).

I have been able to use sed to return all instances of the Triplet headings and the data lines that follow them (until the blank line), as follows:

sed -n '/Triplet/,/^ *$/p' test.txt

But I don't know how to get only the first two instances.

Ideally, if the input file looks like the following:

 Excited State   1:      Triplet-A      3.1118 eV  398.43 nm  f=0.0000
76 -> 81         0.36917
76 ->101         0.11911
...

Excited State   2:      Singlet-A      3.3656 eV  379.43 nm  f=0.0029
76 -> 81         0.38068
76 ->101         0.10777
...

Excited State   3:      Triplet-A      3.1118 eV  398.43 nm  f=0.0000
76 -> 81         0.36917
76 ->101         0.11911
...
...

I'd like to be able to get:

Excited State   1:      Triplet-A      3.1118 eV  398.43 nm  f=0.0000
76 -> 81         0.36917
76 ->101         0.11911
...

Excited State   3:      Triplet-A      3.1118 eV  398.43 nm  f=0.0000
76 -> 81         0.36917
76 ->101         0.11911
...

And while, in this case, I could just remove the second data set, that won't generalise.

$ awk '/Triplet/ { n += 1 } n <= 2 && /Triplet/,/^ *$/' input.txt
 Excited State   1:      Triplet-A      3.1118 eV  398.43 nm  f=0.0000
76 -> 81         0.36917
76 ->101         0.11911
...

Excited State   3:      Triplet-A      3.1118 eV  398.43 nm  f=0.0000
76 -> 81         0.36917
76 ->101         0.11911
...
...
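
The question also asks for the same extraction with "Singlet" later. A minimal sketch of the range-pattern command above wrapped as a reusable shell function, assuming a POSIX shell and awk; the name first_blocks and its argument order are hypothetical, not part of the original answer:

```shell
# first_blocks PATTERN COUNT FILE   (hypothetical helper name)
# Print the first COUNT blocks whose heading matches PATTERN. A block runs
# from the matching heading up to and including the next blank line.
first_blocks() {
    awk -v pat="$1" -v max="$2" '
        # Count every line that matches the pattern (the headings).
        $0 ~ pat { n++ }
        # Range pattern: start at a matching heading while the counter is
        # still within the limit, stop at the next blank line. With no
        # action given, awk prints every line in the range.
        n <= max && $0 ~ pat, /^ *$/
    ' "$3"
}
```

Called as `first_blocks Triplet 2 input.txt` it should behave like the one-liner above, and `first_blocks Singlet 2 input.txt` covers the later requirement. It assumes the pattern never appears on a data line.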

A GNU awk version (GNU awk is needed because RS contains more than one character):

awk -v RS='Excited State' '/Triplet/ {if (n++<2) printf "%s",RS$0}' file
Excited State   1:      Triplet-A      3.1118 eV  398.43 nm  f=0.0000
76 -> 81         0.36917
76 ->101         0.11911
...

Excited State   3:      Triplet-A      3.1118 eV  398.43 nm  f=0.0000
76 -> 81         0.36917
76 ->101         0.11911
...
...
  • RS='Excited State' sets the record separator to Excited State, so awk processes the file one block at a time
  • /Triplet/ tests whether the block contains Triplet; if so:
    • if (n++<2) tests whether the counter, which starts at zero, is less than two, so that only two blocks are printed; then:
      • printf "%s",RS$0 prints the record separator followed by the block

PS: this will work even if the blank line between blocks is missing.

This might work for you (GNU sed):

sed -E '/Triplet/{x;s/^/x/;/^x{1,2}$/{x;:a;n;/\S/ba;p;x};x};d' file

Focus on a line containing Triplet and, after incrementing a counter in the hold space, decide whether to print that line and the lines after it, up to and including the next empty line.
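
To make the hold-space bookkeeping easier to follow, here is the same script spread over several lines with comments. This is only an annotated sketch, assuming GNU sed (for `\S` and comments inside the script); the wrapper name first_two_triplets is hypothetical:

```shell
# first_two_triplets FILE   (hypothetical wrapper around the one-liner)
first_two_triplets() {
    sed -E '
    /Triplet/ {
        # Swap: the counter (hold space) comes into the pattern space,
        # the heading is parked in the hold space.
        x
        # Add one "x" to the counter for every Triplet heading seen.
        s/^/x/
        # Counter is "x" or "xx": this is the first or second heading.
        /^x{1,2}$/ {
            # Swap the heading back into the pattern space.
            x
            :a
            # Print the current line and read the next one.
            n
            # Next line is not blank: loop and print it too.
            /\S/ba
            # Blank line reached: print it to close the block.
            p
            # Bring the counter back so the final swap can store it.
            x
        }
        # Counter goes back to the hold space.
        x
    }
    # Everything not printed explicitly above is discarded.
    d
    ' "$@"
}
```

The limit of two lives in the `x{1,2}` interval, so changing it to, say, `x{1,3}` would keep the first three blocks.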

If there are blank lines between all records, you can easily do the following:

$ awk 'BEGIN{RS="";FS=OFS="\n";n=2}($1~/Triplet/ && n-->0);(n==0){exit}' file
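
The question mentions that the data lines must later be ordered by decreasing magnitude of the float column, which none of the commands above do. One possible sketch of that step follows. It assumes that the first line of each block is its heading, that the float is the last field of every data line, and that "magnitude" means absolute value; the name sort_blocks is hypothetical.

```shell
# sort_blocks [FILE]   (hypothetical helper; reads stdin when no file is given)
# Within each blank-line-separated block, keep the heading first and sort
# the data lines by the absolute value of their last field, largest first.
sort_blocks() {
    awk '
        BEGIN { blk = 0; first = 1 }
        # Blank line: tag it so it sorts last in its block, then start a new block.
        /^ *$/ { printf "%d\t2\t0\t%s\n", blk, $0; blk++; first = 1; next }
        # First line of a block is the heading: tag it so it sorts first.
        first  { printf "%d\t0\t0\t%s\n", blk, $0; first = 0; next }
        # Data line: the sort key is |last field|.
        {
            key = $NF + 0
            if (key < 0) key = -key
            printf "%d\t1\t%.10f\t%s\n", blk, key, $0
        }
    ' "$@" |
    # Order by block number, then line type, then key in descending order.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k2,2n -k3,3nr |
    # Drop the three helper columns again.
    cut -f4-
}
```

It can be fed from any of the extraction commands, for example `awk '/Triplet/ { n += 1 } n <= 2 && /Triplet/,/^ *$/' input.txt | sort_blocks`. It relies on blank lines still separating the blocks, so output from the paragraph-mode command directly above, which prints records without the separating blank line, would need those restored first.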
