awk print lines until next match on the same line
I have the following type of data file:
0.033333 0.000000 0.000000
-46.956 -46.956 -23.678 -23.677 -23.055 -23.054 -22.974 -22.974 -8.033 -8.032
-7.375 -7.356 -7.182 -7.159 -6.695 -6.661 -6.628 -6.598 -4.477 -4.477
-4.470 -4.462 -4.387 -4.380 3.799 3.800 5.939 5.960 6.116 6.117
6.625 6.642 7.648 7.651 7.686 7.687 8.077 8.078 8.123 8.126
8.478 8.497 8.550 8.552 11.625 11.626 12.652 12.653 12.722 12.726
13.860 13.864 14.291 14.293 14.966 15.046 17.063 17.252 18.011 18.015
0.016667 0.000000 0.000000
-46.956 -46.956 -23.677 -23.677 -23.055 -23.054 -22.974 -22.974 -8.037 -8.036
-7.371 -7.361 -7.177 -7.165 -6.686 -6.669 -6.620 -6.605 -4.476 -4.475
-4.471 -4.465 -4.385 -4.382 3.811 3.812 5.942 5.952 6.115 6.115
6.629 6.638 7.651 7.653 7.688 7.689 8.072 8.073 8.122 8.123
8.491 8.501 8.556 8.556 11.612 11.612 12.665 12.665 12.730 12.733
13.835 13.837 14.288 14.289 14.991 15.031 17.132 17.225 18.053 18.055
0.000000 0.000000 0.000000
-46.956 -46.956 -23.677 -23.677 -23.055 -23.055 -22.974 -22.974 -8.038 -8.038
-7.366 -7.366 -7.172 -7.172 -6.678 -6.678 -6.613 -6.613 -4.475 -4.475
-4.469 -4.469 -4.384 -4.384 3.816 3.816 5.946 5.946 6.115 6.115
6.633 6.633 7.653 7.653 7.689 7.689 8.070 8.070 8.122 8.122
8.498 8.498 8.558 8.558 11.607 11.607 12.668 12.668 12.735 12.735
13.827 13.827 14.287 14.287 15.013 15.013 17.186 17.186 18.068 18.068
I need to change this to look like this:
0.033333 0.000000 0.000000 -46.956 -46.956 -23.678 -23.677 -23.055 -23.054 -22.974 -22.974 -8.033 -8.032 -7.375 -7.356 -7.182 -7.159 -6.695 -6.661 -6.628 -6.598 -4.477 -4.477 -4.470 -4.462 -4.387 -4.380 3.799 3.800 5.939 5.960 6.116 6.117 6.625 6.642 7.648 7.651 7.686 7.687 8.077 8.078 8.123 8.126 8.478 8.497 8.550 8.552 11.625 11.626 12.652 12.653 12.722 12.726 13.860 13.864 14.291 14.293 14.966 15.046 17.063 17.252 18.011 18.015
0.016667 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.054 -22.974 -22.974 -8.037 -8.036 -7.371 -7.361 -7.177 -7.165 -6.686 -6.669 -6.620 -6.605 -4.476 -4.475 -4.471 -4.465 -4.385 -4.382 3.811 3.812 5.942 5.952 6.115 6.115 6.629 6.638 7.651 7.653 7.688 7.689 8.072 8.073 8.122 8.123 8.491 8.501 8.556 8.556 11.612 11.612 12.665 12.665 12.730 12.733 13.835 13.837 14.288 14.289 14.991 15.031 17.132 17.225 18.053 18.055
0.000000 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.055 -22.974 -22.974 -8.038 -8.038 -7.366 -7.366 -7.172 -7.172 -6.678 -6.678 -6.613 -6.613 -4.475 -4.475 -4.469 -4.469 -4.384 -4.384 3.816 3.816 5.946 5.946 6.115 6.115 6.633 6.633 7.653 7.653 7.689 7.689 8.070 8.070 8.122 8.122 8.498 8.498 8.558 8.558 11.607 11.607 12.668 12.668 12.735 12.735 13.827 13.827 14.287 14.287 15.013 15.013 17.186 17.186 18.068 18.068
Basically, look for the lines with only 3 fields and, from there, start removing the line-break characters until the next line with 3 fields. I also want to remove all the spaces at the beginning of the lines with 3 fields. Hopefully this is clearer from the example above.
I have tried the following code:
BEGIN {
ORS=" ";
}
NF==3 {x=NR+6} (NR<=x) {print}
The trouble is that I get a completely different result. I don't know how to add a \n character before the next pattern match. So I get:
0.033333 0.000000 0.000000 -46.956 -46.956 -23.678 -23.677 -23.055 -23.054 -22.974 -22.974 -8.033 -8.032 -7.375 -7.356 -7.182 -7.159 -6.695 -6.661 -6.628 -6.598 -4.477 -4.477 -4.470 -4.462 -4.387 -4.380 3.799 3.800 5.939 5.960 6.116 6.117 6.625 6.642 7.648 7.651 7.686 7.687 8.077 8.078 8.123 8.126 8.478 8.497 8.550 8.552 11.625 11.626 12.652 12.653 12.722 12.726 13.860 13.864 14.291 14.293 14.966 15.046 17.063 17.252 18.011 18.015 0.016667 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.054 -22.974 -22.974 -8.037 -8.036 -7.371 -7.361 -7.177 -7.165 -6.686 -6.669 -6.620 -6.605 -4.476 -4.475 -4.471 -4.465 -4.385 -4.382 3.811 3.812 5.942 5.952 6.115 6.115 6.629 6.638 7.651 7.653 7.688 7.689 8.072 8.073 8.122 8.123 8.491 8.501 8.556 8.556 11.612 11.612 12.665 12.665 12.730 12.733 13.835 13.837 14.288 14.289 14.991 15.031 17.132 17.225 18.053 18.055 0.000000 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.055 -22.974 -22.974 -8.038 -8.038 -7.366 -7.366 -7.172 -7.172 -6.678 -6.678 -6.613 -6.613 -4.475 -4.475 -4.469 -4.469 -4.384 -4.384 3.816 3.816 5.946 5.946 6.115 6.115 6.633 6.633 7.653 7.653 7.689 7.689 8.070 8.070 8.122 8.122 8.498 8.498 8.558 8.558 11.607 11.607 12.668 12.668 12.735 12.735 13.827 13.827 14.287 14.287 15.013 15.013 17.186 17.186 18.068
I also don't know how to get rid of all the space characters on the line with the pattern match.
One awk idea:
awk '
NF==3 { sub(/^[[:space:]]+/,"") # remove leading white space
printf "%s%s",eol,$0 # initially eol="" (undefined)
eol="\n" # next time print this line with a leading "\n" (to close out previous line)
next}
{ printf "%s%s",OFS,$0 } # OP will need to decide if the extra OFS is needed here or can be removed
END { print "" } # terminate last line of output with a "\n"
' file
This generates:
0.033333 0.000000 0.000000 -46.956 -46.956 -23.678 -23.677 -23.055 -23.054 -22.974 -22.974 -8.033 -8.032 -7.375 -7.356 -7.182 -7.159 -6.695 -6.661 -6.628 -6.598 -4.477 -4.477 -4.470 -4.462 -4.387 -4.380 3.799 3.800 5.939 5.960 6.116 6.117 6.625 6.642 7.648 7.651 7.686 7.687 8.077 8.078 8.123 8.126 8.478 8.497 8.550 8.552 11.625 11.626 12.652 12.653 12.722 12.726 13.860 13.864 14.291 14.293 14.966 15.046 17.063 17.252 18.011 18.015
0.016667 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.054 -22.974 -22.974 -8.037 -8.036 -7.371 -7.361 -7.177 -7.165 -6.686 -6.669 -6.620 -6.605 -4.476 -4.475 -4.471 -4.465 -4.385 -4.382 3.811 3.812 5.942 5.952 6.115 6.115 6.629 6.638 7.651 7.653 7.688 7.689 8.072 8.073 8.122 8.123 8.491 8.501 8.556 8.556 11.612 11.612 12.665 12.665 12.730 12.733 13.835 13.837 14.288 14.289 14.991 15.031 17.132 17.225 18.053 18.055
0.000000 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.055 -22.974 -22.974 -8.038 -8.038 -7.366 -7.366 -7.172 -7.172 -6.678 -6.678 -6.613 -6.613 -4.475 -4.475 -4.469 -4.469 -4.384 -4.384 3.816 3.816 5.946 5.946 6.115 6.115 6.633 6.633 7.653 7.653 7.689 7.689 8.070 8.070 8.122 8.122 8.498 8.498 8.558 8.558 11.607 11.607 12.668 12.668 12.735 12.735 13.827 13.827 14.287 14.287 15.013 15.013 17.186 17.186 18.068 18.068
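If the extra OFS mentioned in the comment above is not wanted, one option (a sketch, not part of the original answer) is to rebuild every line with `$1 = $1`, which trims leading/trailing whitespace and squeezes internal runs down to a single OFS. The function name `join_on_3field` and the tiny sample input are made up for illustration:

```shell
# join_on_3field: hypothetical helper; reads stdin, writes stdout.
# Each 3-field line starts a new output line; following lines are appended to it.
join_on_3field() {
    awk '
    { $1 = $1 }                          # rebuild $0: trims and squeezes whitespace
    NF == 3 { printf "%s%s", eol, $0     # start a new output line
              eol = "\n"                 # later headers first close the previous line
              next }
    { printf "%s%s", OFS, $0 }           # append continuation line after one OFS
    END { if (eol != "") print "" }      # terminate the last output line
    '
}

# Usage with a tiny made-up sample (indentation varies on purpose):
printf '   1 2 3\n 4 5\n 6 7\n   8 9 10\n 11 12\n' | join_on_3field
```

With the sample above this prints `1 2 3 4 5 6 7` and `8 9 10 11 12`, each value separated by exactly one space. The `END` guard avoids emitting a lone blank line when the input contains no 3-field line at all.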
awk -v ORS= '
NF==3 {
if (NR>1) print "\n"
sub(/^[[:space:]]*/,"")
}
1;
END { print "\n" }
' file
-v ORS= unsets the default newline that print appends to each record.
1; is the default action (print $0), which therefore prints each line with no trailing newline.
This code assumes all lines have leading whitespace (as shown in the sample input), so that no field separator is needed on joined lines.
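To see why the leading-whitespace assumption matters, here is a minimal sketch that wraps the `ORS=` answer in a shell function (the name `join_ors_empty` and both sample inputs are hypothetical) and runs it on indented and non-indented continuation lines:

```shell
# join_ors_empty: hypothetical wrapper around the ORS= answer above.
join_ors_empty() {
    awk -v ORS= '
    NF==3 {
        if (NR>1) print "\n"             # close the previous output line
        sub(/^[[:space:]]*/,"")          # strip leading whitespace from the header
    }
    1;                                   # print $0 with ORS empty: no newline added
    END { print "\n" }                   # terminate the last output line
    '
}

# Continuation lines indented: the indent acts as the separator.
printf '  1 2 3\n 4 5\n  6 7 8\n 9 10\n' | join_ors_empty
# -> "1 2 3 4 5" and "6 7 8 9 10"

# Continuation lines NOT indented: adjacent values are glued together.
printf '1 2 3\n4 5\n6 7 8\n9 10\n' | join_ors_empty
# -> "1 2 34 5" and "6 7 89 10"
```

If the real file might lack that indentation, printing an explicit separator before each continuation line (as the first answer does with OFS) avoids the problem.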
Your original code is actually not far from working:
awk '
BEGIN { ORS=" " } # or maybe ORS=""
NF==3 {
sub(/^[[:space:]]*/,"") # strip leading whitespace
x = NR+6
}
NR<=x { print }
NR==x { printf "\n" }
' file
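With `ORS=" "`, each output line ends with a trailing space before the `"\n"`, and `x = NR+6` assumes exactly six continuation lines per block, as in the sample. A minimal sketch that keeps the counter idea but only emits a space *between* pieces; the function name `join_counted` and its argument `n` (continuation lines per block, default 6) are hypothetical:

```shell
# join_counted: hypothetical; $1 = number of continuation lines after each header.
join_counted() {
    awk -v n="${1:-6}" '
    NF==3 {
        sub(/^[[:space:]]+/,"")          # strip leading whitespace from the header
        x = NR + n                       # last line number belonging to this block
        sep = ""                         # no separator before the first piece
    }
    NR<=x { printf "%s%s", sep, $0; sep = " " }
    NR==x { printf "\n" }                # block complete: end the output line
    '
}

printf '  1 2 3\n4\n5\n  6 7 8\n9\n10\n' | join_counted 2
```

This prints `1 2 3 4 5` and `6 7 8 9 10`. Continuation lines keep any leading whitespace of their own, so indented input would still show doubled spaces at the joins.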
An even simpler solution, if we know that the 3-field lines always have much more leading whitespace than any other line (e.g. 8 or more whitespace characters):
awk -v RS='[[:space:]]{8,}' 'gsub(/\n/,"")' file
Note that the first (empty) record is conveniently elided: gsub finds no newline to remove, returns 0, and so does not trigger the implicit print.
Another note: this requires a version of awk that supports multi-character RS (e.g. gawk, busybox; but not mawk, original-awk).
Final note: this method, while shorter, appears to run significantly more slowly (about 10% of the speed of the first version).
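The RS-based one-liner above depends on the fact that `gsub` returns the number of substitutions it made, so when used as a pattern it selects only records it actually changed. A portable sketch of just that behaviour (it does not need multi-character RS); the function name `print_if_changed` and the input are hypothetical:

```shell
# print_if_changed: hypothetical; prints only lines where gsub replaced something.
print_if_changed() {
    awk 'gsub(/x/, "")'                  # pattern value = number of replacements
}

printf 'ax1\nb\nxx2\n' | print_if_changed
# prints "a1" and "2"; "b" contains no x, gsub returns 0, so it is not printed
```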
If you can accept something super slow (about 1% of the speed of the first awk version), and squeezing whitespace is not a problem, there is also the extremely compact:
<file xargs -n63
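The 63 passed to `-n` is the number of values in one block: 3 on the header line plus 6 lines of 10. Rather than hard-coding it, a sketch can count the fields from the first 3-field line up to (not including) the second one, assuming every block has the same shape as the first. The function name `fields_per_block` and the sample are hypothetical:

```shell
# fields_per_block: hypothetical; sums NF from the first 3-field line up to
# the line before the next 3-field line. Reads the named file, or stdin.
fields_per_block() {
    awk '
    NF==3 && seen { exit }               # second header reached: stop counting
    NF==3 { seen = 1 }                   # first header: start counting
    seen  { total += NF }
    END   { print total + 0 }
    ' "$@"
}

sample='  1 2 3\n 4 5\n 6\n  7 8 9\n 10 11\n 12\n'
n=$(printf "$sample" | fields_per_block)   # 3 + 2 + 1 = 6
printf "$sample" | xargs -n "$n"
```

For the real data the equivalent would be `xargs -n "$(fields_per_block file)" < file`, which should compute 63 for the sample in the question.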
Since you always have 7 lines per record, all you need is this, using GNU awk for multi-char RS and RT:
$ awk -v RS='([^\n]+\n){7}' '{$0=RT; $1=$1} 1' file
0.033333 0.000000 0.000000 -46.956 -46.956 -23.678 -23.677 -23.055 -23.054 -22.974 -22.974 -8.033 -8.032 -7.375 -7.356 -7.182 -7.159 -6.695 -6.661 -6.628 -6.598 -4.477 -4.477 -4.470 -4.462 -4.387 -4.380 3.799 3.800 5.939 5.960 6.116 6.117 6.625 6.642 7.648 7.651 7.686 7.687 8.077 8.078 8.123 8.126 8.478 8.497 8.550 8.552 11.625 11.626 12.652 12.653 12.722 12.726 13.860 13.864 14.291 14.293 14.966 15.046 17.063 17.252 18.011 18.015
0.016667 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.054 -22.974 -22.974 -8.037 -8.036 -7.371 -7.361 -7.177 -7.165 -6.686 -6.669 -6.620 -6.605 -4.476 -4.475 -4.471 -4.465 -4.385 -4.382 3.811 3.812 5.942 5.952 6.115 6.115 6.629 6.638 7.651 7.653 7.688 7.689 8.072 8.073 8.122 8.123 8.491 8.501 8.556 8.556 11.612 11.612 12.665 12.665 12.730 12.733 13.835 13.837 14.288 14.289 14.991 15.031 17.132 17.225 18.053 18.055
0.000000 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.055 -22.974 -22.974 -8.038 -8.038 -7.366 -7.366 -7.172 -7.172 -6.678 -6.678 -6.613 -6.613 -4.475 -4.475 -4.469 -4.469 -4.384 -4.384 3.816 3.816 5.946 5.946 6.115 6.115 6.633 6.633 7.653 7.653 7.689 7.689 8.070 8.070 8.122 8.122 8.498 8.498 8.558 8.558 11.607 11.607 12.668 12.668 12.735 12.735 13.827 13.827 14.287 14.287 15.013 15.013 17.186 17.186 18.068 18.068
or this using any awk:或者使用任何 awk:
$ awk '{rec=rec FS $0} !(NR%7){$0=rec; rec=""; $1=$1; print}' file
0.033333 0.000000 0.000000 -46.956 -46.956 -23.678 -23.677 -23.055 -23.054 -22.974 -22.974 -8.033 -8.032 -7.375 -7.356 -7.182 -7.159 -6.695 -6.661 -6.628 -6.598 -4.477 -4.477 -4.470 -4.462 -4.387 -4.380 3.799 3.800 5.939 5.960 6.116 6.117 6.625 6.642 7.648 7.651 7.686 7.687 8.077 8.078 8.123 8.126 8.478 8.497 8.550 8.552 11.625 11.626 12.652 12.653 12.722 12.726 13.860 13.864 14.291 14.293 14.966 15.046 17.063 17.252 18.011 18.015
0.016667 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.054 -22.974 -22.974 -8.037 -8.036 -7.371 -7.361 -7.177 -7.165 -6.686 -6.669 -6.620 -6.605 -4.476 -4.475 -4.471 -4.465 -4.385 -4.382 3.811 3.812 5.942 5.952 6.115 6.115 6.629 6.638 7.651 7.653 7.688 7.689 8.072 8.073 8.122 8.123 8.491 8.501 8.556 8.556 11.612 11.612 12.665 12.665 12.730 12.733 13.835 13.837 14.288 14.289 14.991 15.031 17.132 17.225 18.053 18.055
0.000000 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.055 -22.974 -22.974 -8.038 -8.038 -7.366 -7.366 -7.172 -7.172 -6.678 -6.678 -6.613 -6.613 -4.475 -4.475 -4.469 -4.469 -4.384 -4.384 3.816 3.816 5.946 5.946 6.115 6.115 6.633 6.633 7.653 7.653 7.689 7.689 8.070 8.070 8.122 8.122 8.498 8.498 8.558 8.558 11.607 11.607 12.668 12.668 12.735 12.735 13.827 13.827 14.287 14.287 15.013 15.013 17.186 17.186 18.068 18.068
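Both versions above hard-code 7 lines per block, and the any-awk one silently drops a final block that has fewer than 7 lines. A sketch that takes the block size as a parameter and flushes any leftover lines in `END`; the function name `lines_per_block` and its argument `n` (default 7) are hypothetical:

```shell
# lines_per_block: hypothetical; $1 = number of input lines per output line.
lines_per_block() {
    awk -v n="${1:-7}" '
    { rec = rec FS $0 }                               # accumulate with a space
    !(NR % n) { $0 = rec; rec = ""; $1 = $1; print }  # rebuild and print the block
    END { if (rec != "") { $0 = rec; $1 = $1; print } }  # leftover partial block
    '
}

printf ' 1 2\n 3\n 4 5\n 6\n 7\n' | lines_per_block 2
```

With a block size of 2 this prints `1 2 3`, `4 5 6` and then the incomplete last block as `7`. For the question's file, `lines_per_block 7 < file` should behave like the any-awk one-liner.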
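Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.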