awk print lines until next match on the same line

I have the following type of data file:

            0.033333  0.000000  0.000000
  -46.956  -46.956  -23.678  -23.677  -23.055  -23.054  -22.974  -22.974   -8.033   -8.032
   -7.375   -7.356   -7.182   -7.159   -6.695   -6.661   -6.628   -6.598   -4.477   -4.477
   -4.470   -4.462   -4.387   -4.380    3.799    3.800    5.939    5.960    6.116    6.117
    6.625    6.642    7.648    7.651    7.686    7.687    8.077    8.078    8.123    8.126
    8.478    8.497    8.550    8.552   11.625   11.626   12.652   12.653   12.722   12.726
   13.860   13.864   14.291   14.293   14.966   15.046   17.063   17.252   18.011   18.015
            0.016667  0.000000  0.000000
  -46.956  -46.956  -23.677  -23.677  -23.055  -23.054  -22.974  -22.974   -8.037   -8.036
   -7.371   -7.361   -7.177   -7.165   -6.686   -6.669   -6.620   -6.605   -4.476   -4.475
   -4.471   -4.465   -4.385   -4.382    3.811    3.812    5.942    5.952    6.115    6.115
    6.629    6.638    7.651    7.653    7.688    7.689    8.072    8.073    8.122    8.123
    8.491    8.501    8.556    8.556   11.612   11.612   12.665   12.665   12.730   12.733
   13.835   13.837   14.288   14.289   14.991   15.031   17.132   17.225   18.053   18.055
            0.000000  0.000000  0.000000
  -46.956  -46.956  -23.677  -23.677  -23.055  -23.055  -22.974  -22.974   -8.038   -8.038
   -7.366   -7.366   -7.172   -7.172   -6.678   -6.678   -6.613   -6.613   -4.475   -4.475
   -4.469   -4.469   -4.384   -4.384    3.816    3.816    5.946    5.946    6.115    6.115
    6.633    6.633    7.653    7.653    7.689    7.689    8.070    8.070    8.122    8.122
    8.498    8.498    8.558    8.558   11.607   11.607   12.668   12.668   12.735   12.735
   13.827   13.827   14.287   14.287   15.013   15.013   17.186   17.186   18.068   18.068

I need to change it to look like this:

0.033333  0.000000  0.000000  -46.956  -46.956  -23.678  -23.677  -23.055  -23.054  -22.974  -22.974   -8.033   -8.032   -7.375   -7.356   -7.182   -7.159   -6.695   -6.661   -6.628   -6.598   -4.477   -4.477   -4.470   -4.462   -4.387   -4.380    3.799    3.800    5.939    5.960    6.116    6.117    6.625    6.642    7.648    7.651    7.686    7.687    8.077    8.078    8.123    8.126    8.478    8.497    8.550    8.552   11.625   11.626   12.652   12.653   12.722   12.726   13.860   13.864   14.291   14.293   14.966   15.046   17.063   17.252   18.011   18.015
0.016667  0.000000  0.000000  -46.956  -46.956  -23.677  -23.677  -23.055  -23.054  -22.974  -22.974   -8.037   -8.036   -7.371   -7.361   -7.177   -7.165   -6.686   -6.669   -6.620   -6.605   -4.476   -4.475   -4.471   -4.465   -4.385   -4.382    3.811    3.812    5.942    5.952    6.115    6.115    6.629    6.638    7.651    7.653    7.688    7.689    8.072    8.073    8.122    8.123    8.491    8.501    8.556    8.556   11.612   11.612   12.665   12.665   12.730   12.733   13.835   13.837   14.288   14.289   14.991   15.031   17.132   17.225   18.053   18.055
0.000000  0.000000  0.000000  -46.956  -46.956  -23.677  -23.677  -23.055  -23.055  -22.974  -22.974   -8.038   -8.038   -7.366   -7.366   -7.172   -7.172   -6.678   -6.678   -6.613   -6.613   -4.475   -4.475   -4.469   -4.469   -4.384   -4.384    3.816    3.816    5.946    5.946    6.115    6.115    6.633    6.633    7.653    7.653    7.689    7.689    8.070    8.070    8.122    8.122    8.498    8.498    8.558    8.558   11.607   11.607   12.668   12.668   12.735   12.735   13.827   13.827   14.287   14.287   15.013   15.013   17.186   17.186   18.068   18.068

Basically, look for the lines with only 3 fields and, from there, remove the line-break characters until the next line with 3 fields. I also want to remove all the spaces at the beginning of the 3-field lines. I hope the example above makes this clearer.

I have tried the following code:

BEGIN {
    ORS=" ";
}
NF==3 {x=NR+6} (NR<=x) {print}

The trouble is that I get a completely different result. I don't know how to add a \n character before the next pattern match. So I get:

0.033333  0.000000  0.000000   -46.956  -46.956  -23.678  -23.677  -23.055  -23.054  -22.974  -22.974   -8.033   -8.032    -7.375   -7.356   -7.182   -7.159   -6.695   -6.661   -6.628   -6.598   -4.477   -4.477    -4.470   -4.462   -4.387   -4.380    3.799    3.800    5.939    5.960    6.116    6.117     6.625    6.642    7.648    7.651    7.686    7.687    8.077    8.078    8.123    8.126     8.478    8.497    8.550    8.552   11.625   11.626   12.652   12.653   12.722   12.726    13.860   13.864   14.291   14.293   14.966   15.046   17.063   17.252   18.011   18.015             0.016667  0.000000  0.000000   -46.956  -46.956  -23.677  -23.677  -23.055  -23.054  -22.974  -22.974   -8.037   -8.036    -7.371   -7.361   -7.177   -7.165   -6.686   -6.669   -6.620   -6.605   -4.476   -4.475    -4.471   -4.465   -4.385   -4.382    3.811    3.812    5.942    5.952    6.115    6.115     6.629    6.638    7.651    7.653    7.688    7.689    8.072    8.073    8.122    8.123     8.491    8.501    8.556    8.556   11.612   11.612   12.665   12.665   12.730   12.733    13.835   13.837   14.288   14.289   14.991   15.031   17.132   17.225   18.053   18.055             0.000000  0.000000  0.000000   -46.956  -46.956  -23.677  -23.677  -23.055  -23.055  -22.974  -22.974   -8.038   -8.038    -7.366   -7.366   -7.172   -7.172   -6.678   -6.678   -6.613   -6.613   -4.475   -4.475    -4.469   -4.469   -4.384   -4.384    3.816    3.816    5.946    5.946    6.115    6.115     6.633    6.633    7.653    7.653    7.689    7.689    8.070    8.070    8.122    8.122     8.498    8.498    8.558    8.558   11.607   11.607   12.668   12.668   12.735   12.735    13.827   13.827   14.287   14.287   15.013   15.013   17.186   17.186   18.068 

I also don't know how to get rid of all the space characters on the line with the pattern match.

One awk idea:

awk '
NF==3 { sub(/^[[:space:]]+/,"")      # remove leading white space
        printf "%s%s",eol,$0         # initially eol="" (undefined)
        eol="\n"                     # next time print this line with a leading "\n" (to close out previous line) 
        next}
      { printf "%s%s",OFS,$0 }       # OP will need to decide if the extra OFS is needed here or can be removed
END   { print "" }                   # terminate last line of output with a "\n"
' file

This generates:

0.033333  0.000000  0.000000   -46.956  -46.956  -23.678  -23.677  -23.055  -23.054  -22.974  -22.974   -8.033   -8.032    -7.375   -7.356   -7.182   -7.159   -6.695   -6.661   -6.628   -6.598   -4.477   -4.477    -4.470   -4.462   -4.387   -4.380    3.799    3.800    5.939    5.960    6.116    6.117     6.625    6.642    7.648    7.651    7.686    7.687    8.077    8.078    8.123    8.126     8.478    8.497    8.550    8.552   11.625   11.626   12.652   12.653   12.722   12.726    13.860   13.864   14.291   14.293   14.966   15.046   17.063   17.252   18.011   18.015
0.016667  0.000000  0.000000   -46.956  -46.956  -23.677  -23.677  -23.055  -23.054  -22.974  -22.974   -8.037   -8.036    -7.371   -7.361   -7.177   -7.165   -6.686   -6.669   -6.620   -6.605   -4.476   -4.475    -4.471   -4.465   -4.385   -4.382    3.811    3.812    5.942    5.952    6.115    6.115     6.629    6.638    7.651    7.653    7.688    7.689    8.072    8.073    8.122    8.123     8.491    8.501    8.556    8.556   11.612   11.612   12.665   12.665   12.730   12.733    13.835   13.837   14.288   14.289   14.991   15.031   17.132   17.225   18.053   18.055
0.000000  0.000000  0.000000   -46.956  -46.956  -23.677  -23.677  -23.055  -23.055  -22.974  -22.974   -8.038   -8.038    -7.366   -7.366   -7.172   -7.172   -6.678   -6.678   -6.613   -6.613   -4.475   -4.475    -4.469   -4.469   -4.384   -4.384    3.816    3.816    5.946    5.946    6.115    6.115     6.633    6.633    7.653    7.653    7.689    7.689    8.070    8.070    8.122    8.122     8.498    8.498    8.558    8.558   11.607   11.607   12.668   12.668   12.735   12.735    13.827   13.827   14.287   14.287   15.013   15.013   17.186   17.186   18.068   18.068

awk -v ORS= '
    NF==3 {
        if (NR>1) print "\n"
        sub(/^[[:space:]]*/,"")
    }
    1;
    END { print "\n" }
' file
  • unset the default newline for print ( ORS= )
  • when a 3-field line is detected
    • print a newline (unless this is the first line)
    • strip leading whitespace
  • default print ( 1; ) with no trailing newline
  • print a final newline at the end

This code assumes all lines have leading whitespace (as shown in the sample input), so that no field separator is needed on joined lines.
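
If that assumption may not hold for your data, one option is to insert the separator explicitly. The following is only a minimal sketch, assuming a POSIX awk; the function name join_records and the tiny sample input are made up for illustration:

```shell
# join_records is a hypothetical helper name, not part of the answer above.
# It joins each 3-field line with the lines that follow it, inserting one
# space between joined lines so the input need not have leading whitespace.
join_records() {
    awk '
        { sub(/^[ \t]+/, "") }              # strip leading blanks on every line
        NF == 3 {                           # a 3-field line starts a new record
            if (out != "") print out        # flush the previous record, if any
            out = $0
            next
        }
        { out = out " " $0 }                # append continuation lines
        END { if (out != "") print out }    # flush the last record
    ' "$@"
}

# Usage with a made-up input holding two records:
printf '   1 2 3\n4 5 6 7\n8 9\n   a b c\nd e f g\n' | join_records
```

On this input it prints `1 2 3 4 5 6 7 8 9` and `a b c d e f g` on two lines. Unlike the version above, it also reduces the leading whitespace of each continuation line to a single space.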


Your original code is actually not far from working:

awk '
    BEGIN { ORS=" " } # or maybe ORS=""
    NF==3 {
        sub(/^[[:space:]]*/,"") # strip leading whitespace
        x = NR+6
    }
    NR<=x { print }
    NR==x { printf "\n" }
' file
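
The original attempt produced a single line because ORS replaces the newline that print would normally append, so something else has to emit the newline at the end of each record. A small sketch of that mechanism on made-up input, using records of 2 lines instead of 7 (ors_demo is a hypothetical name):

```shell
# ors_demo is a hypothetical name used only for this illustration.
ors_demo() {
    awk '
        BEGIN { ORS = " " }            # every print now ends with a space, not a newline
        { print }
        NR % 2 == 0 { printf "\n" }    # close each 2-line record explicitly
    '
}

printf 'a\nb\nc\nd\n' | ors_demo
```

Each output line keeps the trailing space that ORS adds; setting ORS="" avoids it when the input lines already carry their own leading whitespace.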

An even simpler solution, if we know that the 3-field lines always have much more leading whitespace than any other line (e.g. 8 or more characters):

awk -v RS='[[:space:]]{8,}' 'gsub(/\n/,"")' file
  • set the input record separator to be a long run of whitespace
  • strip all embedded newlines
  • the implicit print will append a trailing newline

Note that the first (empty) record is conveniently elided because gsub returns 0 (no newlines removed) and so does not trigger the implicit print.

Another note: This requires a version of awk that supports multi-character RS (e.g. gawk, busybox; but not mawk, original-awk).

Final note: This method, while shorter, appears to run significantly more slowly (at about 10% of the speed of the first version).


For a super-slow option (about 1% of the speed of the first awk version), and if squeezing whitespace is not a problem, there is also the extremely compact:

<file xargs -n63
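
The 63 comes from the sample data: each record has 3 header fields plus 6 rows of 10 values. xargs splits its input on blanks and newlines and, with no command given, runs echo on each group of arguments. A minimal sketch of the same idea on made-up input with 5 values per record (regroup is a hypothetical helper name):

```shell
# regroup is a hypothetical helper: print the whitespace-separated
# values from stdin in groups of $1 per output line.
regroup() {
    xargs -n "$1"
}

# Two made-up records of 5 values each, spread over several lines:
printf '   1 2 3\n 4 5\n   6 7 8\n 9 10\n' | regroup 5
```

This prints `1 2 3 4 5` and `6 7 8 9 10`. It relies on every record having exactly the same number of values, and it starts one echo per output line, which is presumably a large part of why it is so slow.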

Since you always have 7 lines per record, all you need is this, using GNU awk for multi-char RS and RT:

$ awk -v RS='([^\n]+\n){7}' '{$0=RT; $1=$1} 1' file
0.033333 0.000000 0.000000 -46.956 -46.956 -23.678 -23.677 -23.055 -23.054 -22.974 -22.974 -8.033 -8.032 -7.375 -7.356 -7.182 -7.159 -6.695 -6.661 -6.628 -6.598 -4.477 -4.477 -4.470 -4.462 -4.387 -4.380 3.799 3.800 5.939 5.960 6.116 6.117 6.625 6.642 7.648 7.651 7.686 7.687 8.077 8.078 8.123 8.126 8.478 8.497 8.550 8.552 11.625 11.626 12.652 12.653 12.722 12.726 13.860 13.864 14.291 14.293 14.966 15.046 17.063 17.252 18.011 18.015
0.016667 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.054 -22.974 -22.974 -8.037 -8.036 -7.371 -7.361 -7.177 -7.165 -6.686 -6.669 -6.620 -6.605 -4.476 -4.475 -4.471 -4.465 -4.385 -4.382 3.811 3.812 5.942 5.952 6.115 6.115 6.629 6.638 7.651 7.653 7.688 7.689 8.072 8.073 8.122 8.123 8.491 8.501 8.556 8.556 11.612 11.612 12.665 12.665 12.730 12.733 13.835 13.837 14.288 14.289 14.991 15.031 17.132 17.225 18.053 18.055
0.000000 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.055 -22.974 -22.974 -8.038 -8.038 -7.366 -7.366 -7.172 -7.172 -6.678 -6.678 -6.613 -6.613 -4.475 -4.475 -4.469 -4.469 -4.384 -4.384 3.816 3.816 5.946 5.946 6.115 6.115 6.633 6.633 7.653 7.653 7.689 7.689 8.070 8.070 8.122 8.122 8.498 8.498 8.558 8.558 11.607 11.607 12.668 12.668 12.735 12.735 13.827 13.827 14.287 14.287 15.013 15.013 17.186 17.186 18.068 18.068

or this using any awk:

$ awk '{rec=rec FS $0} !(NR%7){$0=rec; rec=""; $1=$1; print}' file
0.033333 0.000000 0.000000 -46.956 -46.956 -23.678 -23.677 -23.055 -23.054 -22.974 -22.974 -8.033 -8.032 -7.375 -7.356 -7.182 -7.159 -6.695 -6.661 -6.628 -6.598 -4.477 -4.477 -4.470 -4.462 -4.387 -4.380 3.799 3.800 5.939 5.960 6.116 6.117 6.625 6.642 7.648 7.651 7.686 7.687 8.077 8.078 8.123 8.126 8.478 8.497 8.550 8.552 11.625 11.626 12.652 12.653 12.722 12.726 13.860 13.864 14.291 14.293 14.966 15.046 17.063 17.252 18.011 18.015
0.016667 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.054 -22.974 -22.974 -8.037 -8.036 -7.371 -7.361 -7.177 -7.165 -6.686 -6.669 -6.620 -6.605 -4.476 -4.475 -4.471 -4.465 -4.385 -4.382 3.811 3.812 5.942 5.952 6.115 6.115 6.629 6.638 7.651 7.653 7.688 7.689 8.072 8.073 8.122 8.123 8.491 8.501 8.556 8.556 11.612 11.612 12.665 12.665 12.730 12.733 13.835 13.837 14.288 14.289 14.991 15.031 17.132 17.225 18.053 18.055
0.000000 0.000000 0.000000 -46.956 -46.956 -23.677 -23.677 -23.055 -23.055 -22.974 -22.974 -8.038 -8.038 -7.366 -7.366 -7.172 -7.172 -6.678 -6.678 -6.613 -6.613 -4.475 -4.475 -4.469 -4.469 -4.384 -4.384 3.816 3.816 5.946 5.946 6.115 6.115 6.633 6.633 7.653 7.653 7.689 7.689 8.070 8.070 8.122 8.122 8.498 8.498 8.558 8.558 11.607 11.607 12.668 12.668 12.735 12.735 13.827 13.827 14.287 14.287 15.013 15.013 17.186 17.186 18.068 18.068
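
If the number of lines per record differs between files, the 7 can be passed in as a variable. A minimal sketch along the lines of the any-awk version; join_every and its argument are hypothetical names, and the input is made up:

```shell
# join_every is a hypothetical helper: join every $1 input lines from
# stdin into one output line, squeezing whitespace to single spaces.
join_every() {
    awk -v n="$1" '
        { rec = rec FS $0 }        # accumulate the current line
        NR % n == 0 {              # every n-th line completes a record
            $0 = rec; rec = ""
            $1 = $1                # rebuild $0 with single-space separators
            print
        }
    '
}

# Two made-up records of 3 lines each:
printf '   1 2 3\n 4 5\n 6\n   7 8 9\n 10 11\n 12\n' | join_every 3
```

This prints `1 2 3 4 5 6` and `7 8 9 10 11 12`. As with the hard-coded version, any trailing lines that do not fill a complete record are silently dropped.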
