
Delete all lines between two patterns (exclusive of the pattern) using sed or awk

I have a somewhat large output text file where I need to delete all lines between two patterns but retain the lines that match the patterns.

The files look roughly like the following output.

 TEST #1          
      coef1 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef2 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
  indicator |
         0  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
         1  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
            |
       year |
         2  |  -.4324005   2.231387    -0.19   0.847    -4.836829    3.972028
         3  |   -.362762    1.97184    -0.18   0.854    -4.254882    3.529358
            |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869
 TEST #2          
        coef2 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
        coef3 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
              |
         year |
           4  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
           5  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
              |
     idnumber |
           6  |  -.4324005   2.231387    -0.19   0.847    -4.836829    3.972028
           7  |   -.362762    1.97184    -0.18   0.854    -4.254882    3.529358
              |
        _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869

I need to take the output above and delete all the lines between "year" and "_cons", but I need to retain the line starting with "_cons". The desired output is like so:

 TEST #1          
      coef1 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef2 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
  indicator |
         0  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
         1  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
            |
       year |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869
 TEST #2          
        coef2 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
        coef3 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
              |
         year |
        _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869

I wrote the following script (under OS X):

sed '/^ +year/,/^ +_cons/{/^ +year/!{/^ +_cons/!d}}' input.txt >output.txt

but I got the following error:

sed: 1: "/^ +year/,/^ +_cons/{/^ ...": extra characters at the end of d command

I'm not sure if this approach is even correct, because I can't get sed to execute it. Is sed even appropriate here, or should I use awk?

One last note: I need this script to work on a relatively generic Unix install. I have to send it to someone who must execute it under a very basic AIX (I think) install. No perl, no python, and I can't do much troubleshooting on their install over email.

This should work:

awk '/year/{print; getline; while($0!~/_cons/) {getline}}1' INPUT_FILE

or

awk '/_cons/{print;f=0;next}/year/{f=1;print;next}f{next}1' INPUT_FILE

Here is the output with your input data file:

[jaypal:~/Temp] awk '/year/{print; getline; while($0!~/_cons/) {getline}}1' file
TEST #1          
      coef1 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef2 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
  indicator |
         0  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
         1  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
            |
       year |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869
 TEST #2          
        coef2 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
        coef3 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
              |
         year |
        _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869

Test 2:

[jaypal:~/Temp] awk '/_cons/{print;f=0;next}/year/{f=1;print;next}f{next}1' file
TEST #1          
      coef1 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef2 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
  indicator |
         0  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
         1  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
            |
       year |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869
TEST #2          
      coef2 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef3 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
       year |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869
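
The flag-based one-liner may be easier to follow when spread over several lines. The following is a minimal sketch of the same logic, wrapped in a hypothetical shell function named `drop_between` (the function name is an assumption for illustration, not part of the answer); it reads standard input and uses the same unanchored `year` and `_cons` patterns:

```shell
# Hypothetical helper: filters stdin to stdout.
drop_between() {
  awk '
    /_cons/ { f = 0; print; next }  # end marker: stop skipping, keep the line
    /year/  { f = 1; print; next }  # start marker: keep it, start skipping
    f       { next }                # inside the block: drop the line
            { print }               # everything else passes through
  '
}

# Usage: drop_between < input.txt > output.txt
```

One difference from the getline variant: if a "year" line is never followed by a "_cons" line, getline returns 0 at end of input and leaves $0 unchanged, so that while loop would not terminate. The flag form simply drops lines until the input ends.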

Try adding a semicolon after d to indicate that the command has ended. (GNU sed, the only sed I have handy to test with, doesn't require this, but maybe another sed would?)

Also, if you need to support multiple implementations of sed, then you can't use + to mean "one or more": it's not standard, and not all implementations support it. You can use \{1,\}, but that's pretty ugly. I'd just use * and tack on an extra copy of the space.

So:

sed '/^ * year/,/^ * _cons/{/^ * year/!{/^ * _cons/!d;}}' input.txt >output.txt

(Tested, but only using GNU sed, not OS X, and certainly not AIX, sorry.)
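
For comparison, the interval form mentioned above could be written as follows. This is only a sketch, assuming a sed that supports POSIX basic regular expression intervals; it has not been checked on OS X or AIX. The function name `strip_year_block` is hypothetical, and the commands are separated by newlines rather than semicolons so that nothing follows d on its line:

```shell
# Hypothetical helper: filters stdin to stdout.
# " \{1,\}" is the basic-regex spelling of "one or more spaces".
strip_year_block() {
  sed '/^ \{1,\}year/,/^ \{1,\}_cons/{
/^ \{1,\}year/!{
/^ \{1,\}_cons/!d
}
}'
}

# Usage: strip_year_block < input.txt > output.txt
```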

This might work for you:

 sed '/year/,/_cons/{//!d}' file

or:

 awk '/_cons/{p=0};!p;/year/{p=1}' file

You can do it visually. Just open the file with gVim and run the command:

:g/^\s*year/+1,/^\s*_cons/-1 d

Explanation:

  • g: the global command
  • /^\s*year/+1: the line below the match for year
  • /^\s*_cons/-1: the line above the match for _cons
  • d: delete the range

To summarize and generalize the two GNU sed solutions that work:

sed '/BEGIN/,/END/{/BEGIN/!{/END/!d;}}' input.txt
sed '/BEGIN/,/END/{//!d}' input.txt
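
Where sed behaves differently across systems, the same generalization can be sketched in awk with the two patterns passed in as variables. This is an illustrative sketch, not one of the answers above: the function name `delete_between` and the variable names `begin_re` and `end_re` are assumptions, and the patterns are treated as awk extended regular expressions.

```shell
# Hypothetical helper: delete_between START_REGEX END_REGEX
# Reads stdin, keeps both marker lines, drops the lines between them.
# Note: awk -v applies escape processing, so backslashes in a
# pattern may need to be doubled.
delete_between() {
  awk -v begin_re="$1" -v end_re="$2" '
    $0 ~ end_re   { skip = 0 }  # end marker: stop skipping before printing
    !skip         { print }     # print unless inside a block
    $0 ~ begin_re { skip = 1 }  # start marker: skip from the next line on
  '
}

# Usage: delete_between BEGIN END < input.txt
```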


