简体   繁体   中英

Delete all lines between two patterns (exclusive of the pattern) using sed or awk

I have a somewhat large output text file where I need to delete all lines between two patterns but retain the pattern match.

The files look vaguely like the following output.

 TEST #1          
      coef1 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef2 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
  indicator |
         0  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
         1  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
            |
       year |
         2  |  -.4324005   2.231387    -0.19   0.847    -4.836829    3.972028
         3  |   -.362762    1.97184    -0.18   0.854    -4.254882    3.529358
            |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869
 TEST #2          
        coef2 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
        coef3 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
              |
         year |
           4  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
           5  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
              |
     idnumber |
           6  |  -.4324005   2.231387    -0.19   0.847    -4.836829    3.972028
           7  |   -.362762    1.97184    -0.18   0.854    -4.254882    3.529358
              |
        _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869

I need to take the following output and delete all the lines between "year" and "_cons" but I need to retain the line starting with "_cons". The desired output is like so:

 TEST #1          
      coef1 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef2 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
  indicator |
         0  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
         1  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
            |
       year |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869
 TEST #2          
        coef2 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
        coef3 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
              |
         year |
        _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869

I wrote the following script (under OS X):

sed '/^ +year/,/^ +_cons/{/^ +year/!{/^ +_cons/!d}}' input.txt >output.txt

but I got the following error:

sed: 1: "/^ +year/,/^ +_cons/{/^ ...": extra characters at the end of d command

I'm not sure if this approach is even correct because I can't seem to get sed to execute. Is sed even appropriate here or should I use awk?

One last note, I need this script to work on a relatively generic Unix install. I have to send this to someone who must execute it under a very basic AIX (I think) install. No perl, no python, and I can't do much troubleshooting on their install over email.

This should work -

awk '/year/{print; getline; while($0!~/_cons/) {getline}}1' INPUT_FILE

or

awk '/_cons/{print;f=0;next}/year/{f=1;print;next}f{next}1' INPUT_FILE

Following is the Output with your input-data file:

[jaypal:~/Temp] awk '/year/{print; getline; while($0!~/_cons/) {getline}}1' file
TEST #1          
      coef1 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef2 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
  indicator |
         0  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
         1  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
            |
       year |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869
 TEST #2          
        coef2 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
        coef3 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
              |
         year |
        _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869

Test2:

[jaypal:~/Temp] awk '/_cons/{print;f=0;next}/year/{f=1;print;next}f{next}1' file
TEST #1          
      coef1 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef2 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
  indicator |
         0  |   .6647992   2.646627     0.25   0.802     -4.55925    5.888849
         1  |   2.118701   5.225777     0.41   0.686     -8.19621    12.43361
            |
       year |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869
TEST #2          
      coef2 |   48.36895    3.32013    14.57   0.000     41.86141    54.87649
      coef3 |  -50.08894   10.47335    -4.78   0.000    -70.61697   -29.56092
            |
       year |
      _cons |   16.95753   6.342342     2.67   0.008     4.526383    29.38869

Try adding a semicolon after d to indicate that the command has ended. (GNU sed — the only sed I have handy to test with — doesn't require this, but maybe another sed would?)

Also, if you need to support multiple implementations of sed , then you can't use + to mean "one or more": it's not standard, and not all implementations support it. You can use \\{1,\\} , but that's pretty ugly . . . I'd just use * and tack on an extra copy.

So:

sed '/^ * year/,/^ * _cons/{/^ * year/!{/^ * _cons/!d;}}' input.txt >output.txt

(Tested, but only using GNU sed , not OS X, and certainly not AIX, sorry.)

This might work for you:

 sed '/year/,/_cons/{//!d}' file

or:

 awk '/_cons/{p=0};!p;/year/{p=1}' file

You can do it visually. Just open the file with gVim , and run the command:

:g/^\s*year/+1,/^\s*_cons/-1 d

Explanation:

  • g global command
  • /^\\s*year/+1 match line bellow year
  • /^\\s*_cons/-1 match line above _cons
  • d delete the range

To summarize and generalize the two GNU sed solutions that work:

sed '/BEGIN/,/END/{/BEGIN/!{/END/!d;}}' input.txt
sed '/BEGIN/,/END/{//!d}' input.txt

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM