awk to print 3 lines above match until second match

I have a long list of lines and I want to print everything from three lines above START through END, inclusive. The number of lines between START and END varies, but I always want the three lines above START.
I tried awk:

awk '/START/,/END/' file.txt

However, I can't find a way to include the three lines above START. Any hint would be appreciated, thanks!

Input

EFA  
DAD  
ABC  
DEF  
GEF  
START  
EDG  
EFG  
GAD  
END  
CDA  

Result

ABC  
DEF  
GEF  
START  
EDG  
EFG  
GAD  
END  

awk '/START/ { if (a != "") print a; if (b != "") print b; if (c != "") print c }
     { a = b; b = c; c = $0 }
     /START/,/END/' file.txt

Explanation

/START/ { if (a != "") print a; ... }: when a line matching /START/ is encountered, print the buffered records, skipping any that are empty. (A plain if (a) would also skip a line consisting of just 0.)

{ a = b; b = c; c = $0 }: shift the buffered records. If many more than three are needed, an array can be used instead.

/START/,/END/: print all records from /START/ through /END/.
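
The remark about using an array when more context lines are needed could be sketched roughly as follows. This is only an illustration: the function name print_with_context and the variables n, buf, and inblock are made up here, and it assumes blocks do not nest.

```shell
# Hypothetical helper: print_with_context N FILE
# Prints up to N lines before each START, then everything through END.
print_with_context() {
  awk -v n="$1" '
    /START/ && !inblock {
      # Replay the buffered lines, oldest first, skipping any that do not exist.
      for (i = NR - n; i < NR; i++)
        if (i in buf) print buf[i]
      inblock = 1
    }
    inblock { print }          # print START, the body, and END
    /END/   { inblock = 0 }    # the block ends after END has been printed
    {
      buf[NR] = $0             # remember the current line
      delete buf[NR - n]       # keep only the last n lines
    }
  ' "$2"
}
```

Calling print_with_context 3 file.txt should behave like the three-variable version above. One thing to be aware of: if a second START follows closely after an END, lines from the previous block can be replayed as context.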

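#!awk -f
{
  foo[NR] = $0                    # remember every line by its line number
}
/START/ {
  bar = (NR > 3) ? NR - 3 : 1     # first line to print: three above START
}
/END/ {
  while (bar <= NR)               # print from there through END
    print foo[bar++]
}
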
awk '/START/{print x3"\n"x2"\n"x;p=1}
     /END/{print;p=0}
     {x3=x2}
     {x2=x}
     {x=$0}p' your_file

Tested:

> cat temp
EFA  
DAD  
ABC  
DEF  
GEF  
START  
EDG  
EFG  
GAD  
END  
CDA  
> awk '/START/{print x3"\n"x2"\n"x;p=1}/END/{print;p=0}{x3=x2}{x2=x}{x=$0}p' temp
ABC  
DEF  
GEF  
START  
EDG  
EFG  
GAD  
END  
> 

A similar, but maybe easier-to-understand, variation on the same theme:

awk '/START/{for(i=3;i>0;--i)if((NR-i) in a)print a[NR-i]}{a[NR]=$0;delete a[NR-3]}/START/,/END/' inputfile

The middle block stores the last three lines, deleting the oldest once a fourth arrives. When START is found, the first block prints the three previous lines (only those that exist), oldest first, and the range pattern prints everything from START through END.

If START and END must match exactly, use the patterns /^START$/ and /^END$/, or replace the pattern matches with direct string comparisons such as $0=="START" in all cases.

Input file:

GEF  
START  
EDG  
EFG  
GAD  
END  
CDA
EFA  
DAD  
ABC  
DEF  
GEF  
START  
EDG  
EFG  
GAD  
END  
CDA  

Output:

GEF  
START  
EDG  
EFG  
GAD  
END  
ABC  
DEF  
GEF  
START  
EDG  
EFG  
GAD  
END
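
The exact-match variant mentioned in this answer might look like the sketch below. The function name exact_block is invented for illustration, and it assumes the marker lines contain nothing but START or END (no trailing spaces).

```shell
# Hypothetical helper: exact_block FILE
# Like the one-liner above, but only lines that are exactly START or END count.
exact_block() {
  awk '
    $0 == "START" {
      # Print the (up to) three previous lines, oldest first.
      for (i = NR - 3; i < NR; i++)
        if (i in a) print a[i]
    }
    { a[NR] = $0; delete a[NR - 3] }   # keep only the last three lines
    $0 == "START", $0 == "END"         # range defined by string comparison
  ' "$1"
}
```

With this version a line such as RESTART is treated as ordinary text and does not open a block.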

One possible solution to one possible interpretation of your requirements:

$ awk '{a[NR]=$0} /START/{s=NR} /END/{for (i=(s-3);i<=NR;i++) print a[i]}' file
ABC
DEF
GEF
START
EDG
EFG
GAD
END

This works when there are one or more START/END blocks and you want each block printed separately, rather than everything from the first START to the last END.
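
One edge case with this approach: if START sits within the first three lines of the file, the loop starts at an index below 1 and prints empty lines for the missing entries. A possible guard is sketched below; the function name blocks_clamped and the variable first are made up for illustration, and resetting s after each block is an extra assumption about the desired behaviour.

```shell
# Hypothetical helper: blocks_clamped FILE
# Same idea as above, but never reads before line 1.
blocks_clamped() {
  awk '
    { a[NR] = $0 }                        # store every line
    /START/ { s = NR }                    # remember where the block starts
    /END/ && s {
      first = (s > 3) ? s - 3 : 1         # clamp the context to the file start
      for (i = first; i <= NR; i++) print a[i]
      s = 0                               # ignore a stray END with no START
    }
  ' "$1"
}
```

Like the original, this keeps the whole file in memory, so it suits files of moderate size.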

If START and END appear just once, you can use grep with context like this:

grep -B 3 -A 99999 START file | grep -B 99999 END

That is, take 3 lines before START and up to 99999 lines after it, then keep up to 99999 lines before END. This assumes the file is shorter than 99999 lines.

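Using tac

This should work with multiple START/END blocks in the file. Reading the file bottom-up, x is set at END and counts down only after START has been seen; y is reset at every END so that the countdown of one block does not leak into the next.

tac file | awk '/END/{x=4;y=0}y&&x{x--}/START/{y=x}x' | tac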
