
Extract lines between two patterns by performing exact match for the 1st pattern only

I'm trying to extract, from a large file, the lines located between two marker lines, each identified by a pattern, say pattern1 and pattern2. My code:

awk "/pattern1/{flag=1;next}/pattern2/{flag=0}flag" filename

checks whether "pattern1" occurs in a line and starts printing from the line after it, until it reaches a subsequent line containing the string "pattern2".

What I would like is for awk to begin printing only at a line that matches the string "pattern1" exactly, and to stop printing at a line that merely contains "pattern2" (no exact match). In other words, I want exact matching for the first pattern while keeping the matching behavior of the command above for the second pattern.

awk has this functionality built in as a range pattern (note that it prints both marker lines as well):

$ cat data 
abcd
pattern1
xyz
pattern2
abcde
$ awk '/pattern1/,/pattern2/' data
pattern1
xyz
pattern2

And sed has it too:

$ sed -n '/pattern1/,/pattern2/p' data
pattern1
xyz
pattern2

Edit: to match pattern1 exactly you need some sort of anchor, either the word boundary \y in gawk or start and end anchors like this:

$ cat data 
abcd
pattern1 234
pattern1
xyz
pattern2
abcde
$ awk '/^pattern1$/,/pattern2/' data 
pattern1
xyz
pattern2
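
The same anchors carry over to the sed version. A minimal sketch, assuming the same sample file data as above (recreated here with printf so the snippet runs on its own):

```shell
# Recreate the sample file from the example above.
printf '%s\n' 'abcd' 'pattern1 234' 'pattern1' 'xyz' 'pattern2' 'abcde' > data

# Start address anchored to the whole line; end address matches anywhere in the line.
sed -n '/^pattern1$/,/pattern2/p' data
```

On this input it prints pattern1, xyz and pattern2, skipping the "pattern1 234" line just as the anchored awk range does.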

If you want to control whether the pattern1 and pattern2 lines themselves are printed, use one of these:

$ awk '/^pattern1$/{flag=1} /pattern2/{flag=0}flag' data 
pattern1
xyz
$ awk '/^pattern1$/{flag=1;next} /pattern2/{flag=0}flag' data 
xyz
$ awk '/^pattern1$/{flag=1;next;} /pattern2/{flag=0;print}flag' data 
xyz
pattern2

Here's another answer in line with the suggestion in the question:

awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{flag=0;next} {if (flag == 1) {print}}' filename

The first pattern must match the full line exactly (using ^ and $), while the second pattern can appear anywhere within the line.

EDIT: This version prints the line on which pattern1 matches, but not the pattern2 line. To skip the pattern1 line as well, replace "flag=1;print;next" with "flag=1;next".

awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{if (flag == 1) {print}; flag=0;} {if (flag == 1) {print}}' filename

This way a pattern2 line is printed only when it closes an open block, so a later stray "pattern2" is not printed a second time:

me:~$ awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{if (flag == 1) {print}; flag=0;} {if (flag == 1) {print}}' a
pattern1
xyz
as pattern2 sd

me:~$ cat a
abcd
pattern1 23
pattern1
xyz
as pattern2 sd
abcde
pattern2

Without sample input/output it's a guess but this MAY be what you want:

awk '/pattern2/{flag=0} flag; $0=="pattern1"{flag=1}' filename

which could be written more meaningfully as:

awk '/end_regexp/{found=0} found; $0=="start_string"{found=1}' filename
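
If the markers change from run to run, they can be passed in with -v rather than edited into the program. A sketch, assuming the hypothetical variable names start and stop and the sample file data used in an earlier answer (note that awk processes backslash escapes in -v values):

```shell
# Recreate the sample file so the snippet runs on its own.
printf '%s\n' 'abcd' 'pattern1 234' 'pattern1' 'xyz' 'pattern2' 'abcde' > data

# start is compared as a literal string against the whole line;
# stop is used as a dynamic regexp matched anywhere in the line.
awk -v start='pattern1' -v stop='pattern2' '
    $0 ~ stop   { found = 0 }   # end marker: stop printing (marker itself not printed)
    found                       # inside the block: print the line
    $0 == start { found = 1 }   # exact start marker: begin printing from the next line
' data
```

On this input only xyz is printed, since neither marker line is included.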

(Not a big deal, but naming a flag "flag" is about as useful as naming a function "function"!)

I actually think this might be what you REALLY should be using, though I can't be sure. It treats the end marker as a literal string rather than a regexp:

awk 'index($0,"end_string"){found=0} found; $0=="start_string"{found=1}' filename
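
The difference between the regexp and index() versions only shows up when the end marker contains regexp metacharacters. A small sketch, assuming a hypothetical file sample.txt and made-up markers "begin" and "a.b":

```shell
# Hypothetical sample input: "axb" looks like "a.b" to a regexp, but not to index().
printf '%s\n' 'begin' 'one' 'axb' 'two' 'a.b' 'three' > sample.txt

# Regexp end marker: "." matches any character, so "axb" ends the block early.
awk '/a.b/{found=0} found; $0=="begin"{found=1}' sample.txt
# prints: one

# Literal end marker: only a line containing the exact text "a.b" ends the block.
awk 'index($0,"a.b"){found=0} found; $0=="begin"{found=1}' sample.txt
# prints: one, axb, two (one per line)
```

With plain alphanumeric markers such as pattern2 the two commands behave identically.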

See also https://stackoverflow.com/a/18409469/1745001 for more ways to find text using awk.
