I'm trying to extract, from a large file, the lines located between two marker lines, each identified by a pattern, say pattern1 and pattern2. My code:
awk "/pattern1/{flag=1;next}/pattern2/{flag=0}flag" filename
checks whether "pattern1" occurs in a line and prints the lines that follow it, until it reaches a subsequent line containing the string "pattern2".
What I would like instead is for awk to begin printing only at a line that matches the string "pattern1" exactly, and to stop printing at a line that merely contains "pattern2" (no exact matching). So basically, I want exact matching for the first pattern while keeping the matching behavior of the command above for the second pattern.
awk has that functionality built in, like this:
$ cat data
abcd
pattern1
xyz
pattern2
abcde
$ awk '/pattern1/,/pattern2/' data
pattern1
xyz
pattern2
And sed has it too:
$ sed -n '/pattern1/,/pattern2/p' data
pattern1
xyz
pattern2
Edit: for an exact match you will have to use some sort of anchor, either the word boundary \y in gawk or start and end anchors like this:
$ cat data
abcd
pattern1 234
pattern1
xyz
pattern2
abcde
$ awk '/^pattern1$/,/pattern2/' data
pattern1
xyz
pattern2
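As a variation on the anchored regex, the start of a range pattern can also be a plain string comparison, which sidesteps regex metacharacters in the start string. This is only a sketch: it feeds the same sample lines as the data file above through a pipe, and the variable name out is just for illustration.

```shell
# Same sample lines as the data file above, supplied on stdin.
# Start: the whole line equals the string "pattern1" (no regex involved).
# End: the line matches the regex /pattern2/ anywhere in the line.
out=$(printf '%s\n' abcd 'pattern1 234' pattern1 xyz pattern2 abcde |
      awk '$0 == "pattern1", /pattern2/')
printf '%s\n' "$out"
```

With this input, the line "pattern1 234" does not start the range because it is not equal to "pattern1" as a whole.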
And if you want combinations of printing or not printing the pattern1 / pattern2 lines, you can use these:
$ awk '/^pattern1$/{flag=1} /pattern2/{flag=0}flag' data
pattern1
xyz
$ awk '/^pattern1$/{flag=1;next} /pattern2/{flag=0}flag' data
xyz
$ awk '/^pattern1$/{flag=1;next;} /pattern2/{flag=0;print}flag' data
xyz
pattern2
Here's another answer in line with the suggestion in the question:
awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{flag=0;next} {if (flag == 1) {print}}'
The first pattern must match the full line exactly (using ^ and $), while the second pattern can appear anywhere within the line.
EDIT: This version prints the lines that match pattern1. If you do not want them printed, replace "flag=1;print;next" with "flag=1;next".
awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{if (flag == 1) {print}; flag=0;} {if (flag == 1) {print}}' filename
This way a "pattern2" line is printed only when it closes a block, so a later, stray "pattern2" line is not printed as well:
me:~$ awk 'BEGIN{flag=0} /^pattern1$/{flag=1;print;next} /pattern2/{if (flag == 1) {print}; flag=0;} {if (flag == 1) {print}}' a
pattern1
xyz
as pattern2 sd
me:~$ cat a
abcd
pattern1 23
pattern1
xyz
as pattern2 sd
abcde
pattern2
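For comparison, the same behavior can probably be written more compactly. The following is a sketch, not a drop-in replacement: the flag name f is arbitrary, and the input reproduces the lines of the file a shown above.

```shell
# Lines of the sample file "a" from above, supplied on stdin.
# /^pattern1$/{f=1}      : an exact start line opens the block
# f                      : print every line while the block is open
# f && /pattern2/{f=0}   : an end line closes the block after being printed
out=$(printf '%s\n' abcd 'pattern1 23' pattern1 xyz 'as pattern2 sd' abcde pattern2 |
      awk '/^pattern1$/{f=1} f; f && /pattern2/{f=0}')
printf '%s\n' "$out"
```

Because the end line is printed before the flag is cleared, the trailing "pattern2" line outside any block is skipped.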
Without sample input/output it's a guess but this MAY be what you want:
awk '/pattern2/{flag=0} flag; $0=="pattern1"{flag=1}' filename
which could be written more meaningfully as:
awk '/end_regexp/{found=0} found; $0=="start_string"{found=1}' filename
(Nbd, but naming a flag flag is as useful as naming a function function!)
I actually think this might be what you REALLY should be using but idk:
awk 'index($0,"end_string"){found=0} found; $0=="start_string"{found=1}' filename
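To illustrate why the index() form may matter, here is a small sketch with made-up data in which the end marker "a.b" contains a regex metacharacter. The sample lines and the variable names regex_out and literal_out are assumptions for demonstration only.

```shell
# Hypothetical input: "axb" looks like the end marker to a regex, but is not "a.b".
input=$(printf '%s\n' start_string one axb two a.b three)

# Regex end marker: the dot matches any character, so "axb" ends the block early.
regex_out=$(printf '%s\n' "$input" |
            awk '/a.b/{found=0} found; $0=="start_string"{found=1}')

# Literal end marker: index() only finds the exact substring "a.b".
literal_out=$(printf '%s\n' "$input" |
              awk 'index($0,"a.b"){found=0} found; $0=="start_string"{found=1}')

printf 'regex:\n%s\nliteral:\n%s\n' "$regex_out" "$literal_out"
```

Here the regex version prints only "one", while the index() version prints "one", "axb" and "two".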
See also https://stackoverflow.com/a/18409469/1745001 for more ways to find text using awk.