Here is my sample code:
BEGIN
one
one
one one
one
END
filler filler filler filler
BEGIN
two two
two
two two
END
filler filler filler filler
BEGIN
three three three
three three
three
END
I want to extract the lines between (and including) BEGIN and END. I have a sed command that does this already:
sed '/BEGIN/,/END/!d' file
But I'd like to extract the blocks progressively, one at a time. That is, what can I change in the sed
command above so that it outputs only the first block? And then the second block? And the third? etc.
(As some of you may guess, my end goal is to parse through a file with x509 certificates and extract data on each certificate in the file, rather than just the first certificate in a file which openssl does by default. If there is an easier alternative than the above, I'm all ears).
I'm not sure you can easily do that in sed, but you can in awk:
awk '/^BEGIN$/ { file = sprintf("file%d.out", ++i); }
/^BEGIN$/,/^END$/ { print > file }' data
This generates file1.out for the first block, file2.out for the second, etc.
Could you explain the working parts to your awk?
The first rule matches lines that consist solely of BEGIN (the pattern is anchored with ^ and $) and generates a file name in the variable file using a counter in the variable i (pre-incremented, so the first file is file1.out).
The second rule matches ranges of lines from BEGIN to END and uses print (aka print $0) redirected to the current file specified by the variable file. Thus it writes to the relevant file each time.
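To see the two rules at work, here is a minimal sketch that runs the same command on a reduced copy of the sample data. The temporary directory and the shortened input are assumptions made for illustration. The third rule, which calls close(file) at each END line, is an addition that is not part of the answer above; it may help when a file holds many blocks, since some awk implementations limit the number of files open at once.

```shell
#!/bin/sh
# Sketch: split BEGIN/END blocks into numbered files, closing each file at END.
# The temp directory and the name "data" are illustrative only.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Reduced sample input: two blocks with a filler line between them.
printf '%s\n' BEGIN one 'one one' END 'filler filler' BEGIN 'two two' two END > data

awk '/^BEGIN$/ { file = sprintf("file%d.out", ++i) }   # new block: choose the next file name
     /^BEGIN$/,/^END$/ { print > file }                  # copy every line of the block
     /^END$/ { close(file) }                             # added here: release the file handle
' data

cat file2.out   # shows the second block, BEGIN and END lines included
```

The filler line never reaches any output file because it falls outside every BEGIN/END range.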
Also, how would you change it to instead output the contents to stdout? I was hoping for a way to specify a "Nth" pattern argument, which I was going to supply from a simple for loop that was running for as many times as the pattern "BEGIN" was found to get a total count.
You can do that by using one line to count the blocks and skip all except the relevant one, and then simply print the data for the block that is relevant.
awk -v N=$N '/^BEGIN$/ { if (++i != N) next; }
/^BEGIN$/,/^END$/ { print }' data
The -v N=$N relays the shell variable $N to awk; the first line counts the sections (using i), skipping all except the Nth. The second line only gets triggered when the first line does not skip it, so it prints the contents of the Nth block. Some awk aficionados (who are probably APL programmers in their spare time) would omit the { print } block, but I think it makes the code clearer to anyone else who has to maintain the code.
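The loop described in the question could be sketched roughly as follows: count the BEGIN lines with grep -c, then call the awk command once per block. The function name print_block, the temporary file, and the reduced sample data are hypothetical names chosen for this illustration. The third awk rule, which exits after the END line of block N, is an optional addition so that awk does not read the rest of the file.

```shell
#!/bin/sh
# Sketch: print each BEGIN/END block in turn on stdout.
# print_block is a hypothetical helper; the data file is a stand-in.
print_block() {
    # $1 = block number (1-based), $2 = input file
    awk -v N="$1" '
        /^BEGIN$/ { if (++i != N) next }   # count blocks; skip the BEGIN line of all others
        /^BEGIN$/,/^END$/ { print }         # only block N ever opens the range
        /^END$/ && i == N { exit }          # stop reading once block N is complete
    ' "$2"
}

tmpdir=$(mktemp -d)
data=$tmpdir/data
printf '%s\n' BEGIN one END filler BEGIN 'two two' two END > "$data"

count=$(grep -c '^BEGIN$' "$data")   # number of blocks in the file
N=1
while [ "$N" -le "$count" ]; do
    echo "--- block $N ---"
    print_block "$N" "$data"
    N=$((N + 1))
done
```

For the certificate use case, the output of each print_block call could presumably be piped into a command such as openssl x509 -noout -subject, with the two patterns changed to match the PEM marker lines; that part is untested here.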
It is also possible to do it the opposite way: suppress the default output with -n and print only the lines between the patterns.
sed -n '/BEGIN/,/END/p' <file
Using awk, export only the second record, with no need to go through the whole file: awk exits once the END line of that record has been written. You will get the result in the file "file.out". You can set the record number (n=2) yourself.
n=2
awk -v N="$n" '/^BEGIN$/ { ++i }
/^BEGIN$/,/^END$/ { if (i == N) { print > "file.out"; if (/^END$/) exit } }' file