I want to write a sed or awk script that, when run as awk -f script.awk oldfile > newfile, turns a given text file oldfile with contents
Some Heading
example text

Another Heading
1. example list item, but it
spans over multiple lines
2. list item
into a new text file newfile
with contents:
{Some Heading:} {example text}
{Another Heading:} {
[item] example list item, but it spans over multiple lines
[item] list item
}
Further description to eliminate possible ambiguities :
How can I accomplish this with sed or awk? (I use zsh in case this makes a difference.)
Addition: I just found out that I really need to know beforehand whether the block is a list or not:
heading
1. foo
2. bar
to
{list: heading}{
[item] foo
[item] bar
}
So I need to put in the “list:” if it's a list. Can this also be done?
With awk you can do something like this:
awk '/^$/ { print block (list ? "\n}" : "}"); block = ""; next } block == "" { block = "{" $0 ":} {"; list = 0; next } /^[0-9]+\. / { list = 1; sub(/^[0-9]+\. /, ""); block = block "\n [item] " $0; next } { block = block (list ? " " : "") $0 } END { print block (list ? "\n}" : "}") }' filename
Where the code is:
#!/usr/bin/awk -f

/^$/ {                                  # empty line: print converted block
    print block (list ? "\n}" : "}")    # Whether there's a newline before the
    block = ""                          # closing } depends on whether this is
    next                                # a list. Reset block buffer.
}

block == "" {                           # in the first line of a block:
    block = "{" $0 ":} {"               # format header
    list = 0                            # reset list flag
    next
}

/^[0-9]+\. / {                          # if a data line opens a list
    list = 1                            # set list flag
    sub(/^[0-9]+\. /, "")               # remove number
    block = block "\n [item] " $0       # format line
    next
}

{                                       # if it doesn't, just append it. Space
    block = block (list ? " " : "") $0  # inside a list to not fuse words.
}

END {                                   # and at the very end, print the last
    print block (list ? "\n}" : "}")    # block
}
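The script above does not yet emit the "list:" prefix asked for in the addition to the question. One possible way to get it is to buffer the heading and the body separately and only decide on the header format when the block is flushed. The following is a minimal sketch under that assumption, not a definitive implementation: the names convert, emit, head and body are made up for illustration, and unlike the script above it joins continuation lines with a space outside lists as well.

```shell
# Sketch: keep heading and body apart so the header can be chosen at flush time.
# convert, emit, head and body are illustrative names, not from the answer above.
convert() {
    awk '
    function emit() {
        if (head == "") return                 # nothing buffered, e.g. repeated blank lines
        if (list) print "{list: " head "} {" body "\n}"
        else      print "{" head ":} {" body "}"
        head = body = ""; list = 0             # reset for the next block
    }
    /^$/         { emit(); next }              # blank line ends the current block
    head == ""   { head = $0; next }           # first line of a block is the heading
    /^[0-9]+\. / { list = 1                    # numbered line starts a list item
                   sub(/^[0-9]+\. /, "")
                   body = body "\n [item] " $0
                   next }
                 { body = body (body == "" ? "" : " ") $0 }   # continuation line
    END          { emit() }                    # flush the last block
    ' "$@"
}

printf 'Some Heading\nexample text\n\nheading\n1. foo\n2. bar\n' | convert
```

Because the heading is only printed once the whole block has been read, the script knows by then whether a list item occurred.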
It is also possible with sed, but rather more difficult to read:
#!/bin/sed -nf

/^$/ {                        # empty line: print converted block
    x                         # fetch it from the hold buffer
    s/$/}/                    # append closing }
    /\n \[item\]/ s/}$/\n}/   # in a list, put in a newline before it
    p                         # print
    d                         # and we're done here. Hold buffer is now empty.
}

x                             # otherwise: inspect the hold buffer
// {                          # if it is empty (reusing last regex)
    x                         # get back the pattern space
    s/.*/{&:}{/               # Format header
    h                         # hold it.
    d                         # we're done here.
}
x                             # otherwise, get back the pattern space

/^[0-9]\+\. / {               # if the line opens a list
    s///                      # remove the number (reusing regex)
    s/.*/ [item] &/           # format the line
    H                         # append it to the hold buffer.
    ${                        # if it is the last line
        s/.*/}/               # append a closing bracket
        H                     # to the hold buffer
        x                     # swap it with the hold buffer
        p                     # and print that.
    }
    d                         # we're done.
}

# otherwise (not opening a list item)
H                             # append line to the hold buffer
x                             # fetch back the hold buffer to work on it
/\n \[item\]/ {               # if we're in a list
    s/\(.*\)\n/\1 /           # replace the last newline (that we just put there)
                              # with a space
    ${
        s/$/\n}/              # if this is the last line, append \n}
        p                     # and print
    }
    x                         # put the half-assembled block in the hold buffer
    d                         # and we're done
}

s/\(.*\)\n/\1/                # otherwise (not in a list): just remove the newline
${
    s/$/}/                    # if this is the last line, append closing bracket
    p                         # print
}
x                             # put half-assembled block in the hold buffer.
sed is line-oriented and as such is best for simple substitutions on a single line. Just use awk in paragraph mode (RS="") so that every block of blank-line-separated text is treated as a record, and treat every line in each paragraph as a field of that record (FS="\n"):
$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    printf "{" (/\n[0-9]+\./ ? "list: %s" : "%s:") "} {", $1
    inList = 0
    for (i=2; i<=NF; i++) {
        if ( sub(/^[0-9]+\./," [item]",$i) ) {
            printf "\n"
            inList = 1
        }
        else if (inList) {
            printf " "
        }
        printf "%s", $i
    }
    print (inList ? "\n" : "") "}"
}
$
$ awk -f tst.awk file
{Some Heading:} {example text}

{list: Another Heading} {
 [item] example list item, but it spans over multiple lines
 [item] list item
}
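To see what paragraph mode does to the input before any formatting is applied, it can help to print how awk splits it. The following is only an illustrative sketch: show_records is a made-up helper name, and the sample input is a shortened version of the one in the question.

```shell
# Sketch: with RS="" each blank-line-separated block is one record,
# and with FS="\n" each line of that block is one field.
# show_records is a hypothetical helper name used only for this illustration.
show_records() {
    awk 'BEGIN { RS = ""; FS = "\n" }
         { print "record " NR ": " NF " lines, heading = " $1 }' "$@"
}

printf 'Some Heading\nexample text\n\nAnother Heading\n1. a\n2. b\n' | show_records
```

For this input it reports two records, with 2 and 3 lines respectively, and $1 holding the heading of each block.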
Another awk version (similar to Ed's):

BEGIN { RS = ""; FS = "\n" }
{
    printf "%s", "{" (/\n[0-9]+\./ ? "List: " : "") $1 ":} {"
    for (i = 2; i <= NF; i++)
        printf "%s", (sub(/^[0-9]+\./, " [item]", $i) && ++x) ? "\n" $i : (x ? " " : "") $i
    print (x ? "\n}" : "}") "\n"
    x = 0
}

$ awk -f test.awk file
{Some Heading:} {example text}

{List: Another Heading:} {
 [item] example list item, but it spans over multiple lines
 [item] list item
}

Explanation:

BEGIN { RS = ""; FS = "\n" }
Read records as blocks separated by a blank line, and read fields as lines.

printf "%s", "{" (/\n[0-9]+\./ ? "List: " : "") $1 ":} {"
Print the first field (the heading line) in the required format. printf is used so that no newline is printed. The regex checks whether the record contains a newline followed by a number and a period, and adds "List: " if it does.

for (i = 2; i <= NF; i++)
Loop from the second field to the last field. NF is the number of fields.

I'll split the next bit up.

printf "%s"
Print a string. printf is used again to control newlines.

(sub(/^[0-9]+\./, " [item]", $i) && ++x) ? "\n" $i : (x ? " " : "") $i
This is effectively an if/else statement written with the ternary operator a ? b : c. sub returns 0 if there was nothing to replace; in that case x is not incremented and the line is printed as is, preceded by a space if we are already inside a list so that words do not fuse together. If the sub succeeds, it replaces the number at the start of the line with [item], x is incremented, and the modified line is printed with a newline before it.

print (x ? "\n}" : "}") "\n"
Uses the ternary operator again to check whether x was incremented. If it was, print a newline before the }, otherwise just print the }. The trailing "\n" adds the blank line between records; the parentheses are needed so that it is appended in both cases.

x = 0
Reset the counter for the next record.
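The ternary expression above relies on sub returning the number of substitutions it made (0 or 1) and modifying the line in place. A small sketch to check that behaviour in isolation; sub_demo is a made-up name, and the angle brackets are only there to make leading spaces visible:

```shell
# Sketch: sub() returns how many replacements it made and edits $0 in place.
# sub_demo is a hypothetical helper name used only for this illustration.
sub_demo() {
    awk '{ n = sub(/^[0-9]+\./, " [item]"); print n " <" $0 ">" }' "$@"
}

printf '1. foo\nbar\n' | sub_demo
# prints:
# 1 < [item] foo>
# 0 <bar>
```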