
sed or awk script to substitute the structure of a text file

I want to create a sed or awk script which, when run as awk -f script.awk oldfile > newfile, turns a given text file oldfile with contents

Some Heading
example text

Another Heading
1. example list item, but it
spans over multiple lines
2. list item

into a new text file newfile with contents:

{Some Heading:} {example text}

{Another Heading:} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}

Further description to eliminate possible ambiguities:

  • The script should substitute each block (i.e. each group of lines delimited by blank lines) accordingly.
  • A text file may contain several such blocks, in no particular order.
  • The script should do the substitutions conditionally, depending on whether a heading (i.e. the first line of a block) is followed by a list of items (indicated by numbered lines such as '1.') or not.
  • Blocks are always separated by blank lines.

How can I accomplish this with sed or awk? (I use zsh in case this makes a difference.)


Addition: I just found out that I really need to know beforehand whether the block is a list or not:

heading
1. foo
2. bar

to

{list: heading}{
 [item] foo
 [item] bar
}

So I need to put in “list:” if the block is a list. Can this also be done?

With awk you can do something like this:

awk '/^$/ { if (block != "") print block (list ? "\n}" : "}"); block = ""; next } block == "" { block = "{" $0 ":} {"; list = 0; next } /^[0-9]+\. / { list = 1; sub(/^[0-9]+\. /, ""); block = block "\n  [item] " $0; next } { block = block (list ? " " : "") $0 } END { if (block != "") print block (list ? "\n}" : "}") }' filename

Written out with comments, the script is:

#!/usr/bin/awk -f

/^$/ {                               # empty line: print converted block
  if (block != "")                   # (skip it if nothing is buffered, e.g.
    print block (list ? "\n}" : "}") # after several blank lines in a row).
  block = ""                         # Only a list gets a newline before
  next                               # the closing }. Reset block buffer.
}
block == "" {                        # in the first line of a block:
  block = "{" $0 ":} {"              # format header
  list = 0                           # reset list flag
  next
}
/^[0-9]+\. / {                       # if a data line opens a list item
  list = 1                           # set list flag
  sub(/^[0-9]+\. /, "")              # remove number
  block = block "\n  [item] " $0     # format line
  next
}
{                                    # otherwise just append it, with a space
  block = block (list ? " " : "") $0 # inside a list so words do not fuse.
}
END {                                # and at the very end, print the last
  if (block != "")                   # block (unless the file ended with a
    print block (list ? "\n}" : "}") # blank line)
}
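The script above writes the heading into its output buffer as soon as it reads it, so on its own it cannot add the “list:” prefix requested in the addition. A minimal sketch of one way around that, assuming the same line-by-line approach and a POSIX awk: keep the heading and the body in separate variables and build the opening {...} only when the block is flushed. The names to_blocks, flush, head, body and n are illustrative, not from the answer.

```shell
#!/bin/sh
# Sketch: line-by-line awk that builds the header only when a block is
# flushed, so it can choose between "{list: head} {" and "{head:} {".
# Names (to_blocks, flush, head, body, n) are illustrative assumptions.
to_blocks() {
  awk '
    function flush() {
      if (head == "") return              # nothing buffered (e.g. repeated blank lines)
      if (n++) print ""                   # blank line between output blocks
      if (list) print "{list: " head "} {" body "\n}"
      else      print "{" head ":} {" body "}"
      head = ""; body = ""; list = 0      # reset for the next block
    }
    /^$/         { flush(); next }        # a blank line ends a block
    head == ""   { head = $0; next }      # the first line is the heading
    /^[0-9]+\. / {                        # a numbered line starts a list item
      list = 1
      sub(/^[0-9]+\. /, "")
      body = body "\n  [item] " $0
      next
    }
    { body = body (list ? " " : "") $0 }  # continuation: join with a space in a list
    END { flush() }                       # flush the last block
  ' "$@"
}

printf '%s\n' 'Some Heading' 'example text' '' \
  'Another Heading' '1. example list item, but it' \
  'spans over multiple lines' '2. list item' | to_blocks
```

Because the header is assembled at flush time, the choice between {list: ...} and {...:} can use everything seen in the block, and empty flushes (extra blank lines, a trailing blank line) print nothing.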

It is also possible with sed, but rather more difficult to read:

#!/bin/sed -nf

/^$/ {                       # empty line: print converted block
  x                          # fetch it from the hold buffer
  s/$/}/                     # append closing }
  /\n  \[item\]/ s/}$/\n}/   # in a list, put in a newline before it
  p                          # print
  d                          # and we're done here. Hold buffer is now empty.
}
x                            # otherwise: inspect the hold buffer
// {                         # if it is empty (reusing the last regex, /^$/)
  x                          # get back the pattern space
  s/.*/{&:} {/               # format header
  h                          # hold it.
  d                          # we're done here.
}
x                            # otherwise, get back the pattern space
/^[0-9]\+\. / {              # if the line opens a list
  s///                       # remove the number (reusing regex)
  s/.*/  [item] &/           # format the line
  H                          # append it to the hold buffer.
  ${                         # if it is the last line
    s/.*/}/                  # append a closing bracket
    H                        # to the hold buffer
    x                        # swap it with the hold buffer
    p                        # and print that.
  }
  d                          # we're done.
}
                             # otherwise (not opening a list item)
H                            # append line to the hold buffer
x                            # fetch back the hold buffer to work on it

/\n  \[item\]/ {             # if we're in a list
  s/\(.*\)\n/\1 /            # replace the last newline (that we just put there)
                             # with a space
  ${
    s/$/\n}/                 # if this is the last line, append \n}
    p                        # and print
  }
  x                          # put the half-assembled block in the hold buffer
  d                          # and we're done
}
s/\(.*\)\n/\1/               # otherwise (not in a list): just remove the newline
${
  s/$/}/                     # if this is the last line, append closing bracket
  p                          # print
}
x                            # put half-assembled block in the hold buffer.
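The hold-space pattern this script is built on can be seen in isolation in a much smaller sketch (not part of the answer; join_blocks is a hypothetical name). H appends each non-blank line to the hold space; on a blank line, or at the last line, x swaps the accumulated block into the pattern space, where it is joined into a single line and printed.

```shell
#!/bin/sh
# Sketch: accumulate each blank-line-separated block in the hold space,
# then swap it in (x), join its lines with spaces and print it.
# join_blocks is a hypothetical helper name.
join_blocks() {
  sed -n '
    /^$/!{
      H
      $!d
    }
    x
    s/^\n//
    s/\n/ /g
    p
  ' "$@"
}

printf '%s\n' 'Some Heading' 'example text' '' 'Another Heading' '1. foo' '2. bar' | join_blocks
# Some Heading example text
# Another Heading 1. foo 2. bar
```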

sed is line-oriented and as such is best for simple substitution on a single line.

Just use awk in paragraph mode (RS="") so that every blank-line-separated block of text is treated as a record, and treat every line in each paragraph as a field of that record (FS="\n"):

$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    printf "{" (/\n[0-9]+\./ ? "list: %s" : "%s:") "} {", $1
    inList = 0
    for (i=2; i<=NF; i++) {
        if ( sub(/^[0-9]+\./,"  [item]",$i) ) {
            printf "\n"
            inList = 1
        }
        else if (inList) {
            printf " "
        }
        printf "%s", $i
    }
    print (inList ? "\n" : "") "}"
}
$
$ awk -f tst.awk file
{Some Heading:} {example text}

{list: Another Heading} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}
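To see what paragraph mode hands to the main block, this small sketch (show_records is a hypothetical helper) prints the record number, the field count and the first field for the sample input. With RS="" records are separated by one or more blank lines, so runs of extra blank lines do not produce empty records.

```shell
#!/bin/sh
# Sketch: show how RS="" (paragraph mode) and FS="\n" split the input.
# Each blank-line-separated block is one record; each line is one field.
show_records() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       { printf "record %d: NF=%d, heading=%s\n", NR, NF, $1 }' "$@"
}

printf '%s\n' 'Some Heading' 'example text' '' \
  'Another Heading' '1. example list item, but it' \
  'spans over multiple lines' '2. list item' | show_records
# record 1: NF=2, heading=Some Heading
# record 2: NF=4, heading=Another Heading
```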

Another awk version (similar to Ed's):

BEGIN{RS="";FS="\n"}
{
    printf "%s", "{"(/\n[0-9]+\./?"list: "$1:$1":")"} {"
    for(i=2;i<=NF;i++)
        printf "%s",sub(/^[0-9]+\./,"  [item]",$i)&&++x?"\n"$i:(x?" ":"")$i
    print x?"\n}\n":"}\n"
    x=0
}

Output

$ awk -f test.awk file
{Some Heading:} {example text}

{list: Another Heading} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}

How it works

BEGIN{RS="";FS="\n"}

Read records as blocks separated by blank lines.
Read fields as lines.

printf "%s", "{"(/\n[0-9]+\./?"list: "$1:$1":")"} {"

Print the first field (line) in the required format; printf is used so that no newline is added. If any part of the record contains a newline followed by a number and a period, the block is a list, so "list: " is put before the heading; otherwise the heading just gets a trailing colon.

for(i=2;i<=NF;i++)

Loop from the second field to the last field. NF is the number of fields.

I'll split the next bit up.

printf "%s"

Print a string; printf is used again to control newlines.

sub(/^[0-9]+\./,"  [item]",$i)&&++x?"\n"$i:(x?" ":"")$i

This is effectively an if/else statement using the ternary operator a?b:c. sub returns the number of substitutions it made. If it succeeds, it replaces the number at the start of the line with [item], x is incremented, and the new line is printed with a newline before it.
If it returns 0, x is not incremented and the line is printed as is, preceded by a space when we are already inside a list (x > 0) so that the words of a wrapped list item do not run together.

print x?"\n}\n":"}\n"

Uses the ternary operator again to check whether x was incremented. If it was, a newline is printed before the }, otherwise just the }. Either way a trailing newline (plus the default ORS) produces the blank line between records.
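Precedence is the thing to watch when building the closing } with a ternary: in awk, string concatenation binds more tightly than ?:, so a string written after a ternary attaches only to its else branch unless the ternary is parenthesized. A small demonstration (demo is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: concatenation binds tighter than ?:, so
#   x ? "A" : "B" "C"   means   x ? "A" : ("B" "C")
demo() {
  awk -v x="$1" 'BEGIN {
    x = x + 0                   # force a numeric flag
    print x ? "A" : "B" "C"     # suffix "C" belongs to the else branch only
    print (x ? "A" : "B") "C"   # parenthesized: suffix applies to both branches
  }'
}

demo 1   # prints A, then AC
demo 0   # prints BC, then BC
```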
