
sed or awk script to substitute the structure of a text file

I want to create a sed or awk script which, when run as awk -f script.awk oldfile > newfile, turns a given text file oldfile with the contents

Some Heading
example text

Another Heading
1. example list item, but it
spans over multiple lines
2. list item

into a new text file newfile with the contents:

{Some Heading:} {example text}

{Another Heading:} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}

Further details, to eliminate possible ambiguities:

  • The script should substitute each block (i.e. each group of lines delimited by blank lines) accordingly.
  • A text file may contain several such blocks, and their order is not known in advance.
  • The script should do the substitution conditionally, depending on whether the heading (i.e. the first line of a block) is followed by a list of items (indicated by lines beginning with '1.') or not.
  • Blocks are always separated by blank lines.

How can I accomplish this with sed or awk? (I use zsh, in case that makes a difference.)


Addition: I just found out that I really need to know in advance whether a block is a list or not, i.e. turn

heading
1. foo
2. bar

to

{list: heading}{
 [item] foo
 [item] bar
}

So if the block is a list, I need to insert "list:" before the heading. Can this also be done?

With awk you can do something like this:

awk '/^$/ { print block (list ? "\n}" : "}"); block = ""; next } block == "" { block = "{" $0 ":} {"; list = 0; next } /^[0-9]+\. / { list = 1; sub(/^[0-9]+\. /, ""); block = block "\n  [item] " $0; next } { block = block (list ? " " : "") $0 } END { print block (list ? "\n}" : "}") }' filename

Laid out as a commented script, the code is:

#!/usr/bin/awk -f

/^$/ {                               # empty line: print converted block
  print block (list ? "\n}" : "}")   # Whether there's a newline before the
  block = ""                         # closing } depends on whether this is
  next                               # a list. Reset block buffer.
}
block == "" {                        # in the first line of a block:
  block = "{" $0 ":} {"              # format header
  list = 0                           # reset list flag
  next
}
/^[0-9]+\. / {                       # if a data line opens a list
  list = 1                           # set list flag
  sub(/^[0-9]+\. /, "")              # remove number
  block = block "\n  [item] " $0     # format line
  next
}
{                                    # if it doesn't, just append it. Space
  block = block (list ? " " : "") $0 # inside a list to not fuse words.
}
END {                                # and at the very end, print the last
  print block (list ? "\n}" : "}")   # block
}
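The script above was written before the Addition, so it never emits "list:". Because it already buffers a whole block before printing, one way to support the Addition is to keep the heading and the body in separate variables and choose the prefix only when the block is flushed. The following is a sketch of that variation, not part of the original answer. The wrapper name blocks_to_braces and the function emit_block are made up for illustration. It also skips empty blocks, so runs of blank lines or a trailing blank line do not produce a stray }, and it prints a blank line between blocks as in the desired output.

```sh
# Hypothetical wrapper name; the awk program is a variation of the script
# above that buffers the heading and the body separately.
blocks_to_braces() {
  awk '
    # Print the buffered block (if any) and reset the buffers.
    function emit_block() {
      if (head == "") return              # nothing buffered, e.g. repeated blank lines
      if (nblocks++) print ""             # blank line between blocks, as in the desired output
      if (list)
        print "{list: " head "} {" body "\n}"
      else
        print "{" head ":} {" body "}"
      head = ""; body = ""; list = 0
    }
    /^$/         { emit_block(); next }   # a blank line ends a block
    head == ""   { head = $0; next }      # the first line of a block is the heading
    /^[0-9]+\. / {                        # a numbered line starts a new item
      list = 1
      sub(/^[0-9]+\. /, "")
      body = body "\n  [item] " $0
      next
    }
    { body = body (list ? " " : "") $0 }  # continuation: join with a space inside lists
    END { emit_block() }                  # flush the last block
  ' "$@"
}

# Usage with the sample input from the question:
printf 'Some Heading\nexample text\n\nAnother Heading\n1. example list item, but it\nspans over multiple lines\n2. list item\n' |
  blocks_to_braces
```

Deferring the decision to emit_block is what makes the "list:" prefix possible here. By the time the block is printed, the list flag already reflects every line of the block.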

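It is also possible with sed, but rather more difficult to read. Note that this script relies on GNU sed features (\+ in a basic regex and \n in a replacement), which other sed implementations may not support: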

#!/bin/sed -nf

/^$/ {                       # empty line: print converted block
  x                          # fetch it from the hold buffer
  s/$/}/                     # append closing }
  /\n  \[item\]/ s/}$/\n}/   # in a list, put in a newline before it
  p                          # print
  d                          # and we're done here. Hold buffer is now empty.
}
x                            # otherwise: inspect the hold buffer
// {                         # if it is empty (reusing last regex)
  x                          # get back the pattern space
  s/.*/{&:} {/               # format header
  h                          # hold it.
  d                          # we're done here.
}
x                            # otherwise, get back the pattern space
/^[0-9]\+\. / {              # if the line opens a list
  s///                       # remove the number (reusing regex)
  s/.*/  [item] &/           # format the line
  H                          # append it to the hold buffer.
  ${                         # if it is the last line
    s/.*/}/                  # append a closing bracket
    H                        # to the hold buffer
    x                        # swap it with the hold buffer
    p                        # and print that.
  }
  d                          # we're done.
}
                             # otherwise (not opening a list item)
H                            # append line to the hold buffer
x                            # fetch back the hold buffer to work on it

/\n  \[item\]/ {             # if we're in a list
  s/\(.*\)\n/\1 /            # replace the last newline (that we just put there)
                             # with a space
  ${
    s/$/\n}/                 # if this is the last line, append \n}
    p                        # and print
  }
  x                          # put the half-assembled block in the hold buffer
  d                          # and we're done
}
s/\(.*\)\n/\1/               # otherwise (not in a list): just remove the newline
${
  s/$/}/                     # if this is the last line, append closing bracket
  p                          # print
}
x                            # put half-assembled block in the hold buffer.
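Underneath the formatting, the sed script relies on one pattern. It collects a block in the hold space with H and swaps it back with x when a blank line (or the last line) arrives. Stripped of all formatting, that pattern reduces to something like the following sketch, which only joins each block onto one line. The function name join_blocks is made up for illustration.

```sh
# Hypothetical helper: join the lines of each blank-line-separated block
# into a single line, using only the hold-space commands H and x.
join_blocks() {
  sed -n '
    # Not a blank line: append it to the hold space. Unless this is the
    # last line, start the next cycle right away.
    /^$/!{
      H
      $!d
    }
    # Blank line (or last line): swap, so the pattern space holds the
    # collected block and the hold space holds the current line
    # (empty for a blank line, which resets the buffer).
    x
    # H put a newline before the first line; drop it, join the rest.
    s/^\n//
    s/\n/ /g
    # Print the block unless it is empty (runs of blank lines).
    /./p
  ' "$@"
}

# Usage:
printf 'Some Heading\nexample text\n\nAnother Heading\n1. item one\n2. item two\n' |
  join_blocks
```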

sed is line-oriented, so it is best suited to simple substitutions on individual lines.

Just use awk in paragraph mode (RS="") so that every block of blank-line-separated text is treated as a record, and treat every line in each paragraph as a field of that record (FS="\n"):

$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    printf "{" (/\n[0-9]+\./ ? "list: %s" : "%s:") "} {", $1
    inList = 0
    for (i=2; i<=NF; i++) {
        if ( sub(/^[0-9]+\./,"  [item]",$i) ) {
            printf "\n"
            inList = 1
        }
        else if (inList) {
            printf " "
        }
        printf "%s", $i
    }
    print (inList ? "\n" : "") "}"
}
$
$ awk -f tst.awk file
{Some Heading:} {example text}

{list: Another Heading} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}
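To see how paragraph mode splits the input, the following sketch prints one summary line per record: the number of lines, the heading, and whether the block counts as a list under the same /\n[0-9]+\./ test used in tst.awk. The function name show_blocks is hypothetical. Two details are worth noting. Several consecutive blank lines act as a single separator when RS="". And because the regex requires a newline before the number, only lines after the heading are considered, so a heading such as "1. Introduction" does not by itself make a block a list.

```sh
# Hypothetical helper: show how paragraph mode splits the input.
show_blocks() {
  awk '
    BEGIN { RS = ""; FS = "\n" }          # record = block, field = line
    {
      # list if some line after the heading starts with digits and a period
      kind = (/\n[0-9]+\./ ? "list" : "plain")
      printf "record %d: %d lines, heading \"%s\", %s\n", NR, NF, $1, kind
    }
  ' "$@"
}

printf 'Some Heading\nexample text\n\n\n1. Introduction\nplain text\n\nAnother Heading\n1. first\n2. second\n' |
  show_blocks
```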

Another awk version (similar to Ed's):

BEGIN{RS="";FS="\n"}
{
    printf "%s", "{" (/\n[0-9]+\./ ? "list: " $1 : $1 ":") "} {"
    for(i=2;i<=NF;i++)
        printf "%s", sub(/^[0-9]+\./,"  [item]",$i) && ++x ? "\n" $i : (x ? " " : "") $i
    print (x ? "\n}" : "}") "\n"
    x=0
}

Output:

$ awk -f test.awk file

{Some Heading:} {example text}

{list: Another Heading} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}

How it works

BEGIN{RS="";FS="\n"}

Read records as blocks separated by blank lines.
Read fields as lines.

printf "%s", "{" (/\n[0-9]+\./ ? "list: " $1 : $1 ":") "} {"

Print the first field (line) in the required format; printf is used so that no newline is added. The regex checks whether any part of the record contains a newline followed by a number and a period, i.e. whether the heading is followed by a list. If it is, "list: " is put before the heading; otherwise a colon is put after it.

for(i=2;i<=NF;i++)

Loop from the second field to the last field. NF is the number of fields.

I'll split the next bit up.

printf "%s"

Print a string; printf is used again to control newlines.

sub(/^[0-9]+\./,"  [item]",$i) && ++x ? "\n" $i : (x ? " " : "") $i

This is effectively an if/else statement using the ternary operator a?b:c. If the line does not start with a number, sub returns 0 and x is not incremented, so the line is printed as is, preceded by a space when we are already inside a list so that wrapped lines are not fused together.
If the substitution succeeds, it replaces the number at the start of the line with [item], increments x, and prints the new line preceded by a newline.

print (x ? "\n}" : "}") "\n"

Uses the ternary operator again to check whether x was incremented. If it was, a newline is printed before the }; otherwise just the } is printed. The parentheses matter: they make the extra "\n" apply to both cases, and together with the newline that print adds, it produces the blank line between records.
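This version relies on sub() returning the number of substitutions it made. Since sub replaces at most the first match, that number is 0 or 1, so the call can serve as a condition while also rewriting the text in place. In the answer this value is combined with && ++x so that x counts the list items in the current block. A minimal demonstration, using a hypothetical helper name mark_items and operating on $0 instead of a field:

```sh
# Hypothetical helper: print what sub() returns for each line.
mark_items() {
  awk '{
    n = sub(/^[0-9]+\./, "  [item]")    # edits $0 in place, returns 1 or 0
    print n ": " $0
  }' "$@"
}

printf '1. foo\ncontinued\n2. bar\n' | mark_items
```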
