sed or awk script to substitute the structure of a text file
I want to create a sed or awk script which, on
awk -f script.awk oldfile > newfile
turns a given text file oldfile with contents
Some Heading
example text

Another Heading
1. example list item, but it
spans over multiple lines
2. list item
into a new text file newfile with contents:
{Some Heading:} {example text}
{Another Heading:} {
[item] example list item, but it spans over multiple lines
[item] list item
}
Further description to eliminate possible ambiguities:
How can I accomplish this with sed or awk? (I use zsh, in case this makes a difference.)
Addition: I just found out that I really need to know beforehand whether the block is a list or not:
heading
1. foo
2. bar
to
{list: heading}{
[item] foo
[item] bar
}
So I need to put in the "list:" if it's a list. Can this also be done?
With awk you can do something like this:
awk '/^$/ { print block (list ? "\n}" : "}"); block = ""; next } block == "" { block = "{" $0 ":} {"; list = 0; next } /^[0-9]+\. / { list = 1; sub(/^[0-9]+\. /, ""); block = block "\n [item] " $0; next } { block = block (list ? " " : "") $0 } END { print block (list ? "\n}" : "}") }' filename
Where the code is:
#!/usr/bin/awk -f
/^$/ {                                  # empty line: print converted block
    print block (list ? "\n}" : "}")    # Whether there's a newline before the
    block = ""                          # closing } depends on whether this is
    next                                # a list. Reset block buffer.
}
block == "" {                           # in the first line of a block:
    block = "{" $0 ":} {"               # format header
    list = 0                            # reset list flag
    next
}
/^[0-9]+\. / {                          # if a data line opens a list
    list = 1                            # set list flag
    sub(/^[0-9]+\. /, "")               # remove number
    block = block "\n [item] " $0       # format line
    next
}
{                                       # if it doesn't, just append it. Space
    block = block (list ? " " : "") $0  # inside a list to not fuse words.
}
END {                                   # and at the very end, print the last
    print block (list ? "\n}" : "}")    # block
}
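The script above does not yet cover the addition to the question (the "list:" prefix has to be known before the header is printed). A minimal sketch of one way to adapt it, assuming the same blank-line-separated input: buffer the heading and the body separately and only decide on the header format when the block is flushed. The function name emit and the variables head and body are made up for this illustration, and unlike the script above this variant joins continuation lines with a space in non-list blocks too.

```shell
# Sample input in the format from the question (blocks separated by an empty line).
printf 'heading\n1. foo\n2. bar\n\nSome Heading\nexample text\n' > oldfile

awk '
# emit(): print the buffered block, choosing the header once we know
# whether any list item was seen. Does nothing if no block is buffered.
function emit() {
    if (head == "") return
    if (list) print "{list: " head "}{" body "\n}"
    else      print "{" head ":} {" body "}"
    head = ""; body = ""; list = 0
}
/^$/         { emit(); next }                  # empty line ends the block
head == ""   { head = $0; next }               # first line of a block is the heading
/^[0-9]+\. / { list = 1                        # numbered line opens a list item
               sub(/^[0-9]+\. /, "")
               body = body "\n [item] " $0
               next }
             { body = body (body == "" ? "" : " ") $0 }   # continuation line
END          { emit() }                        # flush the last block
' oldfile > newfile

cat newfile
# {list: heading}{
#  [item] foo
#  [item] bar
# }
# {Some Heading:} {example text}
```

Because the header is only built in emit(), the list flag is already final when it is needed; the price is holding one block in memory, which the original script does anyway.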
It is also possible with sed, but rather more difficult to read:
#!/bin/sed -nf
/^$/ {                        # empty line: print converted block
    x                         # fetch it from the hold buffer
    s/$/}/                    # append closing }
    /\n \[item\]/ s/}$/\n}/   # in a list, put in a newline before it
    p                         # print
    d                         # and we're done here. Hold buffer is now empty.
}
x                             # otherwise: inspect the hold buffer
// {                          # if it is empty (reusing last regex)
    x                         # get back the pattern space
    s/.*/{&:}{/               # Format header
    h                         # hold it.
    d                         # we're done here.
}
x                             # otherwise, get back the pattern space
/^[0-9]\+\. / {               # if the line opens a list
    s///                      # remove the number (reusing regex)
    s/.*/ [item] &/           # format the line
    H                         # append it to the hold buffer.
    ${                        # if it is the last line
        s/.*/}/               # append a closing bracket
        H                     # to the hold buffer
        x                     # swap it with the hold buffer
        p                     # and print that.
    }
    d                         # we're done.
}
# otherwise (not opening a list item)
H                             # append line to the hold buffer
x                             # fetch back the hold buffer to work on it
/\n \[item\]/ {               # if we're in a list
    s/\(.*\)\n/\1 /           # replace the last newline (that we just put there)
                              # with a space
    ${
        s/$/\n}/              # if this is the last line, append \n}
        p                     # and print
    }
    x                         # put the half-assembled block in the hold buffer
    d                         # and we're done
}
s/\(.*\)\n/\1/                # otherwise (not in a list): just remove the newline
${
    s/$/}/                    # if this is the last line, append closing bracket
    p                         # print
}
x                             # put half-assembled block in the hold buffer.
sed is line-oriented and as such is best for simple substitutions on a single line.
Just use awk in paragraph mode (RS="") so every block of blank-line-separated text is treated as a record, and treat every line in each paragraph as a field of the record (FS="\n"):
$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    printf "{" (/\n[0-9]+\./ ? "list: %s" : "%s:") "} {", $1
    inList = 0
    for (i=2; i<=NF; i++) {
        if ( sub(/^[0-9]+\./," [item]",$i) ) {
            printf "\n"
            inList = 1
        }
        else if (inList) {
            printf " "
        }
        printf "%s", $i
    }
    print (inList ? "\n" : "") "}"
}
$
$ awk -f tst.awk file
{Some Heading:} {example text}
{list: Another Heading} {
[item] example list item, but it spans over multiple lines
[item] list item
}
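To see what paragraph mode does on its own, here is a small sketch that only reports how awk splits the input; the file names oldfile and summary.txt and the shortened sample text are just for illustration.

```shell
# Two blocks separated by an empty line, as in the question.
printf 'Some Heading\nexample text\n\nAnother Heading\n1. first\n2. second\n' > oldfile

# RS="" makes each blank-line-separated block one record;
# FS="\n" makes each line of that block one field.
awk 'BEGIN { RS = ""; FS = "\n" }
     { printf "record %d: %d lines, heading=%s\n", NR, NF, $1 }' oldfile > summary.txt

cat summary.txt
# record 1: 2 lines, heading=Some Heading
# record 2: 3 lines, heading=Another Heading
```

So $1 is always the heading and fields 2 through NF are the body lines, which is why the loop in tst.awk starts at i=2.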
Another awk version (similar to Ed's):
BEGIN{RS="";FS="\n"}
{
    printf "%s", "{"(/\n[0-9]+\./?"List: ":"")$1":} {"
    for(i=2;i<=NF;i++)
        printf "%s",sub(/^[0-9]+\./," [item]",$i)&&++x?"\n"$i:$i
    print x?"\n}":"}""\n"
    x=0
}
$ awk -f test.awk file
{Some Heading:} {example text}
{Another Heading:} {
[item] example list item, but itspans over multiple lines
[item] list item
}
BEGIN{RS="";FS="\n"}
Read records as blocks separated by a blank line.
Read fields as lines.
{printf "%s", "{"(/\n[0-9]+\./?"List: ":"")$1":} {"
Print the first field (line) in the format specified; notice that printf is used to omit the newline. It checks whether any part of the record contains a newline followed by a number and a period, and adds "List: " if it does.
for(i=2;i<=NF;i++)
Loop from the second field to the last field. NF is the number of fields.
I'll split the next bit up.
printf "%s"
Print a string; printf is used again to control newlines.
sub(/^[0-9]+\./," [item]",$i)&&++x?"\n"$i:$i
This is effectively an if-else statement using the ternary operator a?b:c. sub returns 0 if it makes no substitution, and then x is not incremented, so the line is printed as is.
If the sub is successful, it replaces the number at the start of that line with [item], increments x, and prints the modified line with a newline before it.
print x?"\n}":"}""\n"
Uses the ternary operator again to check whether x was incremented. If it was, it prints a newline before the }, otherwise it just prints the }. The "\n" concatenated after "}" adds the blank line between records; because concatenation binds tighter than ?:, it belongs to the non-list branch only.
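The walkthrough above leans on the return value of sub. A minimal sketch that shows it in isolation, with the two sample lines and the file name sub_demo.txt made up for the demonstration:

```shell
# sub() returns the number of substitutions it made (0 or 1),
# so its result can be used directly as a condition.
printf '1. foo\nplain line\n' | awk '{
    changed = sub(/^[0-9]+\./, " [item]")   # modifies $0 only if the line starts with a number
    print changed "|" $0
}' > sub_demo.txt

cat sub_demo.txt
# 1| [item] foo
# 0|plain line
```

The call is kept in its own statement so that $0 is read only after sub has run; inside a single expression the evaluation order is easy to get wrong.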