How to re-order lines in a text file based on a pattern?

Question

I have a text file that contains batches of 4 lines. The first line of each batch is in the correct position, but the next 3 lines are not always in the correct order.

name cat
label 4
total 5
value 4

name dog
total 4
label 3
value 6

name cow
value 6
total 1
label 4

name fish
total 3
label 5
value 6

I would like each 4-line batch to be in the following format:

name cat
value 4
total 5
label 4

so the output would be:

name cat
value 4
total 5
label 4

name dog
value 6
total 4
label 3

name cow
value 6
total 1
label 4

name fish
value 6
total 3
label 5

The file contains thousands of lines in total, so I would like to build a command that can deal with all potential orders of the 3 lines and re-arrange them if they are not in the correct format.

I am aware I can use awk to search for lines that begin with a particular string and then re-arrange them:

awk '$1 == "value" { print $3, $4, $1, $2; next; } 1'

However, I cannot figure out how to achieve something similar that processes multiple lines.

How can I achieve this?

Answer 1

By setting RS to the empty string, each block of text separated by at least one empty line is considered a single record. From there it's easy to capture each key-value pair and output them in the desired order.

BEGIN {RS=""}
{
    for (i=1; i<=NF; i+=2) a[$i] = $(i+1)
    print "name", a["name"] ORS \
          "value", a["value"] ORS \
          "total", a["total"] ORS \
          "label", a["label"] ORS
}


$ awk -f a.awk file
name cat
value 4
total 5
label 4

name dog
value 6
total 4
label 3

name cow
value 6
total 1
label 4

name fish
value 6
total 3
label 5
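
One thing to watch with the script above: the array a is never emptied, so if a block were missing a key, the value from the previous block would be printed in its place. The sample data always has all four keys, so this is only a concern under that assumption. A minimal sketch that resets the array for each record and skips absent keys (the sample input and the names input, out, and order are made up for illustration):

```shell
# Hypothetical sample: the second block deliberately has no "total" line.
input='name cat
label 4
total 5
value 4

name dog
label 3
value 6'

out=$(printf '%s\n' "$input" | awk '
BEGIN { RS = ""; n = split("name value total label", order, " ") }
{
    split("", a)                        # portable way to empty the array
    for (i = 1; i < NF; i += 2)         # fields alternate key, value
        a[$i] = $(i + 1)
    if (NR > 1) print ""                # blank line between blocks
    for (j = 1; j <= n; j++)
        if (order[j] in a)              # skip keys this block does not have
            print order[j], a[order[j]]
}')
printf '%s\n' "$out"
```

Like the original, this splits on whitespace, so it assumes every value is a single word.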

Answer 2

Could you please try the following.

awk '
/^name/{
  if(name){
    print name ORS array["value"] ORS array["total"] ORS array["label"] ORS
    delete array
  }
  name=$0
  next
}
{
  array[$1]=$0
}
END{
  print name ORS array["value"] ORS array["total"] ORS array["label"]
}
'  Input_file

EDIT: Adding a refined version of the above solution, suggested by Kvantour sir.

awk -v OFS="\n" '
(!NF) && ("name" in a){
  print a["name"],a["value"],a["total"],a["label"] ORS
  delete a
  next
}
{
  a[$1]=$0
}
END{
  print a["name"],a["value"],a["total"],a["label"]
}
'  Input_file

Answer 3

The simplest way is the following:

awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"}
     { for(i=1;i<=NF;++i) { k=substr($i,1,index($i," ")-1); a[k]=$i } }
     { print a["name"],a["value"],a["total"],a["label"] }' file

How does this work?

Awk knows the concepts of records and fields. Files are split into records, where consecutive records are separated by the record separator RS. Each record is split into fields, where consecutive fields are separated by the field separator FS. By default, the record separator RS is the <newline> character (\n), and thus each record is a line. The record separator has the following definition:

RS: The first character of the string value of RS shall be the input record separator; a <newline> by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
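
The behaviour described in that definition can be checked with a small experiment. The sketch below uses a made-up input with leading blank lines, two blank lines between the blocks, and a trailing blank line, and prints the record number, the field count, and the first field of each record:

```shell
# Hypothetical input with leading blank lines, a double blank line
# between the blocks, and a trailing blank line.
input='

name cat
label 4
total 5


name dog
total 4
'

out=$(printf '%s\n' "$input" | awk '
BEGIN { RS = ""; FS = "\n" }
{ print NR ": " NF " fields, first = " $1 }')
printf '%s\n' "$out"
```

Only two records should be reported, one per block, with each line of a block counted as one field. FS is set to "\n" explicitly here, as in the answer, rather than relying on the rule that a <newline> is always a field separator, since not every awk implementation is guaranteed to honour that rule when FS is some other character.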

So with the file format you give, we can define the records based on RS="" and the field separator FS="\n".

In simplified form, each record looks like this:

key1 string1      << field $1
key2 string2      << field $2
key3 string3      << field $3
key4 string4      << field $4
...
keyNF stringNF    << field $NF

When awk reads a record, we first parse it by storing all key-value pairs in an array a. Afterwards, we print the values we are interested in. For this, we need to define the output field separator OFS and the output record separator ORS.
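
To see what the two output separators do in isolation, here is a small sketch that prints literal strings, first with awk's defaults and then with the values used in the answer. The variable names default_out and custom_out are only illustrative:

```shell
# Defaults: OFS is a single space and ORS is a single newline.
default_out=$(awk 'BEGIN { print "name cat", "value 4" }')

# Settings from the answer: each print argument goes on its own line,
# and every print statement is followed by an empty line.
custom_out=$(awk 'BEGIN {
    OFS = "\n"; ORS = "\n\n"
    print "name cat", "value 4"
    print "name dog", "value 6"
}')

printf '%s\n---\n%s\n' "$default_out" "$custom_out"
```

Note that with ORS="\n\n" the last record is also followed by an empty line. The command substitution hides this here because the shell strips trailing newlines, but it will be present when awk writes directly to a file.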

Answer 4

In Vim you could sort each section in place with the reverse-order sort!, which works here because the desired order (value, total, label) happens to be reverse alphabetical:

for i in range(1,line("$"))
  /^name/+1,/^name/+3sort!
endfor

The same command issued from the shell:

$ ex -s '+for i in range(1,line("$"))|/^name/+1,/^name/+3sort!|endfor' '+%p' '+q!' inputfile

4 answers

Answer 1
4 2020-01-17 16:49:04

Answer 2
3 accepted 2020-01-17 16:32:57

Answer 3
1 2020-01-17 16:52:28

Answer 4
1 2020-01-17 18:04:45
