How can I reorder lines in a text file based on a pattern?

I have a text file that contains batches of 4 lines. The first line of each batch is in the correct position; however, the next 3 lines are not always in the correct order.

name cat
label 4
total 5
value 4

name dog
total 4
label 3
value 6

name cow
value 6
total 1
label 4

name fish
total 3
label 5
value 6

I would like each 4-line batch to be in the following format:

name cat
value 4
total 5
label 4

So the output would be:

name cat
value 4
total 5
label 4

name dog
value 6
total 4
label 3

name cow
value 6
total 1
label 4

name fish
value 6
total 3
label 5

The file contains thousands of lines in total, so I would like to build a command that can handle every possible order of the 3 lines and rearrange them when they are not in the correct format.

I am aware I can use awk to find lines that begin with a particular string and then rearrange them:

awk '$1 == "value" { print $3, $4, $1, $2; next; } 1' 

However, I cannot figure out how to achieve something similar that works across multiple lines.

How can I achieve this?

By setting RS to the empty string, each block of text separated by at least one empty line is considered a single record. From there it's easy to capture each key-value pair and output the pairs in the desired order. Save the following program as a.awk:

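BEGIN {RS=""}    # paragraph mode: each blank-line-separated batch is one record
{
    # fields alternate key, value, key, value...; store each value under its key
    for (i=1; i<=NF; i+=2) a[$i] = $(i+1)
    # print the pairs in the wanted order; the trailing ORS plus print's own ORS
    # leave a blank line after each batch
    print "name", a["name"] ORS \
          "value", a["value"] ORS \
          "total", a["total"] ORS \
          "label", a["label"] ORS
}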


$ awk -f a.awk file
name cat
value 4
total 5
label 4

name dog
value 6
total 4
label 3

name cow
value 6
total 1
label 4

name fish
value 6
total 3
label 5
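If the set of keys or their order might change, one option is to pass the order in as a variable instead of hard-coding it. The following is a minimal sketch, not part of the original answer: reorder_batches and its order variable are hypothetical names, and it assumes, like a.awk, that every value is a single word. Keys missing from a batch are simply skipped.

```shell
# Sketch: reorder each blank-line-separated batch using a key order given as $1.
# reorder_batches and the awk variable "order" are hypothetical names.
reorder_batches() {
  awk -v order="$1" '
    BEGIN {
      RS = ""                      # paragraph mode: one record per batch
      n = split(order, keys, " ")  # keys[1..n] = desired output order
    }
    {
      split("", a)                 # clear values left over from the previous batch
      for (i = 1; i < NF; i += 2)  # fields alternate: key, value, key, value...
        a[$i] = $(i + 1)
      if (NR > 1) printf "\n"      # blank line between batches, none at the end
      for (j = 1; j <= n; j++)
        if (keys[j] in a)          # skip keys missing from this batch
          print keys[j], a[keys[j]]
    }'
}

printf 'name cat\nlabel 4\ntotal 5\nvalue 4\n\nname dog\ntotal 4\nlabel 3\nvalue 6\n' |
  reorder_batches "name value total label"
```

Unlike a.awk, this version clears the array for each batch, so a key missing from one batch does not inherit the value from the previous batch.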

Could you please try the following?

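awk '
/^name/{                  # a new batch starts
  if(name){               # print the previous batch, if any, in the wanted order
    print name ORS array["value"] ORS array["total"] ORS array["label"] ORS
    delete array
  }
  name=$0
  next
}
{
  array[$1]=$0            # remember the other lines by their first word
}
END{                      # print the last batch
  print name ORS array["value"] ORS array["total"] ORS array["label"]
}
'  Input_file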
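Because this approach starts a new batch on each line matching ^name rather than on blank lines, it should also cope with input whose batches are not separated by blank lines at all. Below is a sketch of that idea, assuming every batch begins with a name line; reorder_by_name is a hypothetical helper name, and the function-based layout is a restructuring, not the answerer's code. It also skips keys that are missing from a batch instead of printing an empty line.

```shell
# Sketch of the flush-on-"name" idea: a batch ends when the next "name" line
# (or end of input) arrives, so blank lines between batches are optional.
# reorder_by_name is a hypothetical helper name.
reorder_by_name() {
  awk '
    function flush() {
      if (name == "") { split("", arr); return }   # drop lines seen before the first name
      if (printed++) print ""      # blank line between batches
      print name
      if ("value" in arr) print arr["value"]
      if ("total" in arr) print arr["total"]
      if ("label" in arr) print arr["label"]
      split("", arr)               # forget this batch
      name = ""
    }
    /^name/ { flush(); name = $0; next }
    NF      { arr[$1] = $0 }       # blank lines are ignored
    END     { flush() }'
}

# Input with no blank lines between batches:
printf 'name cat\nlabel 4\ntotal 5\nvalue 4\nname dog\ntotal 4\nlabel 3\nvalue 6\n' |
  reorder_by_name
```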


EDIT: Adding a refined version of the above solution, suggested by Kvantour.

awk -v OFS="\n" '
(!NF) && ("name" in a){
  print a["name"],a["value"],a["total"],a["label"] ORS
  delete a
  next
}
{
  a[$1]=$0
}
END{
  if ("name" in a)        # skip if the input already ended with a blank line
    print a["name"],a["value"],a["total"],a["label"]
}
'  Input_file

The simplest way is the following:

awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"}
     { for(i=1;i<=NF;++i) { k=substr($i,1,index($i," ")-1); a[k]=$i } }
     { print a["name"],a["value"],a["total"],a["label"] }' file

How does this work?

Awk knows the concept of records and fields. Input files are split into records, where consecutive records are separated by the record separator RS. Each record is split into fields, where consecutive fields are separated by the field separator FS. By default, the record separator RS is the <newline> character (\n), so each record is a line. The POSIX specification defines the record separator as follows:

RS: The first character of the string value of RS shall be the input record separator; a <newline> by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
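To see these paragraph-mode rules in action, the sketch below counts records with RS set to the empty string. count_records is a hypothetical helper name; the input deliberately contains leading, repeated and trailing blank lines, which according to the quoted text should not produce empty records.

```shell
# Count the records awk sees in paragraph mode (RS = "").
# count_records is a hypothetical helper name.
count_records() {
  awk 'BEGIN { RS = "" } END { print NR }'
}

# Two leading blank lines, three blank lines between the batches, one trailing:
printf '\n\nname cat\nlabel 4\n\n\n\nname dog\ntotal 4\n\n' | count_records
# prints 2
```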

So with the file format you give, we can define the records with RS="" and the fields with FS="\n", so that every line of a batch becomes one field.

Simplified, each record looks like this:

key1 string1      << field $1
key2 string2      << field $2
key3 string3      << field $3
key4 string4      << field $4
...
keyNF stringNF    << field $NF
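The sketch below makes that field structure visible: with RS="" and FS="\n", it prints the record number, the field number and the key extracted with the same substr/index expression as the one-liner. list_keys is a hypothetical helper name, and it assumes each line has a space between key and value.

```shell
# Show how one batch is split into line-fields and how the key is extracted.
# list_keys is a hypothetical helper name.
list_keys() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       {
         for (i = 1; i <= NF; i++) {
           k = substr($i, 1, index($i, " ") - 1)   # "total 5" -> "total"
           print NR, i, k
         }
       }'
}

printf 'name cat\nlabel 4\ntotal 5\nvalue 4\n' | list_keys
# 1 1 name
# 1 2 label
# 1 3 total
# 1 4 value
```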

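When awk reads a record, we first parse it by storing all key-value pairs in an array a, using the first word of each line as the key. Afterwards, we print the values we are interested in. For this, we set the output field separator OFS to "\n", so that each printed value lands on its own line, and the output record separator ORS to "\n\n", so that a blank line follows each batch.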
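A minimal illustration of the two output separators, independent of any input file: OFS is placed between the comma-separated arguments of print, and ORS is written once after each print. show_separators is a hypothetical helper name.

```shell
# OFS joins the arguments of one print; ORS terminates each print.
# show_separators is a hypothetical helper name.
show_separators() {
  awk 'BEGIN {
         OFS = "\n"; ORS = "\n\n"
         print "name cat", "value 4", "total 5", "label 4"
         print "name dog", "value 6", "total 4", "label 3"
       }'
}

show_separators
```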

In Vim you could sort each section in reverse order with sort! (this works because value, total, label happens to be reverse alphabetical order):

for i in range(1,line("$"))
  /^name/+1,/^name/+3sort!
endfor

Same command issued from the shell:

$ ex -s '+for i in range(1,line("$"))|/^name/+1,/^name/+3sort!|endfor' '+%p' '+q!' inputfile
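The Vim approach relies on the desired order value, total, label being exactly reverse alphabetical order. A quick way to confirm that for one batch from the shell is sort -r, used here only as a stand-in for Vim's sort!:

```shell
# Reverse-sorting the three non-name lines of a batch yields value, total, label.
printf 'label 4\nvalue 4\ntotal 5\n' | sort -r
# value 4
# total 5
# label 4
```

If a key that breaks this alphabetical pattern were ever added, the sort-based trick would give the wrong order, while the awk solutions with an explicit key order would not be affected.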
