简体   繁体   中英

How to sort a file based on key name instead of its position in unix?

I want to sort a file in Unix and for that I am using command

sort file --field-separator=' ' --key=7,7

But position of this field is not fixed, sometimes it can be 7th field or sometimes 6th or 8th field in the line.

Do we know if its possible to sort the file based on field name, something like

sort file --field-separator=' ' --keyname=<my_unique_id>

File looks something like this, I want to sort on the basis of party_id

status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"
status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"

sort doesn't have a concept of named keys, but you can perform a Schwartzian transform to temporarily add the key as a prefix to the line, sort on the first field, then discard it.

sed 's/\(.*\)\(party_id="[^"]*"\)/\2    \1\2/' file |
sort -t '   ' -k1,1 |
cut -f2-

(where the whitespace between the two first back references and in the sort -t argument is a literal tab, which however Stack Overflow renders as a sequence of spaces).

Using the Decorate/Sort/Undecorate idiom and assuming that, like in the example you provided, your quoted strings don't contain blanks, = , or " :

Decorate:

$ awk -F'[ ="]+' -v OFS='\t' -v keyname='party_id' '{for (i=1; i<NF; i+=2) if ($i == keyname) { print $(i+1), $0; next} }' file
36113477        status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"
36053415        status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"

Decorate then Sort:

$ awk -F'[ ="]+' -v OFS='\t' -v keyname='party_id' '{for (i=1; i<NF; i+=2) if ($i == keyname) { print $(i+1), $0; next} }' file |
    sort -k1,1n
36053415        status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"
36113477        status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"

Decorate then Sort then Undecorate:

$ awk -F'[ ="]+' -v OFS='\t' -v keyname='party_id' '{for (i=1; i<NF; i+=2) if ($i == keyname) { print $(i+1), $0; next} }' file |
    sort -k1,1n |
    cut -f2-
status_date="2002-12-31" ref_date="2021-03-31" ead_percent="1" accounting_standard="IFRS" orig_src_system_id="GRD" party_default_status_cd="UNLIKE" party_id="36053415" v_src_system_id="XYZ"
status_date="2000-01-31" ref_date="2021-03-31" ead_percent="0.00365316" accounting_standard="IFRS" party_default_status_cd="NOTDFLT" party_id="36113477" v_src_system_id="ABC"

With GNU awk ( gawk ) one may specify how arrays are traversed. The following saves each line in an array using party_id=XYZ as respective index and then returns the array sorted by said indices. Limited by RAM for very large files.

awk '{match($0,/party_id=[^ ]*/,$0,id) ; arr[id[0]]=$0}
     END {PROCINFO["sorted_in"]="@ind_str_asc"
          for (i in arr) {print arr[i]}
     }' infile.txt 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM