
How to sort on Multiple Fields with different field separator

I want to sort a file on multiple fields that use different field separators. Please help. Here is my sample data file:

$ cat Data3
My Text|50002/100/43
My Message|50001/100/7
Help Text|50001/100/7
Help Message|50002/100/11
Text Message|50001/100/63
Visible Text|50001/100/52
Invisible Text|50002/100/1

The first field separator is a pipe symbol ( | ) and the second field separator is / . I want to sort this data on the second field first, and within each value the lines should be in sorted order of the last field (separated by / ). My sorted data should look like this:

Help Text|50001/100/7
My Message|50001/100/7
Visible Text|50001/100/52
Text Message|50001/100/63
Invisible Text|50002/100/1
Help Message|50002/100/11
My Text|50002/100/43

By using sort -k2,2n -t'|' , I am able to sort on field 2 ( 50001/50002 ), but then within that value how can I sort on the last field (separated by / )?

The simplest trick for this data set is to treat the second column as a version number.

$ cat Data3 | sort -k2,2V -t'|'
Help Text|50001/100/7
My Message|50001/100/7
Visible Text|50001/100/52
Text Message|50001/100/63
Invisible Text|50002/100/1
Help Message|50002/100/11
My Text|50002/100/43

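However, that doesn't always work, depending on your input. It works here only because the middle component of the second column (100) is the same on every line. Version sort compares the components from left to right, so a differing middle component would take precedence over the last one.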
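To make that caveat concrete, here is a minimal sketch using a hypothetical two-line sample (not from the question) in which the middle component differs. It assumes GNU sort, since -V is a GNU extension.

```shell
# Hypothetical sample: same leading number, different middle component.
sample='A|50001/200/7
B|50001/100/52'

# Version sort walks the second column left to right: 50001, then the
# middle component, then the last one. 100 < 200 decides the order here,
# even though the last component of line A (7) is smaller than that of B (52).
version_sorted=$(printf '%s\n' "$sample" | sort -t'|' -k2,2V)
printf '%s\n' "$version_sorted"
```

Line B comes out first, which is not what "sort by 50001, then by the last field" would give.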

You could do what fedorqui suggested: run sort twice, and make the second run a stable sort. From the manpage: -s, --stable (stabilize sort by disabling last-resort comparison)

First sort on the secondary sort criterion. Then do a stable sort on the primary criterion, which keeps the existing order among rows that share the same primary key.

$ cat Data3 | sort -k3,3n -t'/' | sort -k2,2n -t'|' -s
Help Text|50001/100/7
My Message|50001/100/7
Visible Text|50001/100/52
Text Message|50001/100/63
Invisible Text|50002/100/1
Help Message|50002/100/11
My Text|50002/100/43
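The effect of -s can be seen by running the second pass with and without it. This is a small sketch, assuming the input has already been ordered by the last field; the variable names are only illustrative, and LC_ALL=C is set just to make the byte-wise fallback comparison predictable.

```shell
# Question data, already ordered by the last '/'-separated field.
presorted='Invisible Text|50002/100/1
Help Text|50001/100/7
My Message|50001/100/7
Help Message|50002/100/11
My Text|50002/100/43
Visible Text|50001/100/52
Text Message|50001/100/63'

# With -s, rows with the same key (50001 or 50002) keep their input order.
with_stable=$(printf '%s\n' "$presorted" | LC_ALL=C sort -t'|' -k2,2n -s)

# Without -s, rows with the same key are re-ordered by comparing whole
# lines, so "Text Message" moves ahead of "Visible Text".
without_stable=$(printf '%s\n' "$presorted" | LC_ALL=C sort -t'|' -k2,2n)

printf 'stable:\n%s\n\nnot stable:\n%s\n' "$with_stable" "$without_stable"
```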

You are a bit lucky in this case, since -k2,2n -t'|' treats the second column "50001/100/7" as a number, and numeric parsing stops at the first slash, giving 50001. You could end up in weird situations if the values were comma-separated instead of slash-separated and your environment used a different locale. For instance, my environment defaults to en_US.UTF-8, where the comma is a thousands separator that sort skips while reading the number, so it behaves like this:

$ cat Data3 | tr '/' ',' | sort -k3,3n -t',' | LC_NUMERIC=en_US.UTF-8 sort -k2,2n -t'|' -s
Help Text|50001,100,7
My Message|50001,100,7
Invisible Text|50002,100,1
Visible Text|50001,100,52
Text Message|50001,100,63
Help Message|50002,100,11
My Text|50002,100,43

What you would expect is this:

$ cat Data3 | tr '/' ',' | sort -k3,3n -t',' | LC_NUMERIC=C sort -k2,2n -t'|' -s
Help Text|50001,100,7
My Message|50001,100,7
Visible Text|50001,100,52
Text Message|50001,100,63
Invisible Text|50002,100,1
Help Message|50002,100,11
My Text|50002,100,43
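A minimal sketch of the locale-independent case follows. In the C locale the comma is not a digit-group separator, so numeric parsing stops at it. One assumption worth stating: LC_ALL, when set, overrides LC_NUMERIC, so the sketch pins LC_ALL for the one command rather than relying on LC_NUMERIC alone.

```shell
# Two comma-separated keys; only the leading number should count.
keys='50002,100,1
50001,100,52'

# LC_ALL takes precedence over LC_NUMERIC, so this pins the numeric
# rules for this single sort regardless of the surrounding environment.
c_sorted=$(printf '%s\n' "$keys" | LC_ALL=C sort -n)
printf '%s\n' "$c_sorted"
```

Here 50001,100,52 sorts before 50002,100,1, because only 50001 and 50002 are compared.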

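The following code works for me as long as there are no additional '|' or '/' characters in the text. After tr, the fields are text/50001/100/43, so the sort keys are field 2 and field 4, each limited to a single field.

tr '|' '/' < Data3 | sort -n -t '/' -k2,2 -k4,4 | sed -re 's/^([^/]*)\/(.*)$/\1|\2/'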
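One way to check a translate, sort, restore pipeline like this is to run it on the sample data from the question. This is only a sketch: the data is inlined in a variable to keep it self-contained, the keys are written as -k2,2 and -k4,4 so that each one covers exactly one field, and a simpler sed expression restores the first slash.

```shell
# Sample data from the question.
data='My Text|50002/100/43
My Message|50001/100/7
Help Text|50001/100/7
Help Message|50002/100/11
Text Message|50001/100/63
Visible Text|50001/100/52
Invisible Text|50002/100/1'

# 1. tr turns '|' into '/', so the fields become: text / 50001 / 100 / 43.
# 2. sort keys on field 2 only, then on field 4 only, both numerically.
# 3. sed turns the first '/' on each line back into '|'.
sorted=$(printf '%s\n' "$data" |
  tr '|' '/' |
  LC_ALL=C sort -n -t '/' -k2,2 -k4,4 |
  sed 's/\//|/')
printf '%s\n' "$sorted"
```

A key written as -k3 with no end field runs to the end of the line, which is why the start and end field are both given.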

A little trick with awk:

$ cat Data3  | awk -F'[|/]' '{print $2"\t"$4"\t"$0}' | sort -k1 -k2 -n | cut -f3-
Help Text|50001/100/7
My Message|50001/100/7
Visible Text|50001/100/52
Text Message|50001/100/63
Invisible Text|50002/100/1
Help Message|50002/100/11
My Text|50002/100/43
  • you can use awk with all separators specified, -F'[|/]' , to print your sorting keys first, $2"\t"$4 , and then the input line $0
  • then do one sort with multiple keys -k1 -k2 (note: not the same as -k1,2 )
  • then cut back to the input line
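The three steps above can also be written with an explicit tab separator and bounded keys, which avoids relying on sort's default whitespace splitting when the text itself contains spaces. This is a sketch under the same assumption as before (no extra '|' or '/' in the text, and no tabs); the variable names are illustrative.

```shell
# Sample data from the question.
data='My Text|50002/100/43
My Message|50001/100/7
Help Text|50001/100/7
Help Message|50002/100/11
Text Message|50001/100/63
Visible Text|50001/100/52
Invisible Text|50002/100/1'

tab=$(printf '\t')

# Decorate: prefix each line with its two keys, tab-separated.
# Sort: numeric on key 1 only, then numeric on key 2 only.
# Undecorate: drop the two key columns again.
sorted=$(printf '%s\n' "$data" |
  awk -F'[|/]' -v OFS='\t' '{ print $2, $4, $0 }' |
  LC_ALL=C sort -t "$tab" -k1,1n -k2,2n |
  cut -f3-)
printf '%s\n' "$sorted"
```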

This approach is universal for many scenarios.

You could use this (inefficient, but simple) script:

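#!/usr/bin/perl
use strict;
use warnings;

# Sort by field 2, then field 4 (both numeric), then by the text in field 1.
print sort {
    my @ka = split m{[|/]}, $a;
    my @kb = split m{[|/]}, $b;
       $ka[1] <=> $kb[1]
    || $ka[3] <=> $kb[3]
    || $ka[0] cmp $kb[0]
} <>;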

You might omit the line || $ka[0] cmp $kb[0] if you don't care whether lines with equal key values are also sorted by their text.

