How do I sort input with a variable number of fields by the second-to-last field?

Question

Editor's note: The original title of the question mentioned tabs as the field separators.

In a text such as

500 east 23rd avenue Toronto 2 890 400000 1 
900 west yellovillage blvd Mississauga 3 800 600090 3

how would you sort in ascending order by the second-to-last column?

Editor's note: The OP later provided another sample input line, 500 Jackson Blvd Toronto 3 700 40000 2, which contains only 8 whitespace-separated input fields (compared to the 9 above), revealing the need to deal with a variable number of fields in the input.

Answer 1

Note: There are several potentially separate questions here:

Question A: As implied only by the question's title: how can you use the tab character (\t) as the field separator?
Question B: How can you sort input by the second-to-last field, without knowing that field's specific index up front, given a fixed number of fields?
Question C: How can you sort input by the second-to-last field, without knowing that field's index up front, given a variable number of fields?

Update: Question C was the relevant one.

Answer to question A:

sort's -t option allows you to specify a field separator. By default, sort uses any run of line-interior whitespace as the separator.

Assuming Bash, Ksh, or Zsh, you can use an ANSI C-quoted string ($'...') to specify a single tab as the field separator ($'\t'):

sort -t $'\t' -n -k8,8 file # -n sorts numerically; omit for lexical sorting
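To illustrate why the explicit tab separator matters, here is a minimal sketch using a made-up two-line sample (not the OP's data) in which the first field contains a space. It assumes a sort that behaves like GNU sort and forces the C locale so that the ordering is predictable:

```shell
# Hypothetical tab-separated sample: the first field contains a space.
tab=$(printf '\t')
sample=$(printf 'New York\t20\nBoston\t10\n')

# With a tab as the separator, field 2 is the number, so the Boston line (10) comes first.
printf '%s\n' "$sample" | LC_ALL=C sort -t "$tab" -n -k2,2

# With the default separator, field 2 of the first line is " York",
# which -n evaluates as 0, so the New York line comes first instead.
printf '%s\n' "$sample" | LC_ALL=C sort -n -k2,2
```

The portable $(printf '\t') form is used here instead of $'\t' so that the sketch also runs in plain POSIX shells.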

Answer to question B:

Note: This assumes that all input lines have the same number of fields, and that input comes from a file named file:

# Determine the index of the next-to-last column, based on the first
# line, using Awk:
nextToLastColNdx=$(head -n 1 file | awk -F '\t' '{ print NF - 1 }')

# Sort numerically by the next-to-last column (omit -n to sort lexically):
sort -t $'\t' -n -k"$nextToLastColNdx","$nextToLastColNdx" file

Note: To sort by a single field, always specify it as the end field too (e.g., -k8,8), as above, because sort, given only a start field index (e.g., -k8), sorts from the specified field through the remainder of the line.
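The difference between the two key forms can be seen with a small, made-up sample in which both lines have the same value in field 2. This is only a sketch; it forces the C locale so that the comparison is byte-wise and predictable:

```shell
# Hypothetical tab-separated sample; both lines have "1" in field 2.
tab=$(printf '\t')
sample=$(printf 'a\t1\tz\nb\t1\ty\n')

# -k2,2: the key is field 2 only. The keys tie, so sort falls back to
# comparing the whole lines, and the "a" line comes first.
printf '%s\n' "$sample" | LC_ALL=C sort -t "$tab" -k2,2

# -k2: the key runs from field 2 to the end of the line, so field 3
# takes part in the comparison ("y" < "z") and the "b" line comes first.
printf '%s\n' "$sample" | LC_ALL=C sort -t "$tab" -k2
```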

Answer to question C:

Note: This assumes that input lines may have a variable number of fields, and that on each line it is that line's second-to-last field that should act as the sort field; input comes from a file named file:

awk '{ printf "%s\t%s\n", $(NF-1), $0 }' file |
  sort -n -k1,1 | # omit -n to perform lexical sorting
    cut -f2-

The awk command extracts each line's second-to-last field and prepends it to the input line on output, separated by a tab.
The result is sorted by the first field (i.e., each input line's second-to-last field).
Finally, the artificially prepended sort field is removed again, using cut.
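As a worked example, the three steps above can be applied to the sample lines from the question, including the 8-field line the OP added later. The function name sort_by_second_to_last is a hypothetical wrapper introduced only for this sketch, and the explicit -t tab is an added precaution, not part of the pipeline above:

```shell
# Hypothetical helper: sort stdin numerically by each line's second-to-last field.
sort_by_second_to_last() {
  awk '{ printf "%s\t%s\n", $(NF-1), $0 }' |  # prepend the key, followed by a tab
    sort -t "$(printf '\t')" -n -k1,1 |       # sort on the prepended key only
    cut -f2-                                  # drop the prepended key again
}

printf '%s\n' \
  '500 east 23rd avenue Toronto 2 890 400000 1' \
  '900 west yellovillage blvd Mississauga 3 800 600090 3' \
  '500 Jackson Blvd Toronto 3 700 40000 2' |
  sort_by_second_to_last
# Output:
# 500 Jackson Blvd Toronto 3 700 40000 2
# 500 east 23rd avenue Toronto 2 890 400000 1
# 900 west yellovillage blvd Mississauga 3 800 600090 3
```

Note that this sketch assumes every input line has at least two fields and contains no tabs of its own.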

Answer 2

I suggest looking at "man sort".

You will see how to specify a field separator and how to specify the field index that should be used as a key for sorting.

Answer 3

You can use sort -k 2.

For example:

echo -e '000 west \n500 east\n500 east\n900 west' | sort -k 2

The result is:

500 east
500 east
900 west
000 west

You can find more information in the man page of sort. Take a look at the end of the man page: just before the author section there is some interesting information :)

Bye

3 answers

solution1
3 ACCEPTED 2015-12-07 00:28:39

solution2
0 2015-12-06 23:46:23

solution3
0 2015-12-07 00:04:40