
How do I sort input with a variable number of fields by the second-to-last field?

Editor's note: The original title of the question mentioned tabs as the field separators.

In a text such as

500 east 23rd avenue Toronto 2 890 400000 1 
900 west yellovillage blvd Mississauga 3 800 600090 3

how would you sort in ascending order of the second to last column?

Editor's note: The OP later provided another sample input line, 500 Jackson Blvd Toronto 3 700 40000 2, which contains only 8 whitespace-separated input fields (compared to the 9 above), revealing the need to deal with a variable number of fields in the input.

Note: There are several, potentially separate questions:

  • Question A: As implied by the question's title only: how can you use the tab character (\t) as the field separator?

  • Question B: How can you sort input by the second-to-last field, without knowing that field's specific index up front, given a fixed number of fields?

  • Question C: How can you sort input by the second-to-last field, without knowing that field's respective index up front, given a variable number of fields?

Update: Question C was the relevant one.


Answer to question A:

sort's -t option allows you to specify a field separator. By default, sort uses any run of line-interior whitespace as the separator.

Assuming Bash, Ksh, or Zsh, you can use an ANSI C-quoted string ($'...') to specify a single tab as the field separator ($'\t'):

sort -t $'\t' -n -k8,8 file # -n sorts numerically; omit for lexical sorting
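For a shell that lacks ANSI C-quoted strings (a plain POSIX sh, for instance), one possible workaround is to generate the tab with printf instead. The following is only a sketch: the sample data is made up, and the variable names tab, sample, and sorted_by_tab are illustrative, not part of the original answer.

```shell
# Store a literal tab character without relying on $'...'.
# (Command substitution only strips trailing newlines, so the tab survives.)
tab=$(printf '\t')

# Made-up sample data: three tab-separated fields per line.
# The first field may contain spaces, which a tab separator leaves intact.
sample=$(printf '%s\t%s\t%s\n' \
  'New York' 'b' '10' \
  'Toronto' 'a' '9' \
  'Los Angeles' 'c' '100')

# Sort numerically by the third tab-separated field only.
sorted_by_tab=$(printf '%s\n' "$sample" | sort -t "$tab" -n -k3,3)
printf '%s\n' "$sorted_by_tab"
```

With this input, the lines come out in the order Toronto (9), New York (10), Los Angeles (100); a lexical sort (without -n) would instead place 10 and 100 before 9.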

Answer to question B:

Note: This assumes that all input lines have the same number of fields, and that input comes from a file named file:

 # Determine the index of the next-to-last column, based on the first
 # line, using Awk:
 nextToLastColNdx=$(head -n 1 file | awk -F '\t' '{ print NF - 1 }')

 # Sort numerically by the next-to-last column (omit -n to sort lexically):
 sort -t $'\t' -n -k$nextToLastColNdx,$nextToLastColNdx file

Note: To sort by a single field, always specify it as the end field too (e.g., -k8,8), as above, because sort, given only a start field index (e.g., -k8), sorts from the specified field through the end of the line.
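The difference between the two key forms can be seen in a small experiment. This is a minimal sketch with made-up two-line input; the variable names are illustrative, and LC_ALL=C is set only to make the byte-wise comparison predictable.

```shell
# Made-up input: three space-separated fields; field 2 is identical on both lines.
input=$(printf '%s\n' 'a 1 z' 'b 1 y')

# Key = field 2 only. Both keys are "1", so sort falls back to comparing
# the whole lines, and "a 1 z" sorts first.
single_field=$(printf '%s\n' "$input" | LC_ALL=C sort -t ' ' -k2,2)

# Key = field 2 through the end of the line ("1 z" vs. "1 y"),
# so "b 1 y" sorts first.
to_end_of_line=$(printf '%s\n' "$input" | LC_ALL=C sort -t ' ' -k2)

printf '%s\n' '-k2,2:' "$single_field" '-k2:' "$to_end_of_line"
```

Note that with -n the two forms often produce the same result, because a numeric comparison only looks at the number at the start of the key; the difference shows up most clearly in lexical sorts like this one.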


Answer to question C:

Note: This assumes that input lines may have a variable number of fields, and that on each line it is that line's second-to-last field that should act as the sort field; input comes from a file named file:

awk '{ printf "%s\t%s\n", $(NF-1), $0 }' file |
  sort -n -k1,1 | # omit -n to perform lexical sorting
    cut -f2-
  • The awk command extracts each line's second-to-last field and prepends it to the input line on output, separated by a tab.
  • The result is sorted by the first field (i.e., each input line's second-to-last field).
  • Finally, the artificially prepended sort field is removed again, using cut.
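As a worked example, the three steps above can be applied to the three sample lines from the question. This is only a sketch: the function name sort_by_second_to_last is hypothetical and reads from standard input rather than from a file, and the trailing space in the first sample line has been dropped.

```shell
# Hypothetical helper (the name is not from the original answer): reads lines
# on stdin and writes them sorted numerically by each line's second-to-last
# whitespace-separated field.
sort_by_second_to_last() {
  awk '{ printf "%s\t%s\n", $(NF-1), $0 }' |  # prepend the sort key and a tab
    sort -n -k1,1 |                            # sort by the prepended key
    cut -f2-                                   # remove the prepended key again
}

# The sample lines from the question: two with 9 fields, one with 8.
input=$(printf '%s\n' \
  '500 east 23rd avenue Toronto 2 890 400000 1' \
  '900 west yellovillage blvd Mississauga 3 800 600090 3' \
  '500 Jackson Blvd Toronto 3 700 40000 2')

result=$(printf '%s\n' "$input" | sort_by_second_to_last)
printf '%s\n' "$result"
```

The lines come out ordered by 40000, 400000, 600090, so the Jackson Blvd line is first even though it has fewer fields. The sketch assumes that every line has at least two fields and that the input itself contains no tabs; empty lines in particular would need to be filtered out first.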

I suggest looking at "man sort".

You will see how to specify a field separator and how to specify the field index that should be used as a key for sorting.

You can use sort -k 2

For example:

echo -e '000 west \n500 east\n500 east\n900 west' | sort -k 2

The result is:

500 east
500 east
900 west
000 west

You can find more information in the man page of sort. Take a look at the end of the man page: just before the author section there is some interesting information :)

Bye

