sort unix file by id

I want to sort a Unix file by its id column, but neither sort -k4,4 nor sort -k4,4n gives the expected result.

The column of interest should be sorted like this:

id1
id2
id3
id4
etc.

Instead, sort -k4,4 orders it like this:

id1
id10
id100
id1000
id10000
id10001
etc.

My Unix system's sort reports the following options:

sort --help
Usage: sort [OPTION]... [FILE]...
Write sorted concatenation of all FILE(s) to standard output.

Mandatory arguments to long options are mandatory for short options too.
Ordering options:

  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < `JAN' < ... < `DEC'
  -n, --numeric-sort          compare according to string numerical value
  -r, --reverse               reverse the result of comparisons

Other options:

  -c, --check               check whether input is sorted; do not sort
  -k, --key=POS1[,POS2]     start a key at POS1, end it at POS2 (origin 1)
  -m, --merge               merge already sorted files; do not sort
  -o, --output=FILE         write result to FILE instead of standard output
  -s, --stable              stabilize sort by disabling last-resort comparison
  -S, --buffer-size=SIZE    use SIZE for main memory buffer
  -t, --field-separator=SEP  use SEP instead of non-blank to blank transition
  -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or /tmp;
                              multiple options specify multiple directories
  -u, --unique              with -c, check for strict ordering;
                              without -c, output only the first of an equal run
  -z, --zero-terminated     end lines with 0 byte, not newline
      --help     display this help and exit
      --version  output version information and exit

Use the -V (--version-sort) option, which orders numbers embedded in text by their numeric value:

sort -V -k4,4 file.txt

Example:

$ cat file.txt
id5
id3
id100
id1
id10

Output:

$ sort -V file.txt
id1
id3
id5
id10
id100
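
The example above sorts a single-column file. A minimal sketch of the same option applied to the fourth column, as in the question, assuming a sort that supports -V (GNU coreutils does); the four-column sample data is made up for illustration:

```shell
# Hypothetical sample: only the layout (id in column 4) comes from the question.
sample='a b c id10
a b c id2
a b c id100
a b c id1'

# Version-sort on field 4 only.
sorted_by_id=$(printf '%s\n' "$sample" | sort -V -k4,4)
printf '%s\n' "$sorted_by_id"
```

With this input the lines should come out in the order id1, id2, id10, id100.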

EDIT:

If your implementation of sort lacks the -V option, a workaround is to strip the id prefix with sed so that a numeric sort (-n) can be applied, then put the prefix back with a second sed:

sed -E 's/id([0-9]+)/\1/' file.txt | sort -n -k4,4 | sed -E 's/( *)([0-9]+)( *|$)/\1id\2\3/'

Note: this solution depends on the data. The second sed puts id back in front of the first run of digits on each line, so it only works if no digits appear before the ID column.
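
Whether -V is available can be probed before falling back to a workaround. A small sketch, relying only on sort exiting with a non-zero status when given an option it does not recognise; the function name has_version_sort is hypothetical:

```shell
# Returns 0 if the local sort accepts -V, non-zero otherwise.
has_version_sort() {
  printf '1\n' | sort -V >/dev/null 2>&1
}

if has_version_sort; then
  echo "sort -V is available"
else
  echo "sort -V is not available; use a workaround"
fi
```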

As sudo_o has already mentioned, the easiest approach is --version-sort, which sorts numbers that occur within text in natural order.

If your version of sort does not have that option, a hacky approach is to temporarily remove the "id" prefix before sorting, then put it back afterwards. Here's one way, using awk:

awk '{ sub(/^id/, "", $4); print }' file.txt | sort -k4,4n | awk '{ $4 = "id" $4; print }'
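
A variant of the same idea that never rewrites the original line is to prepend the numeric part of the id as a temporary key, sort on that key, and cut it off again (sometimes called decorate-sort-undecorate). This is a sketch rather than part of the answers above; it assumes whitespace-separated fields with id<number> in field 4, and the function name sort_by_id_column is hypothetical:

```shell
# $1: file to sort. Lines whose field 4 has no number sort as 0, i.e. first.
sort_by_id_column() {
  # Decorate: emit "<number><TAB><original line>".
  awk '{ key = $4; sub(/^id/, "", key); print key "\t" $0 }' "$1" |
    # Sort numerically on the temporary key only.
    sort -t "$(printf '\t')" -k1,1n |
    # Undecorate: drop the key, keep the original line unchanged.
    cut -f2-
}
```

Called as sort_by_id_column file.txt, it prints the original lines with their spacing intact, because only the temporary key column is added and removed.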

If your sort supports it, you can also use the F.C key syntax (field F, character C) to sort on specific characters within a field.

This sorts on field 4, characters 3 through 10, by numeric value:

sort -bn -k 4.3,4.10 file

And this sorts on field 4, from character 3 to the end of the field, by numeric value:

sort -bn -k 4.3,4 file
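
A short sketch of the second form on made-up four-column data. The -b flag matters here: without it the blank that separates the fields counts as part of field 4, so character 3 would no longer be the first digit after "id".

```shell
# Hypothetical sample with the id in column 4.
sample='a b c id10
a b c id2
a b c id1'

# Key starts at character 3 of field 4 (just past "id") and is compared numerically.
sorted_by_chars=$(printf '%s\n' "$sample" | sort -bn -k 4.3,4)
printf '%s\n' "$sorted_by_chars"
```

For this input the expected order is id1, id2, id10.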
