Sorting multiple columns with delimiter "." in bash

I'm trying to sort a list of text that looks like this:

2023.12.14
2020.10.4
2020.10.1
2020.5.18
2023.14.1
2021.1.1

Desired output:

2020.5.18
2020.10.1
2020.10.4
2021.1.1
2023.12.14
2023.14.1

I tried to achieve it with the following command:

sort -t "." -k1,1 -k2,1 -k3,1 sortingTest.txt

With this command I'm trying to sort by the first "column" (anything before the first "." delimiter) and, when two values are equal, to compare the values of the second column, and so on.

For some reason it only sorts by comparing values of the first column.

What am I missing?

I think sort -V (version sort) orders your data the way you want:

$ echo '2023.12.14
> 2020.10.4
> 2020.10.1
> 2020.5.18
> 2023.14.1
> 2021.1.1' | sort -V

2020.5.18
2020.10.1
2020.10.4
2021.1.1
2023.12.14
2023.14.1

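You have a few problems. The key option is defined as -kfield1[,field2] , where field2 is the field the key ends on, so -k2,1 asks for a key that starts at field 2 and ends at field 1; such a key is empty and contributes nothing to the comparison. Use -k1,1 -k2,2 -k3,3 to sort on the first three fields one at a time (a bare -k1 would make the key run from field 1 to the end of the line).

You probably want a numeric sort on your fields. This can be enabled per key with the n modifier; see man 1 sort for other numeric sorting options:

$ sort -t. -k1,1n -k2,2n -k3,3n file.txt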
2020.5.18
2020.10.1
2020.10.4
2021.1.1
2023.12.14
2023.14.1

Might work for you.

In case these are actually versions and not dates, then -V might be sufficient.
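
To illustrate why the numeric modifier matters, here is a small sketch that runs the question's sample data through per-field keys twice, once compared as text and once as numbers. The variable names are only for illustration, and LC_ALL=C is set as an assumption to keep the text comparison byte-based.

```shell
#!/bin/sh
# Sample data from the question.
data='2023.12.14
2020.10.4
2020.10.1
2020.5.18
2023.14.1
2021.1.1'

# Text comparison per field: in field 2, "10" sorts before "5".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -t. -k1,1 -k2,2 -k3,3)

# Numeric comparison per field: the trailing n compares each field as a number.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -t. -k1,1n -k2,2n -k3,3n)

printf 'lexical:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
```

With text keys 2020.10.1 comes out first; with numeric keys 2020.5.18 does, which matches the desired output.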

As far as I know, -V is the simplest way to do this with sort alone. But -V is not POSIX and is not available in some sort implementations, so here is a portable solution that pads the fields instead:

awk -F . '{printf "%04d.%02d.%02d\n", $1,$2,$3}' dates-file |
sort |
awk -F . '{printf "%d.%d.%d\n", $1,$2,$3}'

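The first awk zero-pads the fields to a fixed width so that a plain sort orders them correctly; the second awk strips the padding to restore the original format. This assumes these are dates, i.e. a year of at most four digits and a month and day of at most two digits each.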
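
To make the intermediate form visible, the two awk stages can be wrapped in small functions and run separately. This is only a sketch; pad_dates and unpad_dates are hypothetical names, not part of any tool.

```shell
#!/bin/sh
# Zero-pad year.month.day to fixed widths so byte order equals date order.
pad_dates() {
    awk -F . '{ printf "%04d.%02d.%02d\n", $1, $2, $3 }'
}

# Strip the padding again by printing each field as a plain integer.
unpad_dates() {
    awk -F . '{ printf "%d.%d.%d\n", $1, $2, $3 }'
}

# Show the padded form that sort actually sees.
printf '2020.5.18\n2020.10.1\n' | pad_dates
# prints:
# 2020.05.18
# 2020.10.01
```

The full pipeline is then pad_dates < dates-file | sort | unpad_dates.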

Not as clean a solution as I had hoped for: this one is specific to mawk-1 or gawk , with these limitations:

  • the middle and rightmost fields each have a hard cap of 8^5-1
  • the leftmost field allows up to 8^6-1

The approach is to create a unified sort key that orders identically whether it is compared numerically or in ASCII byte order, while leaving sufficient room for growth in each of the 3 fields.

Pure duplicates are printed on a LIFO basis.

The two ASCII control bytes prefixed to the key prevent any stage of the process, within or beyond awk , from attempting a purely numeric comparison at insufficient floating-point precision when it sees a hex string that is potentially 80 bits in size.

WHINY_USERS=1 {m,g}awk '        # this is a shell param for mawk-1
BEGIN {
        PROCINFO["sorted_in"] = "@ind_str_asc"
        FS = "["(_=+(\
       OFS = "." ) )"]" 
} { __[\
    ____($_)]=$_ } END { for(_ in __) {
                            print __[_] } } 
function ____(___,__,_) {
    return  \
    sprintf("\31\17%.*s%.8X%.4X%.8X",(__="")*split(__,_),
    split(___,_,"[.]"), int((__=_[++__]*(___=(++__^++__\
           )^(__--+__))*___ + _[__]*___ + _[++__]) \
           )/(___+=___),__%___,___*___-___^(!___)-NR) 
}'

2020.5.18
2020.10.1
2020.10.4
2021.1.1
2023.12.14
2023.14.1
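
For readers who want the sort-key idea without the obfuscation, a more readable sketch follows. It is not the code above: it swaps in a zero-padded decimal key (10 digits per field, an arbitrary assumption) that is prepended to each line, sorted in byte order, and then cut off again. The function name sort_by_padded_key is hypothetical, and it does not reproduce the LIFO handling of duplicates.

```shell
#!/bin/sh
# Decorate each line with a fixed-width key, sort on it, then strip the key.
sort_by_padded_key() {
    # Key: three 10-digit zero-padded fields, then a tab, then the original line.
    awk -F . '{ printf "%010d%010d%010d\t%s\n", $1, $2, $3, $0 }' |
    LC_ALL=C sort |   # byte order on the padded key equals numeric order
    cut -f 2-         # drop the key; cut splits on tab by default
}

printf '2023.12.14\n2020.10.4\n2020.10.1\n2020.5.18\n2023.14.1\n2021.1.1\n' |
sort_by_padded_key
```

Because the key is only prepended and later removed, the original lines come back untouched, including any formatting that a pad-and-unpad round trip would normalize.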
