
What Linux commands can I use to sort columns in a tab-separated text file?

I need to compare two versions of the same file. Both are tab-separated and have this form:

<filename1><tab><Marker11><tab><Marker12>...
<filename2><tab><Marker21><tab><Marker22><tab><Marker23>...

Rows can have different numbers of markers (between 1 and 10), and all markers come from a small set of possible values. For example, a file might look like this:

fileX<tab>Z<tab>M<tab>A
fileB<tab>Y
fileM<tab>M<tab>C<tab>B<tab>Y

What I need is:

  1. Sort the file by rows
  2. Sort the markers in each row so that they are in alphabetical order

For the example above, the result would be:

fileB<tab>Y
fileM<tab>B<tab>C<tab>M<tab>Y
fileX<tab>A<tab>M<tab>Z

It's easy to do #1 using sort but how do I do #2?

UPDATE: This is not a duplicate of this post, because my rows have different lengths and the entries after the filename must be sorted separately in each row, i.e. only the first column stays in place.
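One way to do #2 without GNU-specific tools is to split each row's markers onto separate lines, sort them with sort, and join them back with paste. The sketch below assumes a POSIX sh; normalize is a hypothetical helper name, not an existing command. LC_ALL=C makes sort compare bytes, so the order does not depend on the locale.

```sh
# Sketch: normalize() is a hypothetical helper name. It sorts the markers
# on each row (the first column, the filename, stays in place) and then
# sorts the rows. Only POSIX sh, tr, sort and paste are used; it starts a
# few processes per row, so it is meant for small files.
normalize() {
    tab=$(printf '\t')
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        name=${line%%"$tab"*}                   # text before the first tab
        case $line in
            *"$tab"*) rest=${line#*"$tab"} ;;   # text after the first tab
            *)        rest= ;;                  # row without markers
        esac
        # one marker per line -> sort -> join back with tabs
        markers=$(printf '%s\n' "$rest" | tr '\t' '\n' | LC_ALL=C sort | paste -sd '\t' -)
        if [ -n "$markers" ]; then
            printf '%s\t%s\n' "$name" "$markers"
        else
            printf '%s\n' "$name"
        fi
    done | LC_ALL=C sort
}

# Example with the rows from the question
printf 'fileX\tZ\tM\tA\nfileB\tY\nfileM\tM\tC\tB\tY\n' | normalize
```

To compare the two versions, normalize each one (for example normalize < old.tsv > old.norm, and the same for the new file; the file names are hypothetical) and run diff on the two results.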

awk solution:

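awk 'BEGIN{ FS=OFS="\t"; PROCINFO["sorted_in"]="@ind_str_asc" }
     { split($0,b,FS); delete b[1]; n=asort(b)   # drop the filename, sort the markers
       r=b[1]; for(i=2;i<=n;i++) r=r OFS b[i]    # numeric loop: with sorted_in, "for(i in b)" would put index 10 before 2
       a[$1] = r
     }
     END{ for(i in a) print i,a[i] }' file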

The output:

fileB   Y
fileM   B   C   M   Y
fileX   A   M   Z

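  • PROCINFO["sorted_in"]="@ind_str_asc" - makes every for (i in ...) loop visit array indices in ascending string order, so the END loop prints the rows sorted by filename (GNU awk only)

  • split($0,b,FS); - split the line into array b by FS (the field separator, a tab here); b[1] holds the filename and is deleted before sorting

  • asort(b) - sort the marker values; asort() (GNU awk only) also renumbers them as b[1]..b[n]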
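asort() and PROCINFO["sorted_in"] are GNU awk extensions, so other awks (for example mawk or the BSD awk) may not run the answer above. A portable sketch, assuming only POSIX awk and sort, replaces asort() with an insertion sort over the markers and leaves the row ordering to sort; sort_markers is a hypothetical helper name.

```sh
# Sketch: sort_markers() is a hypothetical helper name. It does the same
# job as the gawk answer using only POSIX awk features (no asort, no
# PROCINFO): an insertion sort orders fields 2..NF of each row, and
# sort(1) then orders the rows.
sort_markers() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        n = 0
        for (i = 2; i <= NF; i++) {          # insert $i into m[1..n], kept sorted
            v = $i
            j = n
            # concatenating "" forces a string (not numeric) comparison
            while (j > 0 && (m[j] "") > (v "")) { m[j + 1] = m[j]; j-- }
            m[j + 1] = v
            n++
        }
        out = $1
        for (i = 1; i <= n; i++) out = out OFS m[i]
        print out
    }' "$@" | LC_ALL=C sort
}

# Example with the rows from the question
printf 'fileX\tZ\tM\tA\nfileB\tY\nfileM\tM\tC\tB\tY\n' | sort_markers
```

An insertion sort is quadratic, but with at most 10 markers per row that cost is negligible. The function reads standard input or the file names passed to it, and a marker that occurs twice in a row is kept twice.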

All you need is:

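awk '
BEGIN { FS = OFS = "\t" }
{ for (i=2;i<=NF;i++) arr[$1][$i] }       # markers become indices of arr[filename]
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (i in arr) {
        printf "%s", i
        for (j in arr[i]) {
            printf "%s%s", OFS, j         # the marker is the index j; arr[i][j] itself is empty
        }
        print ""
    }
}
' file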

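The above uses GNU awk for true multi-dimensional arrays (arrays of arrays) plus sorted_in. Because the markers are stored as array indices, a marker that appears more than once in a row is printed only once, and rows that share a filename are merged.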
