
Sorting Columns From File BASH

I have the following shell script that reads data from a file given on the command line. The file is a matrix of numbers, and I need to split it into columns and then sort each column. Right now I can read the file and output the individual columns, but I am lost on how to sort them. I added a sort statement, but it only sorts the first column.

EDIT: I have decided to take another route and actually transpose the matrix to turn the columns into rows. I later have to calculate the mean and median, and I have already done this successfully row-wise earlier in the script, so it was suggested that I "spin" the matrix, so to speak, turning the columns into rows.

Here is my UPDATED code:

    declare -a col=( )
    read -a line < "$1"
    numCols=${#line[@]}                          # save number of columns

    index=0
    while read -a line; do
        for (( colCount = 0; colCount < ${#line[@]}; colCount++ )); do
            col[$index]=${line[$colCount]}
            ((index++))
        done
    done < "$1"

    for (( width = 0; width < numCols; width++ )); do
        for (( colCount = width; colCount < ${#col[@]}; colCount += numCols )); do
            printf "%s\t" "${col[$colCount]}"
        done
        printf "\n"
    done

This gives me the following output:

    1 9 6 3 3 6
    1 3 7 6 4 4
    1 4 8 8 2 4
    1 5 9 9 1 7
    1 5 7 1 4 7

Though I'm now looking for:

    1 3 3 6 6 9
    1 3 4 4 6 7
    1 2 4 4 8 8
    1 1 5 7 9 9
    1 1 4 5 7 7

To try and sort the data, I have tried the following:

    sortCol=${col[$colCount]}
    eval col[$colCount]='($(sort <<<"${'$sortCol'[*]}"))'

Also (which is how I sorted each row after reading it in from line):

    sortCol=( $(printf '%s\t' "${col[$colCount]}" | sort -n) )

If you could provide any insight on this, it would be greatly appreciated!
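One detail relevant to the second attempt: sort compares whole lines, so values printed on a single tab-separated line reach it as one record and come back unchanged. A minimal sketch of the usual workaround, printing one value per line before sorting, is shown here. The helper name sort_values is hypothetical and not part of the script above.

```bash
#!/bin/bash
# sort_values (hypothetical helper): print its arguments in ascending
# numeric order on a single line.
sort_values() {
    local sorted
    # sort(1) orders lines, so emit one value per line before sorting,
    # then let word splitting collect the result back into an array.
    sorted=( $(printf '%s\n' "$@" | sort -n) )
    echo "${sorted[*]}"
}

sort_values 9 6 3 3 6 1    # prints: 1 3 3 6 6 9
```

The same pattern applies to a transposed row held in an array: pass "${row[@]}" as the arguments.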

Note, as mentioned in the comments, a pure bash solution isn't pretty. There are a number of ways to do it, but this is probably the most straightforward. The following reads all values from every line into one flat array and saves the matrix stride (the number of values per row), so that each column can be walked, copied into a temporary row array, and sorted. All sorted columns are appended to a new row matrix a2. Transposing that row matrix while printing gives back your original matrix with each column in sorted order.

Note this will work for a matrix with any number of columns in your file.
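To make the stride idea concrete: in a flat, row-major array the element at row r and column c sits at index r * stride + c, so stepping through the array in increments of stride walks down a single column. The sketch below illustrates only that indexing. The helper name get_column and its argument order are assumptions made for the example.

```bash
#!/bin/bash
# get_column (hypothetical helper): print column c of a row-major flat matrix.
# Usage: get_column <stride> <column> <values...>
get_column() {
    local stride=$1 c=$2
    shift 2
    local flat=( "$@" ) out=() i
    # Start at the column offset and jump one full row (stride) at a time.
    for (( i = c; i < ${#flat[@]}; i += stride )); do
        out+=( "${flat[i]}" )
    done
    echo "${out[*]}"
}

# The 2x3 matrix  1 2 3 / 4 5 6  stored as one flat array:
get_column 3 1  1 2 3 4 5 6    # prints: 2 5
```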

#!/bin/bash

test -z "$1" && {           ## validate number of input
    printf "insufficient input. usage:  %s <filename>\n" "${0//*\//}"
    exit 1;
}

test -r "$1" || {           ## validate file was readable
    printf "error: file not readable '%s'. usage:  %s <filename>\n" "$1" "${0//*\//}"
    exit 1;
}

## function: my sort integer array - accepts array and returns sorted array
## Usage: array=( $(msia "${array[@]}") )
msia() {
    local a=( "$@" )
    local sz=${#a[@]}
    local _tmp
    [[ $sz -lt 2 ]] && { echo "${a[@]}"; return 0; }   ## 0 or 1 values: already sorted
    for((i=0;i<$sz;i++)); do
        for((j=$((sz-1));j>i;j--)); do
        [[ ${a[$i]} -gt ${a[$j]} ]] && {
            _tmp=${a[$i]}
            a[$i]=${a[$j]}
            a[$j]=$_tmp
        }
        done
    done
    echo ${a[@]}
    unset _tmp
    unset sz
    return 0
}

declare -a a1               ## declare arrays and matrix variables
declare -a a2
declare -i cnt=0
declare -i stride=0
declare -i sz=0

while read line; do         ## read all lines into array
    a1+=( $line );
    (( cnt == 0 )) && stride=${#a1[@]}  ## calculate matrix stride
    (( cnt++ ))
done < "$1"

sz=${#a1[@]}                ## calculate matrix size
                            ## print original array
printf "\noriginal array:\n\n"
for ((i = 0; i < sz; i += stride)); do
    for ((j = 0; j < stride; j++)); do
        printf " %s" ${a1[i+j]}
    done
    printf "\n"
done

                            ## sort columns from stride array
for ((j = 0; j < stride; j++)); do
    for ((i = 0; i < sz; i += stride)); do
        arow+=( ${a1[i+j]} )
    done
    a2+=( $(msia ${arow[@]}) )  ## create sorted array
    unset arow
done
                            ## print the sorted array
printf "\nsorted array:\n\n"
for ((j = 0; j < cnt; j++)); do
    for ((i = 0; i < sz; i += cnt)); do
        printf " %s" ${a2[i+j]}
    done
    printf "\n"
done

exit 0

Output

$ bash sort_cols2.sh dat/matrix.txt

original array:

 1 1 1 1 1
 9 3 4 5 5
 6 7 8 9 7
 3 6 8 9 1
 3 4 2 1 4
 6 4 4 7 7

sorted array:

 1 1 1 1 1
 3 3 2 1 1
 3 4 4 5 4
 6 4 4 7 5
 6 6 8 9 7
 9 7 8 9 7

Awk script (asort requires GNU awk)

awk '
{for(i=1;i<=NF;i++)a[i]=a[i]" "$i}      #Add to column array
END{
        for(i=1;i<=NF;i++){
                split(a[i],b)          #Split column
                x=asort(b)             #sort column
                for(j=1;j<=x;j++){     #loop through sort
                        d[j]=d[j](d[j]~/./?" ":"")b[j]  #Recreate lines
                }
        }
for(i=1;i<=NR;i++)print d[i]          #Print lines
}' file

Output

1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
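The same column-by-column idea can also be sketched with standard tools driven from bash: extract each column, sort it with sort -n, and glue the sorted columns back together with paste. This is only a sketch under stated assumptions: the helper name sort_columns is hypothetical, fields are assumed to be whitespace-separated, and every row is assumed to have as many fields as the first one.

```bash
#!/bin/bash
# sort_columns (hypothetical helper): numerically sort every column of a
# whitespace-separated matrix file and print the result row by row.
sort_columns() {
    local file=$1 ncols i tmpdir
    ncols=$(awk 'NR == 1 { print NF }' "$file")
    (( ${ncols:-0} > 0 )) || return 0       # empty file: nothing to print
    tmpdir=$(mktemp -d) || return 1
    for (( i = 1; i <= ncols; i++ )); do
        # Extract column i, sort it, and save it under a zero-padded name
        # so that the glob below lists the columns in their original order.
        awk -v c="$i" '{ print $c }' "$file" | sort -n > "$tmpdir/$(printf '%05d' "$i")"
    done
    paste -d ' ' "$tmpdir"/*                # glue the sorted columns back together
    rm -rf "$tmpdir"
}
```

Called as sort_columns dat/matrix.txt, it should print the same six rows as the output above.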

Here's my entry in this little exercise. It should handle an arbitrary number of columns. I assume they're space-separated:

#!/bin/bash

linenumber=0
while read line; do
        i=0
        # Create an array for each column.
        for number in $line; do
                [ $linenumber == 0 ] && eval "array$i=()"
                eval "array$i+=($number)"
                (( i++ ))
        done    
        (( linenumber++ ))
done <"$1"
IFS=$'\n'
# Sort each column numerically (columns are numbered 0..i-1)
for j in $(seq 0 $((i - 1))); do
        thisarray=array$j
        eval array$j='($(sort -n <<<"${'$thisarray'[*]}"))'
done
# Print each array's 0'th entry, then 1, then 2, etc...
for k in $(seq 0 $((${#array0[@]} - 1))); do
        for j in $(seq 0 $((i - 1))); do
                eval 'printf "%s " "${array'$j'['$k']}"'
        done
        echo "" 
done
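The per-column arrays in this script are built with eval. On bash 4.3 or newer, a nameref (declare -n) is one way to address a dynamically named array without eval. The sketch below shows only that mechanism. The helper append_to_column and the col0, col1 array names are hypothetical and chosen for illustration.

```bash
#!/bin/bash
# append_to_column (hypothetical helper): append a value to the array
# named col<N> without using eval. Requires bash 4.3+ for namerefs.
append_to_column() {
    declare -n target="col$1"   # nameref: 'target' now aliases col0, col1, ...
    target+=( "$2" )
}

col0=() col1=()
append_to_column 0 9
append_to_column 1 3
append_to_column 0 6
echo "${col0[*]}"    # prints: 9 6
echo "${col1[*]}"    # prints: 3
```

Because the value is passed as a quoted argument rather than pasted into a string that gets re-parsed, unusual input cannot be executed as code.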

Not bash, but I think this Python code is worth a look, since it shows how the task can be achieved using built-in functions.

From the interpreter:

$ cat matrix.txt 
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7

$ python
Python 2.7.3 (default, Jun 19 2012, 17:11:17) 
[GCC 4.4.3] on hp-ux11
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> f = open('./matrix.txt')
>>> for row in zip(*[sorted(list(a))
...                for a in zip(*[a.split() for a in f.readlines()])]):
...    print ' '.join(row)
... 
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
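One caveat applies to any approach that compares the fields as text, such as calling sorted on the strings returned by split(): the ordering is character by character, which matches numeric order for single-digit data like this matrix but not for multi-digit values. The bash sketch below, using made-up sample values, shows the difference between text ordering and sort -n.

```bash
#!/bin/bash
# Text ordering compares character by character, so "10" sorts before "9".
text_sorted=( $(printf '%s\n' 10 9 2 | LC_ALL=C sort) )
# Numeric ordering (-n) compares the values themselves.
num_sorted=( $(printf '%s\n' 10 9 2 | sort -n) )

echo "text:    ${text_sorted[*]}"    # prints: text:    10 2 9
echo "numeric: ${num_sorted[*]}"     # prints: numeric: 2 9 10
```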

