Sort error based on a specific column in bash

Question

Hi, I am trying this, but it doesn't work.

I know that it doesn't work because each line has a different number of columns when words are separated by spaces, but can we do the intended job in any way?

Answer 1

NOTE: assuming the input file's columns are separated by spaces and not tabs; otherwise dan's comment - sort -nt $'\t' -k3,3 - should suffice
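For the tab-separated case mentioned in the note, a minimal sketch could look like the following. The tab-separated data here is a hypothetical rebuild of the sample rows, not the OP's actual file, and the variable names tmp and sorted are only for illustration.

```shell
# Hypothetical tab-separated version of the sample rows (assumption: 4 fields per line)
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    alpaca 'Intermediate Perl'       2012 'Schwatz, Randal' \
    donkey 'Cisco IOS in a Nutshell' 2005 'Boney, James' \
    horse  'Linux in a Nutshell'     2009 'Siever, Ellen' > "$tmp"

# -t $'\t' : use a tab as the field separator (bash ANSI-C quoting)
# -k3,3    : the sort key starts and ends at field 3 (the year)
# -n       : compare the key numerically
sorted=$(sort -nt $'\t' -k3,3 "$tmp")
printf '%s\n' "$sorted"
rm -f "$tmp"
```

Because the separator is a tab, the spaces inside the title and author fields no longer affect how the fields are counted.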

sort allows us to designate the field terminator as well as which fields (and optionally substrings of fields) to sort by.

If we set the field delimiter as a linefeed ( \n ) the entire line becomes a single field.

From here we can designate a substring of field #1 to sort by; -k1.x,1.y says to sort by field #1 from position x to position y (with the first character of the field/line having a position of 1).

Sample input:

$ cat animals.txt
         1         2         3         4         5         6
123456789012345678901234567890123456789012345678901234567890
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen

Where:

the first 2 lines (the scale) do not exist in the file; the scale shows us ...
the year part of the line runs from position 36 to 39

Pulling all of this into a sort call:

# sort numerically by year (ascending)

$ sort -t$'\n' -k1.36,1.39 -n animals.txt
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
alpaca   Intermediate Perl         2012   Schwatz, Randal

# sort numerically by year (descending)

$ sort -t$'\n' -k1.36,1.39 -rn animals.txt
alpaca   Intermediate Perl         2012   Schwatz, Randal
horse    Linux in a Nutshell       2009   Siever, Ellen
donkey   Cisco IOS in a Nutshell   2005   Boney, James

NOTE: assumes all lines have the year in the same position (ie, the contents of the file are formatted per a fixed-width scheme)

Obviously this approach requires that we know the position of the year substring in advance; there are a few ways to determine this position. One idea, assuming the year column will always be the 1st occurrence of a 4-digit substring: use bash regex matching and the BASH_REMATCH[] array to determine the length of the line up to the 4-digit year, eg:

$ regex="^([^0-9]*)([0-9]{4}).*"
$ [[ $(head -1 animals.txt) =~ $regex ]] && typeset -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]="alpaca   Intermediate Perl         2012   Schwatz, Randal" [1]="alpaca   Intermediate Perl         " [2]="2012")

From this we see that BASH_REMATCH[1] contains the contents of the line up to the year (2012 for the alpaca line); now we grab the length of BASH_REMATCH[1] and add 1 to get x (the position of the first digit of the year), then add 3 to x to get y (the position of the last digit):

$ (( x = ${#BASH_REMATCH[1]} + 1 ))
$ (( y = x + 3 ))
$ typeset -p x y
declare -- x="36"
declare -- y="39"

Plugging these variables into our previous sort call:

# sort numerically by year (ascending)

$ sort -t$'\n' -k1.${x},1.${y} -n animals.txt
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
alpaca   Intermediate Perl         2012   Schwatz, Randal

# sort numerically by year (descending)

$ sort -t$'\n' -k1.${x},1.${y} -rn animals.txt
alpaca   Intermediate Perl         2012   Schwatz, Randal
horse    Linux in a Nutshell       2009   Siever, Ellen
donkey   Cisco IOS in a Nutshell   2005   Boney, James
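The steps above can be collected into one small bash function. This is only a sketch under the same assumptions as the answer (fixed-width lines, year is the first 4-digit run on the first line); sort_by_year is a hypothetical helper name, and the sample file is recreated here so the block runs on its own.

```shell
# Requires bash ([[ =~ ]], BASH_REMATCH and $'\n' are bash features).
# sort_by_year is a hypothetical helper name, not part of the original answer.
sort_by_year() {
    local file=$1; shift            # any remaining arguments (eg, -r) are passed to sort
    local regex='^([^0-9]*)([0-9]{4})' first x y
    first=$(head -n 1 "$file")      # assumption: every line uses the layout of line 1
    [[ $first =~ $regex ]] || { echo "no 4-digit year found in first line" >&2; return 1; }
    x=$(( ${#BASH_REMATCH[1]} + 1 ))   # position of the first digit of the year
    y=$(( x + 3 ))                     # position of the last digit of the year
    sort -t$'\n' -k"1.${x},1.${y}" -n "$@" "$file"
}

# Recreate the sample file from the answer
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
EOF

asc=$(sort_by_year "$tmp")         # ascending by year
desc=$(sort_by_year "$tmp" -r)     # descending by year
printf '%s\n' "$asc" "" "$desc"
rm -f "$tmp"
```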

NOTE: OP hasn't defined a secondary sort requirement for the case where multiple lines have the same year, but it shouldn't be too hard to extend this answer to include a secondary (and tertiary?) sort requirement
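One possible way to add a secondary key, sketched under the assumption that ties on the year should be broken by the animal name in positions 1 to 9. The snail row is a made-up extra line in the same fixed-width layout, added only to create a tie on 2005.

```shell
# Requires bash for $'\n'. The "snail" row is hypothetical, added to create a tie.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
snail    SSH, The Secure Shell     2005   Barrett, Daniel
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
EOF

# 1st key: characters 36-39 (year); the trailing "n" makes only this key numeric
# 2nd key: characters 1-9 (animal name), compared as text
# LC_ALL=C keeps the text comparison byte-wise and predictable
sorted=$(LC_ALL=C sort -t$'\n' -k1.36,1.39n -k1.1,1.9 "$tmp")
printf '%s\n' "$sorted"
rm -f "$tmp"
```

The n modifier is attached to the first key rather than given as a global -n so that the second key is not also treated as a number.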

Answer 2

Try adding a separator like a comma; from there you will be able to use the sort command with the -t argument and specify the given field separator.

To find and replace a character with a separator I would use cat animals.txt | sed {insert the pattern}.

Based on the file you've shared, you could attempt adding the separator after the first word, and before and after the numerical values.
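As a rough illustration of this idea, here is one pattern that could fill the {insert the pattern} placeholder. It is a sketch, not the answerer's own pattern: it uses | rather than a comma because the author names in the sample already contain commas, it assumes | never occurs in the data, and it only marks the year (the first whitespace-delimited 4-digit number) since that is the only key needed.

```shell
# Recreate the sample file from the thread
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
EOF

# 1. sed  : put a '|' on each side of the first 4-digit number surrounded by whitespace
# 2. sort : with '|' as the separator the year is field 2; sort it numerically
# 3. tr   : remove the temporary '|' markers again (assumes '|' is not in the data)
sorted=$(sed -E 's/([[:space:]])([0-9]{4})([[:space:]])/\1|\2|\3/' "$tmp" \
    | sort -t '|' -k2,2n \
    | tr -d '|')
printf '%s\n' "$sorted"
rm -f "$tmp"
```

Since the markers are inserted without touching the surrounding whitespace, the lines come out with their original spacing.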

Answer 3

NOTE: assuming the input file's columns are separated by spaces and not tabs; otherwise dan's comment - sort -nt $'\t' -k3,3 - should suffice

If GNU awk is available we can have awk find the index of the year substring and then sort the output for us.

Sample input:

$ cat animals.txt
         1         2         3         4         5         6
123456789012345678901234567890123456789012345678901234567890
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen

Where:

the first 2 lines (the scale) do not exist in the file; the scale shows us ...
the year part of the line runs from position 36 to 39

One GNU awk idea:

awk '
FNR==1 { x=match($0, /[0-9]{4}/) }                # find index of the "year" substring in the 1st line of input; assumes the "year" is the 1st occurrence of a 4-digit substring
       { arr[substr($0,x,4)][FNR]=$0 }            # populate 2-dimensional array using "year" and row number (FNR) as indexes
END    { PROCINFO["sorted_in"]="@ind_num_asc"     # sort indexes as numbers in "asc"ending order
         for (i in arr)
             for (j in arr[i])
                 print arr[i][j]
       }
' animals.txt

This generates:

donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
alpaca   Intermediate Perl         2012   Schwatz, Randal

If we change the sort order from @ind_num_asc to @ind_num_desc we can generate the output in descending year order, ie:

alpaca   Intermediate Perl         2012   Schwatz, Randal
horse    Linux in a Nutshell       2009   Siever, Ellen
donkey   Cisco IOS in a Nutshell   2005   Boney, James

NOTES:

GNU awk is required for multi-dimensional array (aka array of arrays) support
GNU awk is required for the PROCINFO["sorted_in"] feature
assumes the entire file can fit into memory (due to storing all lines in the array)
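If GNU awk is not available, a different technique may serve as a fallback: have any POSIX awk prefix each line with its year, let sort do the ordering, and cut the prefix off again. This is a swapped-in decorate/sort/undecorate sketch, not the array-based method above; it assumes the year is the first run of 4 digits on each line and that the lines contain no tabs.

```shell
# Recreate the sample file from the answer
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
EOF

# awk  : for each line with a 4-digit run, print "<year><TAB><original line>"
#        (lines without a 4-digit run are dropped by this sketch)
# sort : numeric sort on the leading year
# cut  : drop the year prefix, keeping everything after the first tab
sorted=$(awk 'match($0, /[0-9][0-9][0-9][0-9]/) { print substr($0, RSTART, 4) "\t" $0 }' "$tmp" \
    | sort -n \
    | cut -f2-)
printf '%s\n' "$sorted"
rm -f "$tmp"
```

Because awk only streams lines through and sort handles the ordering, awk does not need to hold the whole file in an array.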

Answer 4

One way to do it is to copy the year to the start of each line with sed, sort the resulting output numerically, and then remove the year from the start of each line:

sed 's/^.*[[:space:]]\([12][09][0-9][0-9]\)[[:space:]].*$/\1 &/' animals.txt \
    | sort -n | sed 's/^.....//'

The output with the example animals.txt in the question is:

oryx    Writing Word Macros     1999    Roman, Steven
donkey  Cisco IOS in a Nutshell 2005    Boney, James
snail   SSH, The Secure Shell   2005    Barrett, Daniel
horse   Linux in a Nutshell     2009    Sievers, Ellen
python  Programming Python      2010    Lutz, Mark
alpaca  Intermediate Perl       2012    Schwartz, Randal
robin   MySQL High Availability 2014    Bell, Charles
