
Error in sorting based on a specific column in bash

Hi, I am trying this, but it doesn't work. [image]

I know that it doesn't work because each line has a different number of columns when words are separated by spaces, but can we do the intended job some other way?

NOTE: assuming the input file's columns are separated by spaces and not tabs; otherwise dan's comment - sort -nt $'\t' -k3,3 - should suffice


sort allows us to designate the field terminator as well as which fields (and optionally substrings of fields) to sort by.

If we set the field delimiter as a linefeed ( \n ) the entire line becomes a single field.

From here we can designate a substring of field #1 to sort by; -k1.x,1.y says to sort by field #1 from position x to position y (with the first character of the field/line having a position of 1).

Sample input:

$ cat animals.txt
         1         2         3         4         5         6
123456789012345678901234567890123456789012345678901234567890
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen

Where:

  • the first 2 lines (the scale) do not exist in the file; the scale shows us ...
  • the year part of the line runs from position 36 to 39

Pulling all of this into a sort call:

# sort numerically by year (ascending)

$ sort -t$'\n' -k1.36,1.39 -n animals.txt
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
alpaca   Intermediate Perl         2012   Schwatz, Randal

# sort numerically by year (descending)

$ sort -t$'\n' -k1.36,1.39 -rn animals.txt
alpaca   Intermediate Perl         2012   Schwatz, Randal
horse    Linux in a Nutshell       2009   Siever, Ellen
donkey   Cisco IOS in a Nutshell   2005   Boney, James

NOTE: assumes all lines have the year in the same position (i.e., the contents of the file are formatted per a fixed-width scheme)
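The positional sort gives no warning when a line does not follow the fixed-width layout, so it may be worth checking that assumption before sorting. A minimal sketch, assuming the year occupies positions 36 to 39 as in the sample; check_year_columns is a hypothetical helper name and the misaligned oryx line is made up for the demo, neither is part of the original answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): list every line whose
# 4 characters starting at position $2 (default 36) are not all digits.
# Exit status is 0 when every line passes, 1 otherwise.
check_year_columns() {
    awk -v start="${2:-36}" '
        substr($0, start, 4) !~ /^[0-9][0-9][0-9][0-9]$/ {
            printf "line %d: %s\n", FNR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Demo on a scratch file: two well-formed lines plus one misaligned line.
demo=$(mktemp)
cat > "$demo" <<'EOF'
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
oryx     Writing Word Macros 1999  Roman, Steven
EOF

check_year_columns "$demo" || echo "fixed-width assumption does not hold"
```

In the demo only the third line is reported, because its year starts at position 30 rather than 36.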

Obviously this approach requires that we know the position of the year substring in advance; there are a few ways to determine this position. One idea, assuming the year column will always be the 1st occurrence of a 4-digit substring: use bash regex matching and the BASH_REMATCH[] array to determine the length of the line up to the 4-digit year, e.g.:

$ regex="^([^0-9]*)([0-9]{4}).*"
$ [[ $(head -1 animals.txt) =~ $regex ]] && typeset -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]="alpaca   Intermediate Perl         2012   Schwatz, Randal" [1]="alpaca   Intermediate Perl         " [2]="2012")

From this we see that BASH_REMATCH[1] contains the contents of the line up to the year (2012 for the alpaca line); now we grab the length of BASH_REMATCH[1], add 1 to get x (the position of the year's first digit), and add 3 to x to get y (the position of its last digit):

$ (( x = ${#BASH_REMATCH[1]} + 1 ))
$ (( y = x + 3 ))
$ typeset -p x y
declare -- x="36"
declare -- y="39"

Plugging these variables into our previous sort call:

# sort numerically by year (ascending)

$ sort -t$'\n' -k1.${x},1.${y} -n animals.txt
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
alpaca   Intermediate Perl         2012   Schwatz, Randal

# sort numerically by year (descending)

$ sort -t$'\n' -k1.${x},1.${y} -rn animals.txt
alpaca   Intermediate Perl         2012   Schwatz, Randal
horse    Linux in a Nutshell       2009   Siever, Ellen
donkey   Cisco IOS in a Nutshell   2005   Boney, James

NOTE: OP hasn't defined a secondary sort requirement in the case of multiple lines having the same date, but it shouldn't be too hard to extend this answer to include a secondary (and tertiary?) sort requirement
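One way such a secondary sort might look, as a sketch only: sort accepts several -k keys, each with its own ordering flags, so the year can remain the numeric primary key while another part of the line breaks ties. The tie-breaker chosen here (the author, assumed to start at position 43 and run to the end of the line) is an assumption, not something the OP asked for. The snail line is borrowed from the question's data and re-spaced by hand to fit the fixed-width layout; LC_ALL=C only makes the text comparison predictable.

```shell
#!/usr/bin/env bash
# Scratch copy of the fixed-width sample plus a second 2005 entry
# (the "snail" line is re-spaced so its year also sits at positions 36-39).
sample=$(mktemp)
cat > "$sample" <<'EOF'
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
snail    SSH, The Secure Shell     2005   Barrett, Daniel
EOF

# Primary key:   characters 36-39, compared numerically (the trailing "n").
# Secondary key: character 43 to end of line (the author), compared as text.
sorted=$(LC_ALL=C sort -t$'\n' -k1.36,1.39n -k1.43 "$sample")
printf '%s\n' "$sorted"
```

With this data the two 2005 lines come out with snail (Barrett) ahead of donkey (Boney), followed by horse and alpaca.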

Try adding a separator such as a comma; from there you will be able to use the sort command with the -t argument and specify the given field separator.

To find and replace a character with a separator I would use cat animals.txt | sed {insert the pattern}.

Based on the file you've shared, you could try adding the separator after the first word, and before and after the numerical values.
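A sketch of what that could look like, not the answerer's exact pattern: the sample data already contains commas (in the author names, for instance), so this version assumes | as the separator instead, and assumes the year is a 4-digit number surrounded by whitespace. sed -E (extended regular expressions) is accepted by both GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Scratch copy of the sample input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
EOF

# Put "|" after the first word and on both sides of the year, swallowing the
# padding spaces, then sort numerically on the third "|"-separated field.
delimited=$(
    sed -E 's/^([^[:space:]]+)[[:space:]]+(.*[^[:space:]])[[:space:]]+([0-9]{4})[[:space:]]+/\1|\2|\3|/' "$sample" |
        sort -t'|' -k3,3n
)
printf '%s\n' "$delimited"
```

Note that the output keeps the | separators and loses the column padding, so this suits cases where a delimited file is acceptable as the result.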

NOTE: assuming the input file's columns are separated by spaces and not tabs; otherwise dan's comment - sort -nt $'\t' -k3,3 - should suffice


If GNU awk is available we can have awk find the index for the year substring and then sort the output for us.

Sample input:

$ cat animals.txt
         1         2         3         4         5         6
123456789012345678901234567890123456789012345678901234567890
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen

Where:

  • the first 2 lines (the scale) do not exist in the file; the scale shows us ...
  • the year part of the line runs from position 36 to 39

One GNU awk idea:

awk '
FNR==1 { x=match($0, /[0-9]{4}/) }                # find index of the "year" substring in the 1st line of input; assumes the "year" is the 1st occurrence of a 4-digit substring
       { arr[substr($0,x,4)][FNR]=$0 }            # populate 2-dimensional array using "year" and row number (FNR) as indexes
END    { PROCINFO["sorted_in"]="@ind_num_asc"     # sort indexes as numbers in "asc"ending order
         for (i in arr)
             for (j in arr[i])
                 print arr[i][j]
       }
' animals.txt

This generates:

donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
alpaca   Intermediate Perl         2012   Schwatz, Randal

If we change the sort order from @ind_num_asc to @ind_num_desc we can generate the output in descending year order, i.e.:

alpaca   Intermediate Perl         2012   Schwatz, Randal
horse    Linux in a Nutshell       2009   Siever, Ellen
donkey   Cisco IOS in a Nutshell   2005   Boney, James

NOTES:

  • GNU awk required for multi-dimensional array (aka array of arrays) support
  • GNU awk required for the PROCINFO["sorted_in"] feature
  • assumes the entire file can fit into memory (due to storing all lines in the array)
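If GNU awk is not available, or the file is too large to hold in an array, one possible alternative (a sketch, not part of the original answer) is to have awk only prefix each line with its year and leave the ordering to sort. It keeps the assumption that the year sits where the first 4-digit run appears on the first line, and additionally assumes the lines contain no tab characters, since a tab is used to attach the temporary key.

```shell
#!/usr/bin/env bash
# Scratch copy of the sample input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alpaca   Intermediate Perl         2012   Schwatz, Randal
donkey   Cisco IOS in a Nutshell   2005   Boney, James
horse    Linux in a Nutshell       2009   Siever, Ellen
EOF

# 1. awk:  locate the year column on the first line (which is assumed to
#          contain a year), then emit "year<TAB>original line" for every line.
# 2. sort: order numerically on the first tab-separated field only.
# 3. cut:  drop the temporary key again.
by_year=$(
    awk 'FNR == 1 { x = match($0, /[0-9][0-9][0-9][0-9]/) }
                  { print substr($0, x, 4) "\t" $0 }' "$sample" |
        sort -t$'\t' -k1,1n |
        cut -f2-
)
printf '%s\n' "$by_year"
```

Spelling the pattern out as four bracket expressions instead of {4} avoids relying on interval-expression support, which some older awk implementations lack.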

One way to do it is to copy the year to the start of each line with sed, sort the resulting output numerically, and then remove the year at the start of each line:

sed 's/^.*[[:space:]]\([12][09][0-9][0-9]\)[[:space:]].*$/\1 &/' animals.txt \
    | sort -n | sed 's/^.....//'

The output with the example animals.txt in the question is:

oryx    Writing Word Macros     1999    Roman, Steven
donkey  Cisco IOS in a Nutshell 2005    Boney, James
snail   SSH, The Secure Shell   2005    Barrett, Daniel
horse   Linux in a Nutshell     2009    Sievers, Ellen
python  Programming Python      2010    Lutz, Mark
alpaca  Intermediate Perl       2012    Schwartz, Randal
robin   MySQL High Availability 2014    Bell, Charles
