Error in sorting based on a specific column in bash
NOTE: this assumes the input file's columns are separated by spaces and not tabs; otherwise dan's comment, sort -nt $'\t' -k3,3, should suffice.
sort allows us to designate the field separator (-t) as well as which fields (and optionally substrings of fields) to sort by (-k).
If we set the field delimiter to a linefeed (\n), the entire line becomes a single field.
From here we can designate a substring of field #1 to sort by; -k1.x,1.y says to sort by field #1 from position x to position y (with the first character of the field/line having a position of 1).
Sample input:
$ cat animals.txt
1 2 3 4 5 6
123456789012345678901234567890123456789012345678901234567890
alpaca Intermediate Perl 2012 Schwatz, Randal
donkey Cisco IOS in a Nutshell 2005 Boney, James
horse Linux in a Nutshell 2009 Siever, Ellen
Where the year part of the line runs from position 36 to 39.
Pulling all of this into a sort call:
# sort numerically by year (ascending)
$ sort -t$'\n' -k1.36,1.39 -n animals.txt
donkey Cisco IOS in a Nutshell 2005 Boney, James
horse Linux in a Nutshell 2009 Siever, Ellen
alpaca Intermediate Perl 2012 Schwatz, Randal
# sort numerically by year (descending)
$ sort -t$'\n' -k1.36,1.39 -rn animals.txt
alpaca Intermediate Perl 2012 Schwatz, Randal
horse Linux in a Nutshell 2009 Siever, Ellen
donkey Cisco IOS in a Nutshell 2005 Boney, James
NOTE: this assumes all lines have the year in the same position (i.e., the contents of the file are formatted per a fixed-width scheme).
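The fixed-width assumption can be checked before sorting. The following is only a sketch: the demo data, the helper name check_year_columns, and the hard-coded start column 36 are assumptions made for illustration, not part of the original answer.

```shell
# Build a small fixed-width demo file: 8 + 27 characters of padding
# place the 4-digit year at columns 36-39 (hypothetical layout).
demo=$(mktemp)
printf '%-8s%-27s%-6s%s\n' \
    alpaca 'Intermediate Perl'       2012 'Schwartz, Randal' \
    donkey 'Cisco IOS in a Nutshell' 2005 'Boney, James' > "$demo"

# Report every line whose columns 36-39 are not four digits;
# return non-zero if at least one such line exists.
check_year_columns() {
    awk -v start=36 '
        substr($0, start, 4) !~ /^[0-9][0-9][0-9][0-9]$/ {
            printf "line %d: no year at columns %d-%d\n", FNR, start, start + 3
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

if check_year_columns "$demo"; then
    echo "layout looks fixed-width"
fi
rm -f "$demo"
```

If the check fails, the -k1.36,1.39 key would compare something other than the year on the reported lines, so the sort result should not be trusted.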
Obviously this approach requires that we know the position of the year substring in advance; there are a few ways to determine this position. One idea, assuming the year column will always be the 1st occurrence of a 4-digit substring: use bash regex matching and the BASH_REMATCH[] array to determine the length of the line up to the 4-digit year, e.g.:
$ regex="^([^0-9]*)([0-9]{4}).*"
$ [[ $(head -1 animals.txt) =~ $regex ]] && typeset -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]="alpaca Intermediate Perl 2012 Schwatz, Randal" [1]="alpaca Intermediate Perl " [2]="2012")
From this we see that BASH_REMATCH[1] contains the contents of the line up to the year (2012 for the alpaca line); now we take the length of BASH_REMATCH[1] and add 1 to get x (the position of the year's first digit), then add 3 to x to get y (the position of its last digit):
$ (( x = ${#BASH_REMATCH[1]} + 1 ))
$ (( y = x + 3 ))
$ typeset -p x y
declare -- x="36"
declare -- y="39"
Plugging these variables into our previous sort call:
# sort numerically by year (ascending)
$ sort -t$'\n' -k1.${x},1.${y} -n animals.txt
donkey Cisco IOS in a Nutshell 2005 Boney, James
horse Linux in a Nutshell 2009 Siever, Ellen
alpaca Intermediate Perl 2012 Schwatz, Randal
# sort numerically by year (descending)
$ sort -t$'\n' -k1.${x},1.${y} -rn animals.txt
alpaca Intermediate Perl 2012 Schwatz, Randal
horse Linux in a Nutshell 2009 Siever, Ellen
donkey Cisco IOS in a Nutshell 2005 Boney, James
NOTE: OP hasn't defined a secondary sort requirement for the case where multiple lines have the same year, but it shouldn't be too hard to extend this answer to include a secondary (and tertiary?) sort requirement.
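One possible extension, as a sketch only: add a second -k key so that lines with the same year are ordered by the text in front of the year. The demo data and the choice of columns 1-35 as the tie-breaker are assumptions for illustration; OP did not specify a secondary ordering.

```shell
# Hypothetical fixed-width demo data: the year sits at columns 36-39.
demo=$(mktemp)
printf '%-8s%-27s%-6s%s\n' \
    snail  'SSH, The Secure Shell'   2005 'Barrett, Daniel' \
    alpaca 'Intermediate Perl'       2012 'Schwartz, Randal' \
    donkey 'Cisco IOS in a Nutshell' 2005 'Boney, James' > "$demo"

# A literal newline as the field separator makes the whole line field #1
# (equivalent to $'\n' in bash).
nl='
'

# Primary key:   the year, compared numerically (the trailing "n" applies to this key only).
# Secondary key: characters 1-35 of the line (name and title), compared as text.
sorted=$(LC_ALL=C sort -t"$nl" -k1.36,1.39n -k1.1,1.35 "$demo")
printf '%s\n' "$sorted"
rm -f "$demo"
```

With this data the two 2005 lines come out in name order (donkey before snail), followed by the 2012 alpaca line. LC_ALL=C is set only to make the text comparison independent of the current locale.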
Try adding a separator such as a comma; from there you will be able to use the sort command with the -t argument and specify the given field separator.
To find and replace a character with a separator I would use cat animals.txt | sed {insert the pattern}.
Based on the file you've shared, you could attempt adding the separator after the first word, and before and after the numerical values.
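As a rough sketch of that idea: the sed pattern below is one hypothetical choice, not the pattern the answer had in mind. It uses | rather than a comma because the titles and author names in the sample already contain commas, and it assumes the year is the first standalone 4-digit number on each line.

```shell
# Hypothetical space-separated sample lines (note the commas inside titles and names).
demo=$(mktemp)
cat > "$demo" <<'EOF'
alpaca Intermediate Perl 2012 Schwartz, Randal
snail SSH, The Secure Shell 2005 Barrett, Daniel
horse Linux in a Nutshell 2009 Siever, Ellen
EOF

# 1st expression: put "|" after the first word.
# 2nd expression: put "|" on both sides of the first standalone 4-digit number.
delimited=$(sed -E 's/^([^ ]+) +/\1|/; s/ +([0-9]{4}) +/|\1|/' "$demo")

# Sort numerically on field 3 (the year), using "|" as the field separator.
sorted=$(printf '%s\n' "$delimited" | sort -t'|' -k3,3n)
printf '%s\n' "$sorted"
rm -f "$demo"
```

The output keeps the | separators; a title that itself contains a standalone 4-digit number would be split in the wrong place, so this only suits data where that cannot happen.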
NOTE: this assumes the input file's columns are separated by spaces and not tabs; otherwise dan's comment, sort -nt $'\t' -k3,3, should suffice.
If GNU awk is available we can have awk find the index of the year substring and then sort the output for us.
Sample input:
$ cat animals.txt
1 2 3 4 5 6
123456789012345678901234567890123456789012345678901234567890
alpaca Intermediate Perl 2012 Schwatz, Randal
donkey Cisco IOS in a Nutshell 2005 Boney, James
horse Linux in a Nutshell 2009 Siever, Ellen
Where the year part of the line runs from position 36 to 39.
One GNU awk idea:
awk '
FNR==1 { x=match($0, /[0-9]{4}/) } # find index of the "year" substring in the 1st line of input; assumes the "year" is the 1st occurrence of a 4-digit substring
{ arr[substr($0,x,4)][FNR]=$0 } # populate 2-dimensional array using "year" and row number (FNR) as indexes
END { PROCINFO["sorted_in"]="@ind_num_asc" # sort indexes as numbers in "asc"ending order
for (i in arr)
for (j in arr[i])
print arr[i][j]
}
' animals.txt
This generates:
donkey Cisco IOS in a Nutshell 2005 Boney, James
horse Linux in a Nutshell 2009 Siever, Ellen
alpaca Intermediate Perl 2012 Schwatz, Randal
If we change the sort order from @ind_num_asc to @ind_num_desc we can generate the output in descending year order, i.e.:
alpaca Intermediate Perl 2012 Schwatz, Randal
horse Linux in a Nutshell 2009 Siever, Ellen
donkey Cisco IOS in a Nutshell 2005 Boney, James
NOTES:
GNU awk is required for multi-dimensional array (aka array of arrays) support
GNU awk is required for the PROCINFO["sorted_in"] feature
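Where GNU awk is not available, a similar result may be reachable with any POSIX awk by letting sort do the ordering. This is a different technique (decorate, sort, undecorate) rather than the array-based approach above. The demo data is hypothetical, and the sketch assumes the lines contain no tab characters and that the first line contains a 4-digit year.

```shell
# Hypothetical fixed-width demo data: the year sits at columns 36-39.
demo=$(mktemp)
printf '%-8s%-27s%-6s%s\n' \
    alpaca 'Intermediate Perl'       2012 'Schwartz, Randal' \
    donkey 'Cisco IOS in a Nutshell' 2005 'Boney, James' \
    horse  'Linux in a Nutshell'     2009 'Siever, Ellen' > "$demo"

# Decorate:   prefix each line with its year and a tab (year position taken from line 1).
# Sort:       numerically on the prefixed year.
# Undecorate: drop the prefix again, leaving the original line.
sorted=$(
    awk 'FNR == 1 { x = match($0, /[0-9][0-9][0-9][0-9]/) }
         { print substr($0, x, 4) "\t" $0 }' "$demo" |
    sort -n -k1,1 |
    cut -f2-
)
printf '%s\n' "$sorted"
rm -f "$demo"
```

The regex spells out four digit classes instead of using {4} because some older awk implementations do not support interval expressions.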
One way to do it is to copy the year to the start of each line with sed, sort the resulting output numerically, and then remove the year at the start of each line:
sed 's/^.*[[:space:]]\([12][09][0-9][0-9]\)[[:space:]].*$/\1 &/' animals.txt \
| sort -n | sed 's/^.....//'
The output with the example animals.txt in the question is:
oryx Writing Word Macros 1999 Roman, Steven
donkey Cisco IOS in a Nutshell 2005 Boney, James
snail SSH, The Secure Shell 2005 Barrett, Daniel
horse Linux in a Nutshell 2009 Sievers, Ellen
python Programming Python 2010 Lutz, Mark
alpaca Intermediate Perl 2012 Schwartz, Randal
robin MySQL High Availability 2014 Bell, Charles