
Sorting a space delimited list with uneven spaces

I have a space delimited list that has an uneven number of spaces in what would be the first column. I want to reverse sort this by the first number that appears after its string. I need to do this using bash commands.

Example:

Pontiac Firebird 19.0 6 250.0 100.0 3282. 15.0 71 US
Pontiac J2000 SE Hatchback 31.0 4 112.0 85.00 2575. 16.2 82 US
Oldsmobile Delta 88 Royale 12.0 8 350.0 160.0 4456. 13.5 72 US
Oldsmobile Omega 11.0 8 350.0 180.0 3664. 11.0 73 US
AMC Gremlin 20.0 6 232.0 100.0 2914. 16.0 75 US
AMC Gremlin 21.0 6 199.0 90.00 2648. 15.0 70 US
Pontiac Lemans V6 21.5 6 231.0 115.0 3245. 15.4 79 US

Would turn into:


Oldsmobile Omega 11.0 8 350.0 180.0 3664. 11.0 73 US
Oldsmobile Delta 88 Royale 12.0 8 350.0 160.0 4456. 13.5 72 US
Pontiac Firebird 19.0 6 250.0 100.0 3282. 15.0 71 US
AMC Gremlin 20.0 6 232.0 100.0 2914. 16.0 75 US
AMC Gremlin 21.0 6 199.0 90.00 2648. 15.0 70 US
Pontiac Lemans V6 21.5 6 231.0 115.0 3245. 15.4 79 US
Pontiac J2000 SE Hatchback 31.0 4 112.0 85.00 2575. 16.2 82 US

I've tried sort -nr to see what happens, and it reverse sorts the list, but only with respect to its alphabetical order. I want to sort based on all values.

The trick is that I must keep it space delimited. What's the best way to do this using bash?

"I must keep it space delimited"

You mean, the result has to be space delimited again, right? During processing, you can transform the input however you like.

Assuming you know a character that never otherwise appears in your file, use sed to delimit the value you want to sort by with that character, then sort by that value, then remove the additional delimiters again.
Here we use a tab to delimit the key for sorting.

sed -E 's/ ([0-9]+\.[0-9]+) / \t\1\t /' | sort -t $'\t' -k2,2n | tr -d \\t

This is basically a Schwartzian transform.
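The decorate, sort, undecorate steps can be wrapped in a small function. This is only a sketch: the function name sort_by_first_decimal is hypothetical, and it assumes the input contains no tab characters and that the sort key is the first token of the form digits.digits surrounded by spaces. The tab is passed in from a bash variable so the sed expression does not rely on sed understanding \t, and LC_ALL=C is an added assumption so that the dot is always treated as the decimal point.

```shell
#!/usr/bin/env bash

# Hypothetical helper: sort lines by the first " N.N " token they contain.
# Reads the files given as arguments, or standard input if there are none.
sort_by_first_decimal() {
    local tab=$'\t'
    # Decorate: wrap the first decimal number in tabs so it becomes field 2.
    sed -E "s/ ([0-9]+\.[0-9]+) / ${tab}\1${tab} /" "$@" |
        # Sort numerically on the tab-delimited key only.
        LC_ALL=C sort -t "$tab" -k2,2n |
        # Undecorate: delete the tabs, restoring the original spacing.
        tr -d '\t'
}
```

Calling sort_by_first_decimal file prints the lines in ascending order of that number. For descending order, change -k2,2n to -k2,2nr.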

Here's a short Ruby program:

ruby -e '
    puts IO.readlines(ARGV.shift, chomp: true)
        .map {|line|
            fields = line.split
            # the last 8 fields are data; everything before them is the name
            [fields[0..(fields.size - 9)].join(" ")] + fields[-8 .. -1]
        }
        .sort_by {|row| row[1].to_f}   # compare as numbers, not as strings
        .map {|row| row.join(" ")}
        .join("\n")
' file
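The same count-from-the-end idea can also be sketched with standard shell tools. The assumptions here are that every line ends with exactly eight space-separated data fields (so the sort key is the eighth field from the end), that the input contains no tabs, and that the function name sort_by_eighth_from_end is hypothetical. Unlike the Ruby version, this sketch leaves the original spacing of each line untouched.

```shell
#!/usr/bin/env bash

# Hypothetical helper: sort lines by their 8th field counted from the end.
# Reads the files given as arguments, or standard input if there are none.
sort_by_eighth_from_end() {
    # Decorate: prefix each line with its key followed by a tab.
    awk '{ print $(NF-7) "\t" $0 }' "$@" |
        # Numeric sort on the prefixed key; -s keeps equal keys in input order.
        LC_ALL=C sort -s -t $'\t' -k1,1n |
        # Undecorate: drop the key and keep the original line.
        cut -f2-
}
```

Because the key is located by position rather than by pattern, this does not depend on the first number containing a dot.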

I would use GNU AWK for this as follows. Let the content of file.txt be

Pontiac Firebird 19.0 6 250.0 100.0 3282. 15.0 71 US
Pontiac J2000 SE Hatchback 31.0 4 112.0 85.00 2575. 16.2 82 US
Oldsmobile Delta 88 Royale 12.0 8 350.0 160.0 4456. 13.5 72 US
Oldsmobile Omega 11.0 8 350.0 180.0 3664. 11.0 73 US
AMC Gremlin 20.0 6 232.0 100.0 2914. 16.0 75 US
AMC Gremlin 21.0 6 199.0 90.00 2648. 15.0 70 US
Pontiac Lemans V6 21.5 6 231.0 115.0 3245. 15.4 79 US

then

awk 'BEGIN{FPAT="[0-9]*[.][0-9]*";PROCINFO["sorted_in"]="@ind_num_asc"}{arr[$1]=$0}END{for(i in arr){print arr[i]}}' file.txt

output

Oldsmobile Omega 11.0 8 350.0 180.0 3664. 11.0 73 US
Oldsmobile Delta 88 Royale 12.0 8 350.0 160.0 4456. 13.5 72 US
Pontiac Firebird 19.0 6 250.0 100.0 3282. 15.0 71 US
AMC Gremlin 20.0 6 232.0 100.0 2914. 16.0 75 US
AMC Gremlin 21.0 6 199.0 90.00 2648. 15.0 70 US
Pontiac Lemans V6 21.5 6 231.0 115.0 3245. 15.4 79 US
Pontiac J2000 SE Hatchback 31.0 4 112.0 85.00 2575. 16.2 82 US

Explanation: I tell GNU AWK that a field is 0 or more digits, followed by a literal dot ( [.] ), followed by 0 or more digits (note: I assume that the first number always contains a dot and that the name column never does), and that array traversal should be treat-indices-as-numbers-ascending ( @ind_num_asc ), which is one of the Predefined Array Scanning Orders. For each line I add a pair to the array, with the key being the first number ( $1 ) and the value being the whole line ( $0 ). After going through all lines, I print the values from array arr in the order given by the selected array traversal.

(tested in gawk 4.2.1)
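One thing to watch with an array keyed on the first number: if two lines share the same first number, the later line overwrites the earlier one, so a line is lost. The sample data has no such duplicates, but a possible variation, sketched here under the same FPAT assumptions and with the hypothetical function name sort_keep_duplicates, appends lines that share a key instead of replacing them.

```shell
#!/usr/bin/env bash

# Hypothetical helper: same approach as above, but lines whose first decimal
# number is identical are kept together instead of overwriting each other.
# Requires GNU AWK (FPAT and PROCINFO["sorted_in"] are gawk extensions).
sort_keep_duplicates() {
    gawk '
        BEGIN {
            FPAT = "[0-9]*[.][0-9]*"
            PROCINFO["sorted_in"] = "@ind_num_asc"
        }
        # Append to whatever is already stored under this key.
        { arr[$1] = ($1 in arr ? arr[$1] ORS : "") $0 }
        END { for (i in arr) print arr[i] }
    ' "$@"
}
```

Lines with an equal key come out in their input order, each on its own line.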
