bash: sorting on same values gives different orders
I have the following two files:
file1:
4 rs10000009 0 71048953 G A
4 rs10000010 0 21618674 C T
4 rs10000011 0 138223055 T C
2 rs1000001 0 50711642 T G
4 rs10000005 0 85161558 G A
12 rs1000000 0 126890980 A G
4 rs10000003 0 57561647 A G
4 rs10000006 0 108826383 C T
4 rs10000007 0 114553253 C A
4 rs10000008 0 172776204 T C
file2:
4 rs10000007 C A 0.006562 762
4 rs10000008 T C 0.01575 762
4 rs10000009 G A 0 762
12 rs1000000 A G 0.2388 762
4 rs10000010 C T 0.4921 762
4 rs10000003 A G 0.2992 762
4 rs10000005 G A 0.4409 762
4 rs10000012 G C 0.1417 762
4 rs10000006 C T 0.02625 762
4 rs10000011 T C 0.03675 762
I use sort to sort these files on column 2, which holds the same kind of values (rs IDs) in both files.
sort -f -k 2 file1 > file1.sorted
sort -f -k 2 file2 > file2.sorted
However, I get two differently sorted files:
file1.sorted:
12 rs1000000 0 126890980 A G
4 rs10000003 0 57561647 A G
4 rs10000005 0 85161558 G A
4 rs10000006 0 108826383 C T
4 rs10000007 0 114553253 C A
4 rs10000008 0 172776204 T C
4 rs10000009 0 71048953 G A
4 rs10000010 0 21618674 C T
2 rs1000001 0 50711642 T G
4 rs10000011 0 138223055 T C
file2.sorted:
4 rs10000003 A G 0.2992 762
4 rs10000005 G A 0.4409 762
4 rs10000006 C T 0.02625 762
4 rs10000007 C A 0.006562 762
4 rs10000008 T C 0.01575 762
4 rs10000009 G A 0 762
12 rs1000000 A G 0.2388 762
4 rs10000010 C T 0.4921 762
4 rs10000011 T C 0.03675 762
4 rs10000012 G C 0.1417 762
What am I missing here? How do I get these two files sorted in the same way? The exact order does not matter much to me, as long as I can then use join to join the two files. Many thanks!
Use -k 2,2 to sort on the 2nd column alone. -k 2 means the sort key starts at the 2nd column and runs to the end of the line, so the later columns, which differ between the two files, can also affect the order.
$ sort -f -k 2,2 file2
12 rs1000000 A G 0.2388 762
4 rs10000003 A G 0.2992 762
4 rs10000005 G A 0.4409 762
4 rs10000006 C T 0.02625 762
4 rs10000007 C A 0.006562 762
4 rs10000008 T C 0.01575 762
4 rs10000009 G A 0 762
4 rs10000010 C T 0.4921 762
4 rs10000011 T C 0.03675 762
4 rs10000012 G C 0.1417 762
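To make the difference between the two key forms visible, here is a minimal sketch on two made-up lines (not the question's data). It assumes LC_ALL=C for plain byte-order comparison and adds -s (stable sort, available in GNU and BSD sort) so that ties are left in input order rather than being broken by a whole-line comparison. Both choices are mine, for illustration only.

```shell
#!/bin/sh
# Assumption: C locale, so comparison is plain byte order.
export LC_ALL=C

# Two hypothetical lines that share the same value in field 2.
lines=$(printf '1 a z\n2 a y\n')

# Key is field 2 only: both keys are equal, so -s keeps input order.
only_field2=$(printf '%s\n' "$lines" | sort -s -k 2,2)

# Key runs from field 2 to end of line: field 3 now decides the order.
from_field2=$(printf '%s\n' "$lines" | sort -s -k 2)

printf 'key = field 2 only:\n%s\n' "$only_field2"
printf 'key = field 2 to end of line:\n%s\n' "$from_field2"
```

With -k 2,2 the lines stay in input order, while with -k 2 the line ending in y moves ahead of the one ending in z, because the third column has become part of the key.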
Use the -b option to ignore leading blanks, e.g.: sort -bf -k 2,2 file2
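Since the goal is to join the two files afterwards, the whole pipeline might look like the sketch below. It uses short excerpts of file1 and file2 from the question, written to a temporary directory. Two things here are my assumptions rather than part of the answer: LC_ALL=C is exported so that sort and join use the same byte-order collation, and -f is dropped because the IDs in this data are all lowercase and join compares case-sensitively by default.

```shell
#!/bin/sh
# Assumption: force byte-order collation so sort and join agree.
export LC_ALL=C

# Work in a scratch directory (remove it yourself when done).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Excerpts of file1 and file2 from the question.
cat > file1 <<'EOF'
4 rs10000010 0 21618674 C T
2 rs1000001 0 50711642 T G
12 rs1000000 0 126890980 A G
4 rs10000003 0 57561647 A G
EOF

cat > file2 <<'EOF'
4 rs10000003 A G 0.2992 762
12 rs1000000 A G 0.2388 762
4 rs10000010 C T 0.4921 762
4 rs10000012 G C 0.1417 762
EOF

# Sort each file on field 2 only, ignoring leading blanks in the key.
sort -b -k 2,2 file1 > file1.sorted
sort -b -k 2,2 file2 > file2.sorted

# Join on field 2 of both files; IDs present in only one file are dropped.
join -1 2 -2 2 file1.sorted file2.sorted > joined
cat joined
```

In this excerpt only rs1000000, rs10000003 and rs10000010 appear in both files, so the joined output has three lines, each starting with the ID followed by the remaining columns of file1 and then those of file2.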
Further reading: Sort based on the third column