Merging two files into one based on the first column
I have two files in the same format: two columns, each containing a number. For example:
file 1
1.00 99
2.00 343
3.00 34
...
10.00 343
file 2
1.00 0.4
2.00 0.5
3.00 0.34
...
10.00 0.9
I want to generate the following file (using awk, bash, or perl):
1.00 99 0.4
2.00 343 0.5
3.00 34 0.34
...
10.00 343 0.9
Thanks
join file1 file2
This assumes that the files are sorted on the join field. If they are not, you can do this:
join <(sort -V file1) <(sort -V file2)
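To try this on the sample data from the question, here is a small self-contained sketch. It is an illustration rather than part of the original answer: the scratch directory and the names file1.sorted, file2.sorted and merged.txt are placeholders, and temporary files replace the <(...) process substitution so that it also runs in a plain POSIX sh. It sorts both inputs in the C locale so their order matches what join expects, then sorts the result numerically.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question, deliberately out of order.
printf '%s\n' '3.00 34' '1.00 99' '10.00 343' '2.00 343' > file1
printf '%s\n' '10.00 0.9' '2.00 0.5' '1.00 0.4' '3.00 0.34' > file2

# join wants both inputs sorted on the join field in the same collation order.
LC_ALL=C sort -k1,1 file1 > file1.sorted
LC_ALL=C sort -k1,1 file2 > file2.sorted

# Join on the first column, then restore numeric order (1, 2, 3, 10).
LC_ALL=C join file1.sorted file2.sorted | LC_ALL=C sort -n > merged.txt
cat merged.txt
```

The final numeric sort is needed because a byte-wise sort places 10.00 before 2.00.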
Here's an AWK version (the sort compensates for AWK's non-deterministic array ordering):
awk '{a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2 | sort -V
It seems shorter and more readable than the Perl answer.
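If both files contain the same keys and the row order of the second file should be kept, a commonly used alternative is the two-file NR==FNR idiom, which avoids both the array traversal and the sort. The following is a sketch under that assumption and is not taken from the answers here; the array name first and the output file merged.txt are made up for illustration.

```shell
#!/bin/sh
# Scratch directory with the sample data from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '1.00 99' '2.00 343' '3.00 34' '10.00 343' > file1
printf '%s\n' '1.00 0.4' '2.00 0.5' '3.00 0.34' '10.00 0.9' > file2

# NR == FNR holds only while the first file is being read:
# remember its second column under the key from the first column.
# For the second file, print the key, the stored value and the new value.
awk 'NR == FNR { first[$1] = $2; next }
     $1 in first { print $1, first[$1], $2 }' file1 file2 > merged.txt
cat merged.txt
```

Rows are printed in the order they appear in file2, and keys that occur only in file2 are skipped.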
In gawk 4, you can set the array traversal order:
awk 'BEGIN {PROCINFO["sorted_in"] = "@ind_num_asc"} {a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2
That way you don't need the sort utility. @ind_num_asc stands for Index Numeric Ascending. See "Controlling Array Traversal and Array Sorting" and "Using Predefined Array Scanning Orders with gawk" in the gawk manual.
Note that -V (--version-sort) in the sort commands above requires GNU sort from coreutils 7.0 or later. Thanks to @simlev for pointing out that it should be used if available.
A Perl solution
perl -anE 'push @{$h{$F[0]}}, $F[1]; END{ say "$_\t$h{$_}->[0]\t$h{$_}->[1]" for sort{$a<=>$b} keys %h }' file_1 file_2 > file_3
OK, having looked at the awk one-liner: this version is shorter than my first try, its output is nicer than the awk one-liner's, and it doesn't need to pipe into sort -n:
perl -anE '$h{$F[0]}="$h{$F[0]}\t$F[1]"; END{say "$_$h{$_}" for sort {$a<=>$b} keys %h}' file_1 file_2
Note that the one-liners behave differently from the join example if the first file contains entries with no value in the second column.
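The difference can be seen with a two-line test case. The sketch below is an illustration only: it reuses the awk one-liner from the earlier answer in place of Perl, and the file names joined.txt and awk.txt are placeholders.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file1 has a key (2.00) with no second column; file2 is complete.
printf '%s\n' '1.00 99' '2.00' > file1
printf '%s\n' '1.00 0.4' '2.00 0.5' > file2

# join appends only the fields that exist, so the 2.00 row has two columns.
join file1 file2 > joined.txt

# The awk one-liner always adds a separator, so the 2.00 row keeps an
# empty middle column (two spaces between the key and 0.5).
awk '{a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2 |
    LC_ALL=C sort -n > awk.txt

cat joined.txt awk.txt
```

With join, the value 0.5 moves into the second column, while the one-liner leaves it in the third.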
You can do it with Alacon, a command-line utility for the Alasql database.
It works with Node.js, so you need to install Node.js and then the Alasql package.
To join the data from two tab-separated files you can use the following command:
> node alacon "SELECT * INTO TSV('main.txt') FROM TSV('data1.txt') data1 JOIN TSV('data2.txt') data2 USING [0]"
This is one very long line. In this example all files have data in "Sheet1" sheets.