Merging two files into one based on the first column

I have two files, both in the same format: two columns, each containing a number. For example:

file 1

1.00    99
2.00    343
3.00    34
...
10.00   343

file 2

1.00    0.4
2.00    0.5
3.00    0.34
...
10.00   0.9

I want to generate the following file (using awk, bash, or Perl):

1.00    99      0.4 
2.00    343     0.5      
3.00    34      0.34
...
10.00   343     0.9

Thanks.

join file1 file2

This assumes that both files are sorted on the join field. If they are not, sort them first (join expects the plain lexical order that sort produces, so sort -V is not suitable here):

join <(sort file1) <(sort file2)
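A minimal, self-contained sketch of the join approach, using rows from the question written in shuffled order to temporary files (the temp files and the join_files function are hypothetical, for illustration only). It assumes bash for the <( ) process substitution, sorts each input with sort -k 1b,1 (the form the GNU join documentation recommends) under the same LC_ALL=C collation that join runs in, and re-sorts the result with sort -n for display.

```shell
#!/usr/bin/env bash
# Sketch: merge two two-column files on column 1 with join.
# Sample rows come from the question; the temp files are illustrative only.
set -euo pipefail

# One collation for both sort and join, so they agree on what "sorted" means.
export LC_ALL=C

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Deliberately unsorted input.
printf '%s\n' '3.00 34' '1.00 99' '10.00 343' '2.00 343' > "$tmp/file1"
printf '%s\n' '10.00 0.9' '2.00 0.5' '1.00 0.4' '3.00 0.34' > "$tmp/file2"

# join_files FILE1 FILE2 (hypothetical helper): sort both inputs lexically on
# field 1 (what join expects), join them, then sort numerically for display.
join_files() {
    join <(sort -k 1b,1 "$1") <(sort -k 1b,1 "$2") | sort -n
}

join_files "$tmp/file1" "$tmp/file2"
```

With these inputs the output is the four merged rows in numeric key order, starting with "1.00 99 0.4" and ending with "10.00 343 0.9".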

Here's an AWK version (the sort compensates for the fact that AWK's array traversal order is unspecified):

awk '{a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2 | sort -V
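A runnable sketch of the AWK pipeline above, wrapped in a hypothetical merge_awk function. The sample data is made up: 4.00 appears only in file 1 and 5.00 only in file 2, which shows that, unlike plain join, this version keeps keys found in just one file (with a single value) instead of dropping them. It pipes through sort -n instead of sort -V so it does not depend on GNU sort; for non-negative keys written with the same number of decimals, as here, both give the same order.

```shell
#!/usr/bin/env bash
# Sketch: merge two files on column 1 with awk (hypothetical sample data).
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '1.00 99' '2.00 343' '4.00 7' > "$tmp/file1"      # 4.00 only here
printf '%s\n' '1.00 0.4' '2.00 0.5' '5.00 0.1' > "$tmp/file2"   # 5.00 only here

# merge_awk FILE...: append each line's second field to the entry for its key.
# The order of "for (i in a)" is unspecified, so sort the result afterwards.
merge_awk() {
    awk '{ a[$1] = a[$1] FS $2 } END { for (i in a) print i a[i] }' "$@" | sort -n
}

merge_awk "$tmp/file1" "$tmp/file2"
```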

It is shorter and arguably more readable than the Perl answer.

In gawk 4.0 and later, you can set the array traversal order:

awk 'BEGIN {PROCINFO["sorted_in"] = "@ind_num_asc"} {a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2

and you won't need the sort utility. @ind_num_asc means "index, numeric, ascending". See "Controlling Array Traversal and Array Sorting" and "Using Predefined Array Scanning Orders with gawk" in the gawk manual.
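A small sketch of the gawk variant with keys written out of order (values from the question; the temp files and the merge_gawk function are hypothetical). It calls gawk by name, because on many systems awk is mawk or another implementation for which PROCINFO is just an ordinary array, and it skips the run with a message if gawk is not installed.

```shell
#!/usr/bin/env bash
# Sketch: let gawk (4.0+) order the array scan itself via PROCINFO["sorted_in"].
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Keys deliberately out of order.
printf '%s\n' '10.00 343' '2.00 343' '1.00 99' > "$tmp/file1"
printf '%s\n' '2.00 0.5' '10.00 0.9' '1.00 0.4' > "$tmp/file2"

# merge_gawk FILE...: "@ind_num_asc" makes "for (i in a)" visit indices in
# ascending numeric order, so no external sort is needed.
merge_gawk() {
    gawk 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" }
          { a[$1] = a[$1] FS $2 }
          END { for (i in a) print i a[i] }' "$@"
}

if command -v gawk >/dev/null 2>&1; then
    merge_gawk "$tmp/file1" "$tmp/file2"
else
    echo 'gawk not found; PROCINFO["sorted_in"] is a gawk extension' >&2
fi
```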

Note that sort -V (--version-sort) requires GNU sort from coreutils 7.0 or later. Thanks to @simlev for pointing out that it should be used where available.

A Perl solution:

perl -anE 'push @{$h{$F[0]}}, $F[1]; END{ say "$_\t$h{$_}->[0]\t$h{$_}->[1]" for sort{$a<=>$b} keys %h }' file_1 file_2 > file_3

OK, after looking at the awk one-liner, here is a version that is shorter than my first try. Its tab-separated output is nicer than the awk one-liner's, and it doesn't need to pipe through sort:

perl -anE '$h{$F[0]}="$h{$F[0]}\t$F[1]"; END{say "$_$h{$_}" for sort {$a<=>$b} keys %h}' file_1 file_2

Note that the one-liners behave differently from the join example if some lines in the first file have no value in the second column.

You can also do it with Alacon, the command-line utility for the Alasql database.

It works with Node.js, so you need to install Node.js and then the Alasql package:

> npm install alasql

To join the data from two tab-separated files, you can use the following command:

> node alacon "SELECT * INTO TSV('main.txt') FROM TSV('data1.txt') data1
                   JOIN TSV('data2.txt') data2 USING [0]"

This is a single long command line; it is wrapped here only for readability.
