Merge two files into one based on the first column

Question

I have two files, both in the same format: two columns, each containing a number. For example:

file 1

1.00    99
2.00    343
3.00    34
...
10.00   343

file 2

1.00    0.4
2.00    0.5
3.00    0.34
...
10.00   0.9

and I want to generate the following file (using awk, bash, or Perl):

1.00    99      0.4 
2.00    343     0.5      
3.00    34      0.34
...
10.00   343     0.9

Thanks.

Answer 1

join file1 file2

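This assumes that both files are sorted on the join field. If they are not, sort them first. join expects the same lexical order that a plain sort produces, not numeric or version order, so sort on the first field only and, if you want numeric order in the result, sort the output afterwards:

join <(sort -k1,1 file1) <(sort -k1,1 file2) | sort -n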
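The join approach can be sketched as a small bash function, assuming bash (for the `<( )` process substitution) and GNU coreutils. The function name `merge_with_join` and the sample data are illustrative only. Each input is sorted on field 1 alone, which is the collation join compares with, and the result is re-sorted numerically so that 10.00 ends up after 2.00.

```shell
#!/usr/bin/env bash
# Sketch: merge two whitespace-separated, two-column files on column 1.
# Assumes bash and GNU coreutils; merge_with_join is a hypothetical name.
set -eu

merge_with_join() {
  # join compares keys with the same collation as a plain `sort`,
  # so sort each input on field 1 only (no -n or -V here).
  # The final `sort -n` restores numeric order for display.
  join <(sort -k1,1 "$1") <(sort -k1,1 "$2") | sort -n -k1,1
}

# Demo on small, deliberately unsorted sample files in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '10.00 343\n2.00 343\n1.00 99\n' > "$tmp/file1"
printf '1.00 0.4\n10.00 0.9\n2.00 0.5\n' > "$tmp/file2"
merge_with_join "$tmp/file1" "$tmp/file2"
```

This prints 1.00 99 0.4, 2.00 343 0.5 and 10.00 343 0.9, one per line, with fields separated by single spaces.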

Here's an AWK version (the sort compensates for the unspecified order in which AWK traverses arrays):

awk '{a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2 | sort -V
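To make the accumulation step explicit, here is the same AWK program spread over two lines inside a bash function. `merge_with_awk` is a hypothetical name, and `sort -n` is used instead of `-V` only so the sketch does not depend on GNU sort; for keys like these both produce the same order.

```shell
#!/usr/bin/env bash
# Sketch of the AWK approach; merge_with_awk is a hypothetical name.
set -eu

merge_with_awk() {
  # For every input line, append FS (a space by default) plus column 2
  # to the entry for its key, so values appear in command-line file order.
  # `for (k in a)` visits keys in an unspecified order, hence the final sort.
  awk '{ a[$1] = a[$1] FS $2 }
       END { for (k in a) print k a[k] }' "$@" | sort -n -k1,1
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '1.00 99\n2.00 343\n10.00 343\n' > "$tmp/file1"
printf '1.00 0.4\n2.00 0.5\n10.00 0.9\n' > "$tmp/file2"
merge_with_awk "$tmp/file1" "$tmp/file2"
```

Because the array simply keeps appending, the key column does not need to be sorted in either input file.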

It seems shorter and more readable than the Perl answer.

In gawk 4.0 and later, you can set the array traversal order:

awk 'BEGIN {PROCINFO["sorted_in"] = "@ind_num_asc"} {a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2

and you won't need the sort utility. @ind_num_asc means "index, numeric, ascending". See "Controlling Array Traversal and Array Sorting" and "Using Predefined Array Scanning Orders with gawk" in the gawk manual.

Note that the -V (--version-sort) option of sort used above requires GNU sort from coreutils 7.0 or later. Thanks to @simlev for pointing out that it should be used when available.
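A quick way to see why a version-aware sort matters for these keys, assuming GNU sort: a plain byte-wise sort (forced here with LC_ALL=C) puts 10.00 before 2.00, while -V compares runs of digits as numbers.

```shell
#!/usr/bin/env bash
# Compare plain and version sorting of first-column keys (GNU sort assumed).
set -eu

keys=$(printf '10.00\n2.00\n1.00')

# paste -sd' ' - joins the sorted lines into one space-separated line.
plain=$(printf '%s\n' "$keys" | LC_ALL=C sort | paste -sd' ' -)
version=$(printf '%s\n' "$keys" | sort -V | paste -sd' ' -)

echo "plain:   $plain"     # plain:   1.00 10.00 2.00
echo "version: $version"   # version: 1.00 2.00 10.00
```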

Answer 2

A Perl solution:

perl -anE 'push @{$h{$F[0]}}, $F[1]; END{ say "$_\t$h{$_}->[0]\t$h{$_}->[1]" for sort{$a<=>$b} keys %h }' file_1 file_2 > file_3

OK, after looking at the awk one-liner, here is a version that is shorter than my first try, has nicer (tab-separated) output than the awk one-liner, and doesn't need a pipe to sort -n:

perl -anE '$h{$F[0]}="$h{$F[0]}\t$F[1]"; END{say "$_$h{$_}" for sort {$a<=>$b} keys %h}' file_1 file_2

Also, the one-liners behave differently from the join example if an entry in the first file has no value in its second column.
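The difference can be reproduced with a minimal sketch, assuming bash and made-up sample files in which the key 4.00 has no value in the first file. join outputs only the fields that exist, so the second file's value moves up to column 2, while the AWK one-liner still appends a separator for the empty field and so keeps an empty slot (two spaces here; the second Perl one-liner would likewise produce two tabs).

```shell
#!/usr/bin/env bash
# Sketch: how join and the AWK one-liner treat a key with no value in file 1.
# Sample data is made up; 4.00 has no second column in file1.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '1.00 99\n4.00\n' > "$tmp/file1"
printf '1.00 0.4\n4.00 0.2\n' > "$tmp/file2"

# join prints the key, then file1's remaining fields (none), then file2's: "4.00 0.2"
join_out=$(join "$tmp/file1" "$tmp/file2")

# awk first appends FS plus an empty $2, then FS plus 0.2: "4.00  0.2"
awk_out=$(awk '{a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' \
  "$tmp/file1" "$tmp/file2" | sort -n)

printf '%s\n' "--- join" "$join_out" "--- awk" "$awk_out"
```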

Answer 3

You can do it with Alacon, the command-line utility for the Alasql database.

It runs on Node.js, so you need to install Node.js and then the Alasql package.

To join data from two tab-separated files, you can use the following command:

node alacon "SELECT * INTO TSV('main.txt') FROM TSV('data1.txt') data1 JOIN TSV('data2.txt') data2 USING [0]"

The command is one long line. In this example, all files have data in "Sheet1" sheets.

Answer summary (3 answers):

Answer 1: score 7, accepted, 2010-11-01 19:09:03
Answer 2: score 2, 2010-11-01 21:18:10
Answer 3: score 0, 2014-12-21 16:42:47
