
Unix join on more than two files

I have three files, each with an ID and a value.

sdt5z@fir-s:~/test$ ls
a.txt  b.txt  c.txt
sdt5z@fir-s:~/test$ cat a.txt 
id1 1
id2 2
id3 3
sdt5z@fir-s:~/test$ cat b.txt 
id1 4
id2 5
id3 6
sdt5z@fir-s:~/test$ cat c.txt 
id1 7
id2 8
id3 9

I want to create a file that looks like this...

id1 1 4 7
id2 2 5 8
id3 3 6 9

...preferably using a single command.

I'm aware of the join and paste commands. Paste will duplicate the id column each time:

sdt5z@fir-s:~/test$ paste a.txt b.txt c.txt 
id1 1   id1 4   id1 7
id2 2   id2 5   id2 8
id3 3   id3 6   id3 9

Join works well, but for only two files at a time:

sdt5z@fir-s:~/test$ join a.txt b.txt 
id1 1 4
id2 2 5
id3 3 6
sdt5z@fir-s:~/test$ join a.txt b.txt c.txt 
join: extra operand `c.txt'
Try `join --help' for more information.

I'm also aware that paste can take STDIN as one of the arguments by using "-". E.g., I can replicate the join command using:

sdt5z@fir-s:~/test$ cut -f2 b.txt | paste a.txt -
id1 1   4
id2 2   5
id3 3   6

But I'm still not sure how to modify this to accommodate three files.

Since I'm doing this inside a Perl script, I know I can put this inside a foreach loop, something like join file1 file2 > tmp1, join tmp1 file3 > tmp2, etc. But this gets messy, and I would like to do this with a one-liner.

join a.txt b.txt|join - c.txt

should be enough.
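The same pipeline extends to any number of files by feeding each intermediate result back into join on standard input. A minimal sketch, assuming every file is already sorted on its first field (which join requires); join_all is a hypothetical helper name, not a standard command:

```shell
#!/bin/sh
# join_all is a hypothetical helper, not a standard command.
# It joins any number of files on their first field by folding
# `join` over the argument list. Inputs must be sorted on field 1.
join_all() {
    if [ "$#" -eq 0 ]; then
        echo "usage: join_all FILE..." >&2
        return 1
    fi
    result=$(cat "$1")    # start with the first file's contents
    shift
    for f in "$@"; do
        # "-" tells join to read its first input from standard input
        result=$(printf '%s\n' "$result" | join - "$f")
    done
    printf '%s\n' "$result"
}
```

With the three sample files, join_all a.txt b.txt c.txt should print the desired id1 1 4 7 through id3 3 6 9 lines, and no temporary files are left behind.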

Since you're doing it inside a Perl script, is there any specific reason you're NOT doing the work in Perl, as opposed to spawning a shell?

Something like (NOT TESTED! caveat emptor):

use strict;
use warnings;
use File::Slurp; # Slurp the files in if they aren't too big

my @files = qw(a.txt b.txt c.txt);
# Map each file name to an array ref of its lines
my %file_data = map { $_ => [ read_file($_) ] } @files;
my @id_orders;       # IDs in the order they appear in the first file
my %data = ();       # ID => array ref of values, one per file
my $first_file = 1;
foreach my $file (@files) {
    foreach my $line (@{ $file_data{$file} }) {
        my ($id, $value) = split(/\s+/, $line);
        push @id_orders, $id if $first_file;
        $data{$id} ||= [];
        push @{ $data{$id} }, $value;
    }
    $first_file = 0;
}
foreach my $id (@id_orders) {
    print "$id " . join(" ", @{ $data{$id} }) . "\n";
}

pr -m -t -s\  file1.txt file2.txt | gawk '{print $1"\t"$2"\t"$3"\t"$4}' > finalfile.txt

This assumes file1 and file2 each have 2 columns: $1 and $2 are the columns from file1, and $3 and $4 are the columns from file2.

You can also print any column from each file in this way, and it will take any number of files as input. If your file1 has 5 columns, for example, then $6 will be the first column of file2.
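For the three files in the question, the same idea could be sketched as below: merge the files side by side, then keep the id from the first file and every second field after it. This assumes each file has exactly two whitespace-separated columns and lists the same ids in the same order. Unlike join, it pairs rows by position and does not check that the ids match. merge_values is a hypothetical helper name, and plain awk is used here instead of gawk.

```shell
#!/bin/sh
# merge_values is a hypothetical helper name.
# Assumes: every input file has exactly two whitespace-separated columns
# (id, value) and lists the same ids in the same order.
merge_values() {
    pr -m -t -s' ' "$@" |
        awk '{
            printf "%s", $1                  # id taken from the first file
            for (i = 2; i <= NF; i += 2)     # even-numbered fields hold the values
                printf " %s", $i
            print ""                         # terminate the output line
        }'
}
```

Called as merge_values a.txt b.txt c.txt, this drops the repeated id columns that paste or pr alone would leave in place.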

perl -lanE '$h{$F[0]} .= " $F[1]"; END { say $_ . $h{$_} foreach keys %h }' *.txt

Should work, can't test it as I'm answering from my mobile. You could also sort the output if you put a sort between foreach and keys.
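The same hash-based idea can also be written in awk, which makes it easy to keep the ids in the order they are first seen instead of sorting afterwards. This is a sketch of an alternative, not the one-liner above; collect_by_id is a hypothetical helper name, and it assumes two whitespace-separated columns per line.

```shell
#!/bin/sh
# collect_by_id is a hypothetical helper name.
# Appends the second field of every line to a per-id string and prints
# the ids in first-seen order. Input files do not need to be sorted.
collect_by_id() {
    awk '
        !($1 in vals) { order[++n] = $1 }   # remember each id the first time it appears
        { vals[$1] = vals[$1] " " $2 }      # append this value to the string for its id
        END { for (i = 1; i <= n; i++) print order[i] vals[order[i]] }
    ' "$@"
}
```

Usage would be collect_by_id a.txt b.txt c.txt (or *.txt). An id missing from one file simply ends up with fewer values, so the columns are not guaranteed to line up in that case.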

