
Join two files together (AWK)

I would like to merge two files, where:

File 1:

 chr1  10000   rs200132 A  C  100.000
 chr2  20000   rs5000   C  G   80.000

File 2:

 rs200132  1:10000  A   800   200  Nmf
 rs210111  1:10000  G   200   800  VFC
 rs310000  1:10000  C   100   500  tff
 rs50001   2:20000  T   500   100  jpp
 rs60000   2:20000  A   1000   10  jkl

Desired output:

 chr1  10000  rs200132  A  A  C   800   200  Nmf
 chr1  10000  rs210111  G  A  C   200   800  VFC
 chr1  10000  rs310000  C  A  C   100   500  tff
 chr2  20000  rs50001   T  C  G   500   100  jpp
 chr2  20000  rs60000   A  C  G   1000   10  jkl

In the first file, the key is the number after "chr" in the first column together with the position in the second column. The second file holds the same key in its second column, written as chromosome:position, for example 1:10000. I would like to join the two files on this key. One row of the first file can match several rows of the second file (e.g. the first row of the first file matches three rows of the second file). Thank you.

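perl -lane '
  # BEGIN: slurp file1 into %h, keyed by position only (column 2; the chromosome is ignored),
  # then leave just file2 in @ARGV for the implicit -n loop.
  BEGIN{ $x=pop; %h = map{ $_->[1] => $_ } map [split], <>; @ARGV=$x }
  $F[1] =~ s/.+?://;    # "1:10000" -> "10000"
  $t = $h{$F[1]};       # matching file1 row
  print join " ", @$t[0,1], @F[0,2], @$t[3,4], @F[3..5];
' file1 file2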

Output:

chr1 10000 rs200132 A A C 800 200 Nmf
chr1 10000 rs210111 G A C 200 800 VFC
chr1 10000 rs310000 C A C 100 500 tff
chr2 20000 rs50001 T C G 500 100 jpp
chr2 20000 rs60000 A C G 1000 10 jkl

You can use this awk one-liner:

awk 'NR==FNR{a[$2]=$1;b[$2]=$4" "$5;next} {sub(/.*:/,"",$2); $3=$1" "$3" "b[$2]; $1=a[$2];}1' file1 file2

Test:

sat:~# awk 'NR==FNR{a[$2]=$1;b[$2]=$4" "$5;next} {sub(/.*:/,"",$2); $3=$1" "$3" "b[$2]; $1=a[$2];}1' file1 file2
chr1 10000 rs200132 A A C 800 200 Nmf
chr1 10000 rs210111 G A C 200 800 VFC
chr1 10000 rs310000 C A C 100 500 tff
chr2 20000 rs50001 T C G 500 100 jpp
chr2 20000 rs60000 A C G 1000 10 jkl
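
Note that the awk one-liner and the first Perl version index file1 by position only, so two chromosomes that share a position would overwrite each other in the lookup. A minimal sketch that keys on chromosome:position instead is below. The array names (name, pos, alleles), the joined variable and the temporary directory are illustrative choices, not part of the original answers, and the sample rows are the question's data with the ** markers dropped.

```shell
# Sample data from the question (emphasis markers removed).
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
chr1  10000   rs200132 A  C  100.000
chr2  20000   rs5000   C  G   80.000
EOF
cat > "$tmp/file2" <<'EOF'
rs200132  1:10000  A   800   200  Nmf
rs210111  1:10000  G   200   800  VFC
rs310000  1:10000  C   100   500  tff
rs50001   2:20000  T   500   100  jpp
rs60000   2:20000  A   1000   10  jkl
EOF

joined=$(awk '
  # First file: index rows by "chromosome:position", e.g. "1:10000".
  NR == FNR {
    chrom = $1
    sub(/^chr/, "", chrom)          # "chr1" -> "1"
    key = chrom ":" $2
    name[key]    = $1               # e.g. "chr1"
    pos[key]     = $2               # e.g. "10000"
    alleles[key] = $4 " " $5        # e.g. "A C"
    next
  }
  # Second file: column 2 already has the "chromosome:position" form.
  ($2 in name) {
    print name[$2], pos[$2], $1, $3, alleles[$2], $4, $5, $6
  }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$joined"
rm -r "$tmp"
```

On the sample data this prints the same five rows as the test above. Rows of file2 with no partner in file1 are skipped here rather than printed with blank columns.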

Here's another way using Perl:

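perl -lane '
    if (@ARGV) {
        # Still reading file1 (file2 is left in @ARGV): index by chromosome number and position
        ($x = $F[0]) =~ s/[^\d]*//;
        $h{$x}{$F[1]} = [ @F[0,1,3,4] ]
    }
    else {
        # Reading file2: split "1:10000" into chromosome and position, then look up the file1 row
        @t = split(":", $F[1]);
        $r = $h{$t[0]}{$t[1]};
        print join(" ", @$r[0,1], @F[0,2], @$r[2,3], @F[3..5])
    }
' file1 file2 | column -t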

Results:

chr1  10000  rs200132  A  A  C  800   200  Nmf
chr1  10000  rs210111  G  A  C  200   800  VFC
chr1  10000  rs310000  C  A  C  100   500  tff
chr2  20000  rs50001   T  C  G  500   100  jpp
chr2  20000  rs60000   A  C  G  1000  10   jkl
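
None of the answers say what happens when a file2 row has no partner in file1. As far as I can tell, the one-liners as written still print such a row, with the file1 columns left empty. A quick pre-check that lists the unmatched keys can be sketched like this. The seen array and unmatched variable are illustrative names, and the rs99999 row is a made-up record added only to trigger the case.

```shell
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
chr1  10000   rs200132 A  C  100.000
chr2  20000   rs5000   C  G   80.000
EOF
# The last row is hypothetical; it exists only to show an unmatched key.
cat > "$tmp/file2" <<'EOF'
rs200132  1:10000  A   800   200  Nmf
rs50001   2:20000  T   500   100  jpp
rs99999   3:30000  G   300   300  xyz
EOF

unmatched=$(awk '
  # First file: remember every "chromosome:position" key that exists.
  NR == FNR {
    chrom = $1
    sub(/^chr/, "", chrom)
    seen[chrom ":" $2] = 1
    next
  }
  # Second file: report the id and key of rows that have no partner.
  !($2 in seen) { print $1, $2 }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$unmatched"
rm -r "$tmp"
```

If this prints nothing, every row of file2 has a partner and any of the joins above should give complete rows.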
