[英]Join two files together (AWK)
I have one problem. 我有一个问题。 I would like to merge two files together.
我想将两个文件合并在一起。 Where:
哪里:
File 1: 文件1:
chr**1 10000** rs200132 A C 100.000
chr**2 20000** rs5000 C G 80.000
File 2: 档案2:
rs200132 **1:10000** A 800 200 Nmf
rs210111 **1:10000** G 200 800 VFC
rs310000 **1:10000** C 100 500 tff
rs50001 **2:20000** T 500 100 jpp
rs60000 **2:20000** A 1000 10 jkl
Output: 输出:
chr**1 10000** rs200132 A A C 800 200 Nmf
chr**1 10000** rs210111 G A C 200 800 VFC
chr**1 10000** rs310000 C A C 100 500 tff
chr**2 20000** rs50001 T C G 500 100 jpp
chr**2 20000** rs60000 A C G 1000 10 jkl
Than, from the first file are the marks number after "chr" and in the second column the number. 然后,从第一个文件开始,在“ chr”之后是标记数字,在第二列中是数字。 The same marks are in the second file, but there is in second column like 1:10000.
相同的标记在第二个文件中,但在第二列中如1:10000。 I would like to join this two file but for the first file will be much more joined rows (fe: for the first row from the first file will be three rows from the second file.) Thank you
我想加入这两个文件,但是对于第一个文件,将有更多的连接行(例如:对于第一个文件的第一行将是第二个文件的三行。)谢谢
perl -lane'
BEGIN{ $x=pop; %h = map{ $_->[1] => $_ } map [split], <>; @ARGV=$x }
$F[1] =~ s/.+?://;
$t = $h{$F[1]};
print join " ", @$t[0,1], @F[0,2], @$t[3,4], @F[3..5];
' file1 file2
output 输出
chr**1 10000** rs200132 A A C 800 200 Nmf
chr**1 10000** rs210111 G A C 200 800 VFC
chr**1 10000** rs310000 C A C 100 500 tff
chr**2 20000** rs50001 T C G 500 100 jpp
chr**2 20000** rs60000 A C G 1000 10 jkl
You can use this awk
, 您可以使用此
awk
,
awk 'NR==FNR{a[$2]=$1;b[$2]=$4" "$5;next} {sub(/.*:/,"",$2); $3=$1" "$3" "b[$2]; $1=a[$2];}1' file1 file2
Test: 测试:
sat:~# awk 'NR==FNR{a[$2]=$1;b[$2]=$4" "$5;next} {sub(/.*:/,"",$2); $3=$1" "$3" "b[$2]; $1=a[$2];}1' file1 file2
chr**1 10000** rs200132 A A C 800 200 Nmf
chr**1 10000** rs210111 G A C 200 800 VFC
chr**1 10000** rs310000 C A C 100 500 tff
chr**2 20000** rs50001 T C G 500 100 jpp
chr**2 20000** rs60000 A C G 1000 10 jkl
Here's another way using Perl: 这是使用Perl的另一种方式:
perl -lane '
if (@ARGV) {
($x = $F[0]) =~ s/[^\d]*//;
$h{$x}{$F[1]} = [ @F[0,1,3,4] ]
}
else {
@t = split(":", $F[1]);
$r = $h{$t[0]}{$t[1]};
print join(" ", @$r[0,1], @F[0,2], @$r[2,3], @F[3..5])
}
' file1 file2 | column -t
Results: 结果:
chr1 10000 rs200132 A A C 800 200 Nmf
chr1 10000 rs210111 G A C 200 800 VFC
chr1 10000 rs310000 C A C 100 500 tff
chr2 20000 rs50001 T C G 500 100 jpp
chr2 20000 rs60000 A C G 1000 10 jkl
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.