[英]How to match two different length and different column text file with header using join command in linux
I have two different length text files A.txt and B.txt我有两个不同长度的文本文件 A.txt 和 B.txt
A.txt looks like: A.txt 看起来像:
ID pos val1 val2 val3
1 2 0.8 0.5 0.6
2 4 0.9 0.6 0.8
3 6 1.0 1.2 1.3
4 8 2.5 2.2 3.4
5 10 3.2 3.4 3.8
B.txt looks like: B.txt 看起来像:
pos category
2 A
4 B
6 A
8 C
10 B
I want to match pos column and in both files and want the output like this我想在两个文件中匹配 pos 列,并且想要像这样的 output
ID catgeory pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
I used the join function join -1 2 -2 1 <(sort -k2 A.txt) <(sort -k1 B.txt) > C.txt
我使用了 join function
join -1 2 -2 1 <(sort -k2 A.txt) <(sort -k1 B.txt) > C.txt
The C.txt comes without a header C.txt 没有 header
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
I want to get output with a header from the join function.我想从连接 function 中获得 output 和 header。 kindly help me out
请帮帮我
Thanks in advance提前致谢
In case you are ok with awk
, could you please try following.如果您对
awk
,请尝试以下操作。 Written and tested with shown samples in GNU awk
.使用 GNU
awk
中的示例编写和测试。
awk 'FNR==NR{a[$1]=$2;next} ($2 in a){$2=a[$2] OFS $2} 1' B.txt A.txt | column -t
Explanation: Adding detailed explanation for above.说明:为上述添加详细说明。
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when B.txt is being read.
a[$1]=$2 ##Creating array a with index of 1st field and value is 2nd field of current line.
next ##next will skip all further statements from here.
}
($2 in a){ ##Checking condition if 2nd field is present in array a then do following.
$2=a[$2] OFS $2 ##Adding array a value along with 2nd field in 2nd field as per output.
}
1 ##1 will print current line.
' B.txt A.txt | column -t ##Mentioning Input_file names and passing awk program output to column to make it look better.
As you requested... It is perfectly possible to get the desired output using just GNU join
:根据您的要求...完全有可能使用 GNU
join
获得所需的 output :
$ join -1 2 -2 1 <(sort -k2 -g A.txt) <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
$
The key to getting the correct output is using the sort -g
option, and specifying the join
output column order using the -o
option.获得正确 output 的关键是使用
sort -g
选项,并使用-o
选项指定join
output 列顺序。
To "pretty print" the output, pipe to column -t
要将 output、pipe “漂亮打印”到
column -t
$ join -1 2 -2 1 <(sort -k2 -g A.txt) <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5 | column -t
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
$
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.