How to match two text files of different length and different columns, keeping the header, using the join command in Linux

I have two text files of different length, A.txt and B.txt.

A.txt looks like:

ID  pos  val1  val2  val3
1   2    0.8     0.5   0.6
2   4    0.9     0.6   0.8
3   6    1.0     1.2   1.3
4   8    2.5     2.2   3.4
5   10   3.2     3.4   3.8

B.txt looks like:

pos category
2    A
4    B
6    A
8    C
10   B

I want to match on the pos column in both files and get output like this:

ID  category  pos  val1  val2  val3
1   A         2    0.8   0.5   0.6
2   B         4    0.9   0.6   0.8
3   A         6    1.0   1.2   1.3
4   C         8    2.5   2.2   3.4
5   B         10   3.2   3.4   3.8

I used the join command:

join -1 2 -2 1 <(sort -k2 A.txt) <(sort -k1 B.txt) > C.txt

C.txt comes out without a header:

    1      A       2    0.8     0.5   0.6
    2      B       4    0.9     0.6   0.8
    3      A       6    1.0     1.2   1.3
    4      C       8    2.5     2.2   3.4
    5      B       10   3.2     3.4   3.8

I want the join output to include the header. Kindly help me out.

Thanks in advance.

If you are OK with awk, please try the following. Written and tested with the shown samples in GNU awk.

awk 'FNR==NR{a[$1]=$2;next} ($2 in a){$2=a[$2] OFS $2} 1' B.txt A.txt | column -t

Explanation: a detailed explanation of the above.

awk '                       ## Start of the awk program.
FNR==NR{                    ## True only while the first file (B.txt) is being read.
  a[$1]=$2                  ## Store the category (2nd field) in array a, keyed by pos (1st field).
  next                      ## Skip the remaining statements for this line.
}
($2 in a){                  ## For A.txt lines: if pos (2nd field) is a key in array a,
  $2=a[$2] OFS $2           ## replace the 2nd field with "category OFS pos".
}
1                           ## Always-true pattern: print the (possibly modified) line.
' B.txt A.txt | column -t   ## Input files, B.txt first; column -t aligns the output.
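
Since the two files may differ in length, a row of A.txt whose pos has no entry in B.txt would pass through the command above without a category field, which shifts its columns. The following is a minimal sketch of one possible variation, not part of the original answer: it inserts a placeholder for such rows. The placeholder text NA, the array name category, and the shortened sample files are assumptions made for illustration.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

cat > A.txt <<'EOF'
ID  pos  val1  val2  val3
1   2    0.8   0.5   0.6
2   4    0.9   0.6   0.8
3   6    1.0   1.2   1.3
EOF

# B.txt deliberately has no row for pos 6 (illustrative assumption).
cat > B.txt <<'EOF'
pos category
2    A
4    B
EOF

# First pass (B.txt): remember the category for each pos.
# Second pass (A.txt): put the category, or NA if pos is unknown, before pos.
result=$(awk '
FNR == NR { category[$1] = $2; next }
{ $2 = (($2 in category) ? category[$2] : "NA") OFS $2 }
1
' B.txt A.txt)

printf '%s\n' "$result"
```

Every output row then has six fields, so piping the result to column -t keeps the columns aligned even for unmatched rows.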

As you requested, it is perfectly possible to get the desired output using just GNU join:

$ join -1 2 -2 1 <(sort -k2 -g A.txt)  <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
$

The key to getting the correct output is sorting with sort -g, which places the non-numeric header line ahead of the numeric rows in both files, and specifying the output column order with join's -o option.

To "pretty print" the output, pipe it to column -t:

$ join -1 2 -2 1 <(sort -k2 -g A.txt)  <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5 | column -t
ID  category  pos  val1  val2  val3
1   A         2    0.8   0.5   0.6
2   B         4    0.9   0.6   0.8
3   A         6    1.0   1.2   1.3
4   C         8    2.5   2.2   3.4
5   B         10   3.2   3.4   3.8
$
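
The join commands above rely on both inputs ending up in the same numeric order, whereas join nominally expects its inputs sorted lexically on the join field. A hedged alternative, not from the original answer, is GNU join's --header option, which treats the first line of each file as a header and prints it without trying to pair it as data. The sketch below assumes GNU coreutils; the intermediate file names A.sorted and B.sorted and the final re-sort on pos are illustrative choices.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

cat > A.txt <<'EOF'
ID  pos  val1  val2  val3
1   2    0.8   0.5   0.6
2   4    0.9   0.6   0.8
3   6    1.0   1.2   1.3
4   8    2.5   2.2   3.4
5   10   3.2   3.4   3.8
EOF

cat > B.txt <<'EOF'
pos category
2    A
4    B
6    A
8    C
10   B
EOF

# Keep each header line first and sort only the data rows on the join field.
# LC_ALL=C makes sort and join agree on plain byte order.
{ head -n 1 A.txt; tail -n +2 A.txt | LC_ALL=C sort -b -k2,2; } > A.sorted
{ head -n 1 B.txt; tail -n +2 B.txt | LC_ALL=C sort -b -k1,1; } > B.sorted

# --header (GNU join) joins the two first lines as headers instead of data.
# The trailing group passes the header through, then restores numeric order
# on pos, which is output column 3 after -o reorders the fields.
result=$(LC_ALL=C join --header -1 2 -2 1 -o 1.1,2.2,1.2,1.3,1.4,1.5 A.sorted B.sorted |
  { IFS= read -r header; printf '%s\n' "$header"; sort -k3,3n; })

printf '%s\n' "$result"
```

On the sample data this should print the same header and five rows as the transcript above; as before, the result can be piped to column -t for alignment.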
