How to match two text files of different lengths and different columns, including the header, using the join command in Linux

Question

I have two text files of different lengths, A.txt and B.txt.

A.txt looks like this:

ID  pos  val1  val2  val3
1   2    0.8     0.5   0.6
2   4    0.9     0.6   0.8
3   6    1.0     1.2   1.3
4   8    2.5     2.2   3.4
5   10   3.2     3.4   3.8

B.txt looks like this:

pos category
2    A
4    B
6    A
8    C
10   B

I want to match the pos column in the two files and get output like this:

ID  category  pos  val1  val2  val3
1      A       2    0.8     0.5   0.6
2      B       4    0.9     0.6   0.8
3      A       6    1.0     1.2   1.3
4      C       8    2.5     2.2   3.4
5      B       10   3.2     3.4   3.8

I used the join command:

join -1 2 -2 1 <(sort -k2 A.txt) <(sort -k1 B.txt) > C.txt

but C.txt has no header:

    1      A       2    0.8     0.5   0.6
    2      B       4    0.9     0.6   0.8
    3      A       6    1.0     1.2   1.3
    4      C       8    2.5     2.2   3.4
    5      B       10   3.2     3.4   3.8

I would like the output of join to include the header. Please help.

Thanks in advance.

Answer 1

If you are OK with awk, please try the following. Written and tested with the shown samples in GNU awk.

awk 'FNR==NR{a[$1]=$2;next} ($2 in a){$2=a[$2] OFS $2} 1' B.txt A.txt | column -t

Explanation: a detailed, line-by-line explanation of the above.

awk '                       ##Starting awk program from here.
FNR==NR{                    ##Checking condition FNR==NR which will be TRUE when B.txt is being read.
  a[$1]=$2                  ##Creating array a with index of 1st field and value is 2nd field of current line.
  next                      ##next will skip all further statements from here.
}
($2 in a){                  ##Checking condition if 2nd field is present in array a then do following.
  $2=a[$2] OFS $2           ##Adding array a value along with 2nd field in 2nd field as per output.
}
1                           ##1 will print current line.
' B.txt A.txt | column -t   ##Mentioning Input_file names and passing awk program output to column to make it look better.
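
Since the question mentions files of different lengths, here is a minimal sketch of the same two-pass awk idea when some pos values have no match. It is an illustration under assumptions, not part of the original answer: B.txt is shortened by one row to create a mismatch, the NA placeholder is hypothetical, and the scratch directory and the variable name awk_result exist only for this example.

```shell
# Recreate the question's A.txt, and a B.txt that lacks pos 10 (assumed for illustration).
workdir=$(mktemp -d)
printf '%s\n' 'ID pos val1 val2 val3' '1 2 0.8 0.5 0.6' '2 4 0.9 0.6 0.8' \
  '3 6 1.0 1.2 1.3' '4 8 2.5 2.2 3.4' '5 10 3.2 3.4 3.8' > "$workdir/A.txt"
printf '%s\n' 'pos category' '2 A' '4 B' '6 A' '8 C' > "$workdir/B.txt"

# Pass 1 (B.txt): remember the category for every pos, header line included.
# Pass 2 (A.txt): put the category, or the hypothetical NA placeholder, in front of pos.
awk_result=$(awk '
FNR==NR { category[$1] = $2; next }
{ $2 = (($2 in category) ? category[$2] : "NA") OFS $2 }
1
' "$workdir/B.txt" "$workdir/A.txt")

printf '%s\n' "$awk_result"
rm -rf "$workdir"
```

Without the placeholder, an unmatched row would be printed with one column fewer than the others, which misaligns the table once it is piped to column -t.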

Answer 2

As per your requirements, it is entirely possible to get the desired output using GNU join:

$ join -1 2 -2 1 <(sort -k2 -g A.txt)  <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
$

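The key to getting the correct output is sort's -g option (general numeric sort), which keeps the non-numeric header line ahead of the data rows, together with join's -o option, which specifies the order of the output columns.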
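
To see what -g changes, here is a small sketch comparing it with the default lexical order on the pos values from the question. It assumes GNU sort, where -g is the general-numeric option and lines that do not start with a number sort ahead of all numbers; the variable names are made up for this example.

```shell
# The pos column from the question, deliberately shuffled, header included.
pos_column=$(printf '%s\n' 10 pos 4 2 8 6)

# Default lexical sort in the C locale: "10" lands before "2" and the header goes last.
lexical_order=$(printf '%s\n' "$pos_column" | LC_ALL=C sort)

# General-numeric sort: the non-numeric header comes first, then 2 4 6 8 10.
numeric_order=$(printf '%s\n' "$pos_column" | LC_ALL=C sort -g)

printf 'lexical: %s\n' "$(printf '%s' "$lexical_order" | tr '\n' ' ')"
printf 'numeric: %s\n' "$(printf '%s' "$numeric_order" | tr '\n' ' ')"
```

With both files ordered the same way and every pos present in both, join pairs the lines one after another, which is why the header survives as the first output line.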

To pretty-print the output, pipe it to column -t:

$ join -1 2 -2 1 <(sort -k2 -g A.txt)  <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5 | column -t
ID  category  pos  val1  val2  val3
1   A         2    0.8   0.5   0.6
2   B         4    0.9   0.6   0.8
3   A         6    1.0   1.2   1.3
4   C         8    2.5   2.2   3.4
5   B         10   3.2   3.4   3.8
$
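
The approach above depends on the header happening to sort first and on every pos being present in both files. As an alternative sketch, not taken from the answer, GNU join also has a --header option that prints the first line of each file without trying to pair it as data. The example below assumes GNU coreutils; the scratch directory, the intermediate file names, and the variable join_result are hypothetical. Temporary files are used instead of process substitution so that it runs in a plain POSIX shell.

```shell
# Assumes GNU coreutils (join --header). LC_ALL=C keeps sort and join on the same collation.
export LC_ALL=C
workdir=$(mktemp -d)
printf '%s\n' 'ID pos val1 val2 val3' '1 2 0.8 0.5 0.6' '2 4 0.9 0.6 0.8' \
  '3 6 1.0 1.2 1.3' '4 8 2.5 2.2 3.4' '5 10 3.2 3.4 3.8' > "$workdir/A.txt"
printf '%s\n' 'pos category' '2 A' '4 B' '6 A' '8 C' '10 B' > "$workdir/B.txt"

# Keep each header as line 1 and sort only the data rows on the join field.
{ head -n 1 "$workdir/A.txt"; tail -n +2 "$workdir/A.txt" | sort -k 2b,2; } > "$workdir/A.sorted"
{ head -n 1 "$workdir/B.txt"; tail -n +2 "$workdir/B.txt" | sort -k 1b,1; } > "$workdir/B.sorted"

# --header pairs the two header lines without treating them as data rows.
join --header -1 2 -2 1 -o 1.1,2.2,1.2,1.3,1.4,1.5 \
  "$workdir/A.sorted" "$workdir/B.sorted" > "$workdir/joined.txt"

# join emits rows in lexical key order (10 before 2), so restore numeric pos order.
join_result=$( { head -n 1 "$workdir/joined.txt"; tail -n +2 "$workdir/joined.txt" | sort -k 3,3n; } )
printf '%s\n' "$join_result"
rm -rf "$workdir"
```

Sorting the data rows lexically, as join expects, keeps the pairing reliable even when one file has pos values the other lacks; the final numeric sort only restores the row order shown in the question.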

Answer 1
2 2020-12-08 05:46:37

Answer 2
1 2020-12-08 08:06:37
