How can I merge lines in a file based on a common field using awk?

Question

I have a large tab-delimited, two-column file that holds the coordinates of many biochemical pathways, like this:

A    B
B    D
D    F
F    G
G    I
A    C
C    P
P    R
A    M
M    L
L    X

I want to combine lines when column 1 of one line equals column 2 of another, resulting in the following output:

A    B    D    F    G    I
B    D    F    G    I
D    F    G    I
F    G    I
G    I
A    C    P    R
C    P    R
P    R
A    M    L    X
M    L    X
L    X

I would like to use something simple, such as an awk one-liner. Does anyone have an idea how to approach this without writing a shell script? Any help is appreciated. I am trying to get each step and every subsequent step in each pathway. As these pathways often intersect, some steps are shared by other pathways, but I want to analyse each one separately.

I have tried a shell script that takes the first line of the file and tries to pull out any later line whose $1 equals that line's $2:

while [ -s test ]; do
    grep -m1 "^" test > i
    cut -f2 i | sed 's/^/"/' | sed 's/$/"/' | sed "s/^/awk \'\$1 == /" | sed "s/$/' test >> i/" > i.sh
    sh i.sh
    perl -p -e 's/\n/\t/g' i >> OUT
    sed '1d' test > i ; mv i test
done

I know that my problem comes from (a) deleting the line and (b) the fact that there are duplicates. I am just not sure how to tackle this.

Answer 1

Input

$ cat f
A    B
B    D
D    F
F    G
G    I
A    C
C    P
P    R
A    M
M    L
L    X

Output

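$ awk '{
         # RS= (paragraph mode) turns the whole file into one record, so
         # line 1 is $1 $2, line 2 is $3 $4, and so on.
         for(j=1; j<=NF; j+=2)          # j: first field of each starting line
         {
            for(i=j;i<=NF;i+=2)         # i: walk forward one line at a time
            {
                printf("%s%s", i==j ? $i OFS : OFS,$(i+1));
                # end the chain when the next line does not continue it
                if($(i+1)!=$(i+2)){ print ""; break }
            }
          }
        }' RS= OFS="\t" f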
A   B   D   F   G   I
B   D   F   G   I
D   F   G   I
F   G   I
G   I
A   C   P   R
C   P   R
P   R
A   M   L   X
M   L   X
L   X

One-liner

awk '{ for(j=1; j<=NF; j+=2)for(i=j;i<=NF;i+=2){printf("%s%s", i==j ? $i OFS : OFS,$(i+1)); if($(i+1)!=$(i+2)){ print ""; break }}}' RS= OFS="\t" f
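Answer 1 depends on awk's paragraph mode: with an empty RS, records are separated by blank lines and a newline always acts as a field separator, so a file without blank lines is read as one record. The sketch below only illustrates that behaviour; the function name count_fields is made up for this example and is not part of the answer.

```shell
# Hypothetical helper: print the field count of each record in paragraph mode.
count_fields() {
  # RS= (empty) selects paragraph mode; newlines then separate fields too.
  awk '{ print NF }' RS= "$@"
}

# Three two-column lines and no blank line: a single record with 6 fields.
printf 'A\tB\nB\tD\nD\tF\n' | count_fields
```

A consequence is that a blank line in the input starts a new record, so a chain would not be followed across it.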

Answer 2

Well, you could put this on one line, but I wouldn't recommend it :)

#!/usr/bin/awk -f

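{
  a[NR] = $0                      # start a new chain for this line

  for(i = 1; i < NR; i++){
    # extend every open chain that ends in this line's first field
    # (unanchored regex match: assumes no name is a suffix of another)
    if(a[i] ~ $1"$")
      a[i] = a[i] "\t" $2         # join with a tab to match the input delimiter
    # an open chain starting with the same first field means a new pathway
    # begins here: flush every chain collected so far
    if(a[i] ~ "^"$1){
      for(j = i; j < NR; j++){
        print a[j]
        delete a[j]
      }
    }
  }
}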

END{
  for(i = 1; i <= NR; i++)
    if(a[i] != "") 
      print a[i]
}

Answer 3

$ <f.txt tac | awk 'BEGIN{OFS="\t"}{if($2==c1){$2=$2"\t"c2};print $1,$2;c1=$1;c2=$2}' | tac
A       B       D       F       G       I
B       D       F       G       I
D       F       G       I
F       G       I
G       I
A       C       P       R
C       P       R
P       R
A       M       L       X
M       L       X
L       X
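The bottom-up idea in Answer 3 can also be sketched in a single awk process, which may help on systems where tac is not installed. This is an illustrative variant, not the answerer's code: the function name merge_chains and the arrays key, val and chain are assumptions made for this example. It also assumes whitespace-separated names and that a pathway continues only onto the directly following line, as in the sample data.

```shell
# Hypothetical helper: build each chain by walking the lines bottom-up.
merge_chains() {
  awk '
    { key[NR] = $1; val[NR] = $2 }     # remember both columns of every line
    END {
      for (i = NR; i >= 1; i--) {
        # If the next line starts where this one ends, reuse its whole chain.
        if (i < NR && val[i] == key[i + 1])
          chain[i] = val[i] "\t" chain[i + 1]
        else
          chain[i] = val[i]
      }
      for (i = 1; i <= NR; i++)        # print in the original order
        print key[i] "\t" chain[i]
    }' "$@"
}
```

It would be called as merge_chains f, or with the data piped in on standard input. Because every line is held in memory until END, very large files cost more memory than the streaming tac pipeline.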

3 answers

Answer 1
Score 3, accepted, 2017-02-17 18:02:38

Answer 2
Score 0, 2017-02-17 17:25:43

Answer 3
Score 0, 2017-02-17 17:50:16