
bash - replacing multiple lines in a file with a single line from another file

I have searched everywhere but still can't find the answer I need. I have the following pdb file (file1):

ATOM      1  N   SER A   1      31.848  -5.217  38.114  1.00 39.55
ATOM      2  CA  SER A   1      31.668  -5.130  36.630  1.00 40.83
ATOM      3  C   SER A   1      30.991  -3.833  36.183  1.00 40.24
ATOM      4  O   SER A   1      30.868  -2.883  36.961  1.00 40.08
ATOM      5  CB  SER A   1      30.854  -6.329  36.118  1.00 41.46
ATOM      6  OG  SER A   1      31.600  -7.531  36.190  1.00 44.54
ATOM      7  N   THR A   2      30.605  -3.796  34.906  1.00 39.92
ATOM      8  CA  THR A   2      29.920  -2.658  34.286  1.00 38.97
ATOM      9  C   THR A   2      28.542  -3.116  33.777  1.00 38.40
ATOM     10  O   THR A   2      27.815  -2.341  33.141  1.00 38.79
ATOM     11  CB  THR A   2      30.734  -2.067  33.086  1.00 39.67
ATOM     12  OG1 THR A   2      31.045  -3.101  32.139  1.00 38.83
ATOM     13  CG2 THR A   2      32.020  -1.403  33.566  1.00 38.83

After some calculations with gfortran, I also have the following file (file2):

1  0.14364205034979632
2  0.50527753403393372

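What I want to do is replace column 11 of file1 with column 2 of file2 whenever column 6 of file1 equals column 1 of file2. Essentially, the output should look like this: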

ATOM      1  N   SER A   1      31.848  -5.217  38.114  1.00 0.14364205034979632
ATOM      2  CA  SER A   1      31.668  -5.130  36.630  1.00 0.14364205034979632
ATOM      3  C   SER A   1      30.991  -3.833  36.183  1.00 0.14364205034979632
ATOM      4  O   SER A   1      30.868  -2.883  36.961  1.00 0.14364205034979632
ATOM      5  CB  SER A   1      30.854  -6.329  36.118  1.00 0.14364205034979632
ATOM      6  OG  SER A   1      31.600  -7.531  36.190  1.00 0.14364205034979632
ATOM      7  N   THR A   2      30.605  -3.796  34.906  1.00 0.50527753403393372
ATOM      8  CA  THR A   2      29.920  -2.658  34.286  1.00 0.50527753403393372
ATOM      9  C   THR A   2      28.542  -3.116  33.777  1.00 0.50527753403393372
ATOM     10  O   THR A   2      27.815  -2.341  33.141  1.00 0.50527753403393372
ATOM     11  CB  THR A   2      30.734  -2.067  33.086  1.00 0.50527753403393372
ATOM     12  OG1 THR A   2      31.045  -3.101  32.139  1.00 0.50527753403393372
ATOM     13  CG2 THR A   2      32.020  -1.403  33.566  1.00 0.50527753403393372

I have the following code:

gawk '
FNR==NR { pdb[NR]=$0; next }
{
    split(pdb[FNR],flds,FS,seps)

    while ( flds[6]==$1 ) {
    flds[11]=$2
    for (i=1;i in flds;i++)
        printf "%s%s", flds[i], seps[i]
    print ""
    }
}
' "file1" "file2" > "output.pdb"

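It does the job for the first line of file1 and keeps the spacing intact. The problem is that it never moves on to the next line; the first line just repeats forever. Can anyone help me?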

Thanks! I'll buy you a beer :)
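Why the question's code loops: file1 is read first, so each line of file2 is paired with the line of file1 that has the same line number (FNR), and nothing inside the while loop changes flds[6] or $1, so once the condition is true it stays true and the first line is printed forever. Below is a minimal sketch of the same idea (rebuild each line from its tokens so the spacing survives) that instead reads file2 first into a lookup table and tests each file1 line once. It assumes plain POSIX awk, using match() and substr() instead of gawk's four-argument split(), and fields separated by spaces or tabs; the function name replace_field11 is hypothetical.

```shell
# Hypothetical helper: replace_field11 FILE2 FILE1
# Prints FILE1 with field 11 replaced by the FILE2 value whose column 1
# equals column 6 of the FILE1 line, keeping the original spacing.
replace_field11() {
    awk '
    # First file (FILE2): remember key -> value, then skip to the next line.
    NR == FNR { val[$1] = $2; next }
    {
        rest = $0; out = ""; n = 0
        # Walk the line one whitespace-separated token at a time.
        while (match(rest, /[^ \t]+/)) {
            n++
            tok = substr(rest, RSTART, RLENGTH)
            # Swap only field 11, and only when a value exists for this key.
            if (n == 11 && ($6 in val)) tok = val[$6]
            # Keep the whitespace that preceded the token unchanged.
            out = out substr(rest, 1, RSTART - 1) tok
            rest = substr(rest, RSTART + RLENGTH)
        }
        # "rest" now holds only trailing whitespace, if any.
        print out rest
    }' "$1" "$2"
}
```

Usage would be replace_field11 file2 file1 > output.pdb. Because it counts tokens instead of matching the last one, it still targets field 11 if the lines carry extra columns after it.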

I'm assuming file1 is sorted on column 6.

join -1 6 -2 1 file1 file2 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2 | column -t

Output:

ATOM  1   N    SER  A  1  31.848  -5.217  38.114  1.00  0.14364205034979632
ATOM  2   CA   SER  A  1  31.668  -5.130  36.630  1.00  0.14364205034979632
ATOM  3   C    SER  A  1  30.991  -3.833  36.183  1.00  0.14364205034979632
ATOM  4   O    SER  A  1  30.868  -2.883  36.961  1.00  0.14364205034979632
ATOM  5   CB   SER  A  1  30.854  -6.329  36.118  1.00  0.14364205034979632
ATOM  6   OG   SER  A  1  31.600  -7.531  36.190  1.00  0.14364205034979632
ATOM  7   N    THR  A  2  30.605  -3.796  34.906  1.00  0.50527753403393372
ATOM  8   CA   THR  A  2  29.920  -2.658  34.286  1.00  0.50527753403393372
ATOM  9   C    THR  A  2  28.542  -3.116  33.777  1.00  0.50527753403393372
ATOM  10  O    THR  A  2  27.815  -2.341  33.141  1.00  0.50527753403393372
ATOM  11  CB   THR  A  2  30.734  -2.067  33.086  1.00  0.50527753403393372
ATOM  12  OG1  THR  A  2  31.045  -3.101  32.139  1.00  0.50527753403393372
ATOM  13  CG2  THR  A  2  32.020  -1.403  33.566  1.00  0.50527753403393372
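The join answer relies on both inputs being sorted on the join field. If file1 might not be sorted on column 6, one option is to sort both files first. This is a sketch under stated assumptions: a sort that supports -s (stable sort, available in GNU and BSD sort) and a hypothetical helper name, join_sorted. The output comes out ordered by residue number compared as text, so with ten or more residues "10" would come before "2", and the spacing is collapsed to single spaces (the answer above pipes through column -t for alignment).

```shell
# Hypothetical helper: join_sorted FILE1 FILE2
# join requires both inputs sorted on the join field, so sort them first.
join_sorted() {
    t1=$(mktemp) || return 1
    t2=$(mktemp) || { rm -f "$t1"; return 1; }
    # -b ignores leading blanks in the key; -s keeps the atom order of
    # FILE1 within each residue. LC_ALL=C makes sort and join agree on order.
    LC_ALL=C sort -b -s -k6,6 "$1" > "$t1"
    LC_ALL=C sort -b -k1,1 "$2" > "$t2"
    # Same output field list as the answer: fields 1-10 of FILE1, field 2 of FILE2.
    LC_ALL=C join -1 6 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2 "$t1" "$t2"
    rm -f "$t1" "$t2"
}
```

For example, join_sorted file1 file2 | column -t reproduces the aligned output shown above even when file1 is not pre-sorted.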

Update

Using bash's printf:

printf "%s %6.d  %-3s %s %s   %s      %s  %s  %s  %s %s\n" $(join -1 6 -2 1 file1 file2 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2)

Output:

ATOM      1  N   SER A   1      31.848  -5.217  38.114  1.00 0.14364205034979632
ATOM      2  CA  SER A   1      31.668  -5.130  36.630  1.00 0.14364205034979632
ATOM      3  C   SER A   1      30.991  -3.833  36.183  1.00 0.14364205034979632
ATOM      4  O   SER A   1      30.868  -2.883  36.961  1.00 0.14364205034979632
ATOM      5  CB  SER A   1      30.854  -6.329  36.118  1.00 0.14364205034979632
ATOM      6  OG  SER A   1      31.600  -7.531  36.190  1.00 0.14364205034979632
ATOM      7  N   THR A   2      30.605  -3.796  34.906  1.00 0.50527753403393372
ATOM      8  CA  THR A   2      29.920  -2.658  34.286  1.00 0.50527753403393372
ATOM      9  C   THR A   2      28.542  -3.116  33.777  1.00 0.50527753403393372
ATOM     10  O   THR A   2      27.815  -2.341  33.141  1.00 0.50527753403393372
ATOM     11  CB  THR A   2      30.734  -2.067  33.086  1.00 0.50527753403393372
ATOM     12  OG1 THR A   2      31.045  -3.101  32.139  1.00 0.50527753403393372
ATOM     13  CG2 THR A   2      32.020  -1.403  33.566  1.00 0.50527753403393372
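The printf answer works because the shell's printf reuses its format string until all arguments are consumed, so every 11 words produced by join become one formatted line. The unquoted $(...) is also subject to filename globbing, and a single line with a different number of fields would shift every line after it. A sketch that formats one line at a time instead, reading join's output on stdin; the helper name format_pdb_lines is hypothetical, and it uses %6d, which prints the same as %6.d for non-zero serial numbers.

```shell
# Hypothetical helper: format_pdb_lines, reads join output on stdin.
# Each input line is split into its own 11 variables, so a line with the
# wrong number of fields cannot shift the columns of the lines after it.
format_pdb_lines() {
    while read -r rec serial name res chain seq x y z occ val; do
        printf '%s %6d  %-3s %s %s   %s      %s  %s  %s  %s %s\n' \
            "$rec" "$serial" "$name" "$res" "$chain" "$seq" \
            "$x" "$y" "$z" "$occ" "$val"
    done
}
```

For example: join -1 6 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2 file1 file2 | format_pdb_lines > output.pdb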

This is a very common question; I'm surprised you couldn't find a solution:

$ awk 'NR==FNR{a[$1]=$2;next} {$11=a[$6]} 1' file2 file1
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
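The one-liner uses the common NR==FNR idiom: NR counts lines across all input files while FNR restarts for each file, so the two are equal only while the first file (file2) is being read. Below is an expanded, commented sketch of the same command; the helper name set_field11 is hypothetical, and the ($6 in a) guard is an addition so that a residue missing from file2 keeps its original field 11 instead of getting an empty one. One caveat of the idiom: if file2 is empty, NR==FNR is also true for every line of file1.

```shell
# Hypothetical helper: set_field11 FILE2 FILE1 (same argument order as the one-liner)
set_field11() {
    awk '
    # NR == FNR only holds while reading the first file (FILE2):
    # store column 2 under the key in column 1, then skip to the next line.
    NR == FNR { a[$1] = $2; next }
    # For FILE1: replace field 11 only when column 6 has a stored value.
    # Assigning to $11 rebuilds the line with single spaces (OFS).
    ($6 in a) { $11 = a[$6] }
    # A true pattern with no action prints the (possibly modified) line.
    1
    ' "$1" "$2"
}
```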

If you care about preserving the whitespace:

$ awk 'NR==FNR{a[$1]=$2;next} {sub(/[^[:space:]]+[[:space:]]*$/,a[$6])} 1' file2 file1
ATOM      1  N   SER A   1      31.848  -5.217  38.114  1.00 0.14364205034979632
ATOM      2  CA  SER A   1      31.668  -5.130  36.630  1.00 0.14364205034979632
ATOM      3  C   SER A   1      30.991  -3.833  36.183  1.00 0.14364205034979632
ATOM      4  O   SER A   1      30.868  -2.883  36.961  1.00 0.14364205034979632
ATOM      5  CB  SER A   1      30.854  -6.329  36.118  1.00 0.14364205034979632
ATOM      6  OG  SER A   1      31.600  -7.531  36.190  1.00 0.14364205034979632
ATOM      7  N   THR A   2      30.605  -3.796  34.906  1.00 0.50527753403393372
ATOM      8  CA  THR A   2      29.920  -2.658  34.286  1.00 0.50527753403393372
ATOM      9  C   THR A   2      28.542  -3.116  33.777  1.00 0.50527753403393372
ATOM     10  O   THR A   2      27.815  -2.341  33.141  1.00 0.50527753403393372
ATOM     11  CB  THR A   2      30.734  -2.067  33.086  1.00 0.50527753403393372
ATOM     12  OG1 THR A   2      31.045  -3.101  32.139  1.00 0.50527753403393372
ATOM     13  CG2 THR A   2      32.020  -1.403  33.566  1.00 0.50527753403393372

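This solution is gawk-specific (see "Defining Fields by Content" in the gawk manual) and assumes that file2 has two columns separated by a single space, to get the output exactly as requested: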

awk 'BEGIN {FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)"; OFS = "";} FNR==NR{a[$1]=$2; next} {$11=a[$6+0]} {print}' file2 file1 
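  • {$11=a[$6+0]} puts $6 in a numeric context instead of a string comparison, so $6 values such as " 1" and " 2" (FPAT keeps each field's leading whitespace) match the keys "1" and "2" of array a (thanks to @Ed Morton for the explanation)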
