bash - replacing multiple lines in a file with a single line from another file
I have searched everywhere, but I still can't find the answer I'm looking for. I have the following PDB file (file1):
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 39.55
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 40.83
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 40.24
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 40.08
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 41.46
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 44.54
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 39.92
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 38.97
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 38.40
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 38.79
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 39.67
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 38.83
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 38.83
I also have the following file, produced by a calculation in gfortran (file2):
1 0.14364205034979632
2 0.50527753403393372
What I'd like to do is replace column 11 of file1 with column 2 of file2 wherever column 6 of file1 equals column 1 of file2. Essentially, the output should look like this:
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
I have the following code:
gawk '
FNR==NR { pdb[NR]=$0; next }
{
split(pdb[FNR],flds,FS,seps)
while ( flds[6]==$1 ) {
flds[11]=$2
for (i=1;i in flds;i++)
printf "%s%s", flds[i], seps[i]
print ""
}
}
' "file1" "file2" > "output.pdb"
It gets the job done for the first line of file1 and keeps the spacing consistent. The problem is that it doesn't proceed to the next lines, and the first line is repeated endlessly. Could anyone be so kind as to help me out?
Thanks! I'd treat you to a beer :)
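Why the script in the question loops forever: inside while ( flds[6]==$1 ) neither flds[6] nor $1 is ever modified, so once the condition is true it stays true and the same line is printed again and again. The script also pairs line N of file1 with line N of file2 (through pdb[FNR]), which is not the relationship described above. A minimal sketch of one way to fix it, assuming file2 fits in memory: read file2 first into a lookup table keyed on its column 1, then make a single pass over file1 and test the match once per line instead of looping on it. Because the four-argument split() used in the question is a gawk extension, this sketch keeps the original spacing in portable awk by copying each field together with the whitespace in front of it. The function name replace_col11 and the sample files are hypothetical.

```shell
#!/bin/sh
# Sketch: replace field 11 of file1 with field 2 of file2 wherever
# field 6 of file1 equals field 1 of file2, keeping file1's spacing.
# replace_col11 is a hypothetical helper; usage: replace_col11 file2 file1
replace_col11() {
    awk '
    # First file (file2): build the lookup table key -> value.
    NR == FNR { val[$1] = $2; next }

    # Second file (file1): one pass per line, no loop on the match test.
    ($6 in val) {
        rest = $0; out = ""; n = 0
        # Consume the line as "optional whitespace + field" pieces.
        while (match(rest, /^[ \t]*[^ \t]+/)) {
            piece = substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
            if (++n == 11) {
                # Keep the whitespace before field 11, swap the field text.
                match(piece, /^[ \t]*/)
                piece = substr(piece, 1, RLENGTH) val[$6]
            }
            out = out piece
        }
        print out rest    # rest now holds only trailing whitespace, if any
        next
    }

    # No entry in file2 for this residue: print the line unchanged.
    { print }
    ' "$1" "$2"
}

# Hypothetical sample data with PDB-style column spacing.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
ATOM      1  N   SER A   1      31.848  -5.217  38.114  1.00 39.55
ATOM      7  N   THR A   2      30.605  -3.796  34.906  1.00 39.92
EOF
cat > "$dir/file2" <<'EOF'
1 0.14364205034979632
2 0.50527753403393372
EOF

out=$(replace_col11 "$dir/file2" "$dir/file1")
printf '%s\n' "$out"
rm -rf "$dir"
```

The final { print } is a design choice of this sketch: lines whose column 6 has no match in file2 pass through untouched instead of losing their last column.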
I assume that file1 is sorted by column 6.
join -1 6 -2 1 file1 file2 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2 | column -t
Output:
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
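join only pairs lines correctly when both inputs are sorted on their join fields in the collating order join itself uses, which is why this answer assumes file1 is already sorted by column 6. If that is not guaranteed, a sketch like the following sorts both inputs first. Assumptions: LC_ALL=C is set so that sort and join agree on collation, and sort -s (a stable sort offered by GNU and BSD sort, but not required by POSIX) keeps atoms of the same residue in their original order. The result then follows residue order rather than the original line order of file1. The sample data is hypothetical.

```shell
#!/bin/sh
# Sketch: run the join answer when file1 may not be sorted on column 6.
export LC_ALL=C          # same collation for sort and join
dir=$(mktemp -d)

# Hypothetical sample input: residue 2 appears before residue 1.
cat > "$dir/file1" <<'EOF'
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 39.92
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 39.55
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 40.83
EOF
cat > "$dir/file2" <<'EOF'
2 0.50527753403393372
1 0.14364205034979632
EOF

# Sort file2 on its key; sort file1 on field 6 (stable, so atom order
# within a residue is kept) and feed it to join on stdin ("-").
sort -k1,1 "$dir/file2" > "$dir/file2.sorted"
out=$(sort -s -k6,6 "$dir/file1" |
      join -1 6 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2 \
           - "$dir/file2.sorted")
printf '%s\n' "$out"
rm -rf "$dir"
```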
Update:
With bash's printf:
printf "%s %6.d %-3s %s %s %s %s %s %s %s %s\n" $(join -1 6 -2 1 file1 file2 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.2)
Output:
ATOM      1 N   SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM      2 CA  SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM      3 C   SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM      4 O   SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM      5 CB  SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM      6 OG  SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM      7 N   THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM      8 CA  THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM      9 C   THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM     10 O   THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM     11 CB  THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM     12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM     13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
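The update relies on a property of the shell's printf: when there are more arguments than conversion specifiers, the format is reused for the next batch of arguments. The join command emits 11 words per record and the format has 11 specifiers, so each record lands on its own line. Because $(...) is unquoted, it relies on word splitting, which works here only because no field contains spaces or glob characters. A small demonstration using the first four columns of two records (this sketch uses %6d; the %6.d in the answer prints the same thing for any non-zero value):

```shell
#!/bin/sh
# printf reuses its format until every argument is consumed:
# 8 arguments / 4 specifiers = 2 output lines.
out=$(printf '%s %6d %-3s %s\n' ATOM 1 N SER ATOM 12 OG1 THR)
printf '%s\n' "$out"
```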
This is an incredibly common problem; I'm surprised you couldn't find a solution:
$ awk 'NR==FNR{a[$1]=$2;next} {$11=a[$6]} 1' file2 file1
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
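How the one-liner works: NR==FNR is only true while awk is reading the first file on the command line (file2), so those lines fill the array a, and next skips the rest of the script. For each line of file1, $11=a[$6] overwrites field 11, which makes awk rebuild the line with single spaces, and the trailing 1 is an always-true pattern whose default action prints the line. One consequence is that a residue number with no entry in file2 gets an empty field 11. The variant below is an addition, not part of the original answer: it only assigns when a value exists, so unmatched lines pass through unchanged. The sample files are hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Hypothetical sample: file2 only has a value for residue 1.
printf '%s\n' '1 0.14364205034979632' > "$dir/file2"
cat > "$dir/file1" <<'EOF'
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 39.55
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 39.92
EOF

out=$(awk '
    NR == FNR { a[$1] = $2; next }   # first file (file2): key -> value
    ($6 in a) { $11 = a[$6] }        # only replace when a value exists
    1                                # print every file1 line
' "$dir/file2" "$dir/file1")
printf '%s\n' "$out"
rm -rf "$dir"
```

A known caveat of the NR==FNR idiom: if file2 is empty, NR==FNR is also true for every line of file1, which would then be consumed as lookup data instead of being printed.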
If you care about preserving the whitespace:
$ awk 'NR==FNR{a[$1]=$2;next} {sub(/[^[:space:]]+[[:space:]]*$/,a[$6])} 1' file2 file1
ATOM 1 N SER A 1 31.848 -5.217 38.114 1.00 0.14364205034979632
ATOM 2 CA SER A 1 31.668 -5.130 36.630 1.00 0.14364205034979632
ATOM 3 C SER A 1 30.991 -3.833 36.183 1.00 0.14364205034979632
ATOM 4 O SER A 1 30.868 -2.883 36.961 1.00 0.14364205034979632
ATOM 5 CB SER A 1 30.854 -6.329 36.118 1.00 0.14364205034979632
ATOM 6 OG SER A 1 31.600 -7.531 36.190 1.00 0.14364205034979632
ATOM 7 N THR A 2 30.605 -3.796 34.906 1.00 0.50527753403393372
ATOM 8 CA THR A 2 29.920 -2.658 34.286 1.00 0.50527753403393372
ATOM 9 C THR A 2 28.542 -3.116 33.777 1.00 0.50527753403393372
ATOM 10 O THR A 2 27.815 -2.341 33.141 1.00 0.50527753403393372
ATOM 11 CB THR A 2 30.734 -2.067 33.086 1.00 0.50527753403393372
ATOM 12 OG1 THR A 2 31.045 -3.101 32.139 1.00 0.50527753403393372
ATOM 13 CG2 THR A 2 32.020 -1.403 33.566 1.00 0.50527753403393372
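In this version the regular expression [^[:space:]]+[[:space:]]*$ matches the last run of non-blank characters plus any trailing whitespace, so sub() replaces only the final field and leaves the spacing before it untouched. That final field is column 11 only when every line has exactly 11 fields. Also, in the replacement string of sub() an & stands for the matched text, so a value containing & (or backslashes) would be altered; plain numbers like these are safe. A sketch that splices the value in with match() and substr() instead, so the replacement is inserted literally, and that skips lines with no entry in file2 (spaces and tabs are spelled out as [ \t] here; the sample data is hypothetical):

```shell
#!/bin/sh
dir=$(mktemp -d)
# Hypothetical sample data with PDB-style column spacing.
cat > "$dir/file1" <<'EOF'
ATOM      1  N   SER A   1      31.848  -5.217  38.114  1.00 39.55
ATOM     12  OG1 THR A   2      31.045  -3.101  32.139  1.00 38.83
EOF
printf '%s\n' '1 0.14364205034979632' '2 0.50527753403393372' > "$dir/file2"

out=$(awk '
    NR == FNR { a[$1] = $2; next }        # file2: residue -> value
    ($6 in a) && match($0, /[^ \t]+[ \t]*$/) {
        # Keep everything before the last field, then append the value
        # literally (no special meaning for & as in sub()).
        $0 = substr($0, 1, RSTART - 1) a[$6]
    }
    1
' "$dir/file2" "$dir/file1")
printf '%s\n' "$out"
rm -rf "$dir"
```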
This solution is gawk-specific (see Defining Fields by Content) and assumes that file2 has two columns separated by a single space: with this FPAT the second field of file2 keeps its leading space, and that space becomes the separator in front of the new value in the output:
awk 'BEGIN {FPAT = "([[:space:]]*[[:alnum:][:punct:][:digit:]]+)"; OFS = "";} FNR==NR{a[$1]=$2; next} {$11=a[$6+0]} {print}' file2 file1
The $6+0 in {$11=a[$6+0]} makes values of $6 such as " 1" and " 2" (which keep their leading space under this FPAT) match the keys "1" and "2" in array a: adding 0 forces a numeric conversion, so the lookup does not fail on the leading space the way a plain string lookup would (thanks @Ed Morton for the explanation).
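To see why the +0 matters: with this FPAT, fields after the first keep their leading blanks, so $6 in file1 is the string " 1", while the keys stored from file2's first field are "1". Array subscripts in awk are always strings, so " 1" and "1" name different elements; adding 0 turns " 1" into the number 1, which is converted back to the subscript "1". A tiny check of that behaviour, which runs in any POSIX awk (the variable names are illustrative):

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    a["1"] = "x"             # key as stored from file2 field 1
    f = " 1"                 # field 6 as captured by the FPAT pattern
    r1 = (f in a)            # string subscript " 1": not found
    r2 = ((f + 0) in a)      # number 1 -> subscript "1": found
    print r1, r2
}')
printf '%s\n' "$out"
```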
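Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.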