AWK replace full string in TABLE2 according to TABLE1

I have TABLE1, where the first column is a string that should be replaced in TABLE2 and the second column is the value that should replace it.

TABLE1 looks like this:

g63. MYL9
g5990. PTC7
g6018. POLYUBQ
g17850. NAA50

TABLE2 looks, for example, like this:

PIZI01000001v1 AUGUSTUS gene 751753 768572 0.06 - . g63.
PIZI01000001v1  AUGUSTUS    intron  751969  752021  1   -   .   transcript_id "g63.t1"; gene_id "g63.
PIZI01000001v1 AUGUSTUS gene 16680331 16688019 0.25 + . g630.
PIZI01000001v1  AUGUSTUS    intron  16680415    16683083    0.35    +   .   transcript_id "g630.t1"; gene_id "g630.
PIZI01000001v1 AUGUSTUS gene 16695081 16703546 0.93 + . g631.
PIZI01000001v1 AUGUSTUS gene 16730752 16735366 0.65 + . g632.
PIZI01000008v1 AUGUSTUS gene 1943857 1944177 0.71 - . g6299.

So I assembled this awk command:

awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' TABLE1 TABLE2

It works only up to a point: the value MYL9 replaces not only the string g63. but also strings like g630, g631, g632, ... g6300 and so on. So the final table looks like this:

PIZI01000001v1 AUGUSTUS gene 751753 768572 0.06 - . MYL9
PIZI01000001v1  AUGUSTUS    intron  751969  752021  1   -   .   transcript_id "MYL9"; gene_id "MYL9
PIZI01000001v1 AUGUSTUS gene 16680331 16688019 0.25 + . MYL9
PIZI01000001v1  AUGUSTUS    intron  16680415    16683083    0.35    +   .   transcript_id "MYL9t1"; gene_id "MYL9
PIZI01000001v1 AUGUSTUS gene 16695081 16703546 0.93 + . MYL9
PIZI01000001v1 AUGUSTUS gene 16730752 16735366 0.65 + . MYL9
PIZI01000008v1 AUGUSTUS gene 1943857 1944177 0.71 - . g6299.

I need it to edit just g63. and not others like g630. and so on.

I have spent quite a long time on this and now I have to take a break, so if anybody has an idea what's wrong there, I would appreciate it. Thanks.
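
A likely cause, for what it is worth: gsub treats its first argument as a regular expression, so the dot in g63. matches any character and nothing ties the match to a whole field. The sketch below shows the effect on a single made-up line (the input and the variable names regex_result and exact_result are illustrative, not taken from the real tables), next to a plain string comparison on the field.

```shell
# gsub() takes a regex: in "g63." the dot matches any character, so "g630" matches too.
regex_result=$(printf 'gene g630.\n' | awk '{ gsub("g63.", "MYL9") } 1')
# A plain string comparison on the field leaves "g630." alone.
exact_result=$(printf 'gene g630.\n' | awk '$2 == "g63." { $2 = "MYL9" } 1')
printf '%s\n%s\n' "$regex_result" "$exact_result"
```

The first line printed is "gene MYL9." (the regex consumed "g630"), the second is "gene g630." (unchanged).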

Your example doesn't really illustrate the problem, but perhaps this is what you're hoping to achieve?

head table*
==> table1.txt <==
g63. MYL9
g25. PTC7
g6018. POLYUBQ
g17850. NAA50

==> table2.txt <==
PIZI01000001v1  AUGUSTUS    transcript  1   6991    0.4 -   .   g25.t1
PIZI01000001v1  AUGUSTUS    intron  1   3122    0.71    -   .   transcript_id "g25.t1"; gene_id "g25.";
PIZI01000001v1  AUGUSTUS    CDS 3123    3304    0.76    -   2   transcript_id "g25.t1"; gene_id "g25.";
PIZI01000001v1  AUGUSTUS    intron  3305    4460    1   -   .   transcript_id "g25.t1"; gene_id "g25.";
PIZI01000001v1  AUGUSTUS    CDS 4461    4598    1   -   2   transcript_id "g25.t1"; gene_id "g25.";
PIZI01000001v1  AUGUSTUS    intron  4599    5201    1   -   .   transcript_id "g25.t1"; gene_id "g25.";
PIZI01000001v1  AUGUSTUS    CDS 5202    5342    1   -   2   transcript_id "g25.t1"; gene_id "g25.";
PIZI01000001v1  AUGUSTUS    intron  5343    6978    0.54    -   .   transcript_id "g25.t1"; gene_id "g25.";
PIZI01000001v1  AUGUSTUS    CDS 6979    6991    0.54    -   0   transcript_id "g25.t1";

awk 'NR==FNR{a[$1]=$2; next} NR>FNR{unchanged=$0; gsub(/\"/, ""); gsub(/\;/, ""); if($NF in a) {print unchanged, a[$NF]}}' table1.txt table2.txt
PIZI01000001v1  AUGUSTUS    intron  1   3122    0.71    -   .   transcript_id "g25.t1"; gene_id "g25."; PTC7
PIZI01000001v1  AUGUSTUS    CDS 3123    3304    0.76    -   2   transcript_id "g25.t1"; gene_id "g25."; PTC7
PIZI01000001v1  AUGUSTUS    intron  3305    4460    1   -   .   transcript_id "g25.t1"; gene_id "g25."; PTC7
PIZI01000001v1  AUGUSTUS    CDS 4461    4598    1   -   2   transcript_id "g25.t1"; gene_id "g25."; PTC7
PIZI01000001v1  AUGUSTUS    intron  4599    5201    1   -   .   transcript_id "g25.t1"; gene_id "g25."; PTC7
PIZI01000001v1  AUGUSTUS    CDS 5202    5342    1   -   2   transcript_id "g25.t1"; gene_id "g25."; PTC7
PIZI01000001v1  AUGUSTUS    intron  5343    6978    0.54    -   .   transcript_id "g25.t1"; gene_id "g25."; PTC7
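
If the goal is instead to replace the IDs in place, one possible sketch is to match each key as literal text with index and substr rather than handing it to gsub as a regex. The helper name replace_ids, the function replace_literal, and the sample files are hypothetical and only for illustration. Note the assumptions: a key is replaced wherever it occurs literally, so a transcript ID such as g63.t1 would become MYL9t1, which may or may not be what is wanted, and every key is tried on every line, which could be slow for very large tables.

```shell
# Hypothetical helper: replace_ids MAPPING_FILE DATA_FILE
# Replaces each key from column 1 of MAPPING_FILE with its column-2 value,
# matching keys as literal text (index/substr) rather than as regexes.
replace_ids() {
    awk '
        # Replace every literal occurrence of "from" in "s" with "to".
        function replace_literal(s, from, to,    out, p) {
            out = ""
            while ((p = index(s, from)) > 0) {
                out = out substr(s, 1, p - 1) to
                s = substr(s, p + length(from))
            }
            return out s
        }
        # First file: build the key -> replacement map (skip short or blank lines).
        FNR == NR { if (NF >= 2) map[$1] = $2; next }
        # Second file: apply every mapping to the whole line, then print it.
        {
            for (key in map) $0 = replace_literal($0, key, map[key])
            print
        }
    ' "$1" "$2"
}

# Small made-up sample (not the real tables) to show the effect.
tmpdir=$(mktemp -d)
printf 'g63. MYL9\n' > "$tmpdir/map.txt"
printf 'chr1 gene . g63.\nchr1 gene . g630.\n' > "$tmpdir/data.txt"
replace_ids "$tmpdir/map.txt" "$tmpdir/data.txt"
```

On this sample the first line comes out as "chr1 gene . MYL9" and the second stays "chr1 gene . g630.", because the literal text g63. does not occur in g630.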

I may have misunderstood the problem though; please edit your question if this doesn't solve your issue.
