Replace text in a file with hash values for matching keys

Question

I would like to replace every word in a file that matches a key of my hash with the corresponding value.

Hash:

$VAR1 = {
    'asmbl_1'  => 'TCONS_00000046',
    'asmbl_2'  => 'TCONS_00000014',
    'asmbl_16' => 'MELO3C000012',
}

File:

CM3.6.1_CONTIG30890 assembler   transcript  187 1568    .   -   .   gene_id "PASA_cluster_1"; transcript_id "align_id:184317|asmbl_1";
CM3.6.1_CONTIG30890 assembler   exon    187 251 .   -   .   gene_id "PASA_cluster_1"; transcript_id "align_id:184317|asmbl_1";
CM3.6.1_CONTIG30898 assembler   exon    1339    2793    .   -   .   gene_id "PASA_cluster_2"; transcript_id "align_id:184318|asmbl_2";

Desired output:

CM3.6.1_CONTIG30890 assembler   transcript  187 1568    .   -   .   gene_id "PASA_cluster_1"; transcript_id "align_id:184317|TCONS_00000046";
CM3.6.1_CONTIG30890 assembler   exon    187 251 .   -   .   gene_id "PASA_cluster_1"; transcript_id "align_id:184317|TCONS_00000046";
CM3.6.1_CONTIG30898 assembler   exon    1339    2793    .   -   .   gene_id "PASA_cluster_2"; transcript_id "align_id:184318|TCONS_00000014";

I'm looking for a straightforward way to do this, preferably in Perl, since the rest of my script is written in Perl.

Approaches:

Read the file line by line, extract the key from each line, look it up in the hash, and replace it with the value.
Read the hash pair by pair, open the file, read it line by line, and replace matches.

(What is the difference between these two methods?)

Read the hash pair by pair and call "sed -i 's/key/value/'" from the shell for each pair. A bit ugly; I would prefer to do everything in Perl.
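To make the first two approaches concrete, here is a minimal sketch of both. It assumes the lines are already in memory and that the key is always the word between the "|" and the closing quote of transcript_id, which is inferred from the sample data only. The sub names by_lookup and by_pair are made up for illustration.

```perl
use strict;
use warnings;

my %replace = (
    asmbl_1  => 'TCONS_00000046',
    asmbl_2  => 'TCONS_00000014',
    asmbl_16 => 'MELO3C000012',
);

# Approach 1: a single pass; pull the candidate key out of each line
# and look it up in the hash.
sub by_lookup {
    my (@lines) = @_;
    for my $line (@lines) {
        # Assumed layout: the key is the word between "|" and the closing quote.
        $line =~ s/\|(\w+)"/'|' . (exists $replace{$1} ? $replace{$1} : $1) . '"'/e;
    }
    return @lines;
}

# Approach 2: one pass over all lines for every key/value pair.
sub by_pair {
    my (@lines) = @_;
    for my $key (keys %replace) {
        # \Q...\E escapes the key; \b keeps asmbl_1 from matching inside asmbl_16.
        s/\b\Q$key\E\b/$replace{$key}/g for @lines;
    }
    return @lines;
}
```

Both give the same result on the sample. The difference is cost: approach 1 does one hash lookup per line, while approach 2 scans every line once per key, so it slows down as the hash grows.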

Answer 1

There's a nice trick I like: build a single regex from the hash keys, use it to capture whichever key matches, and look the captured key up in the hash:

use strict;
use warnings;

my %replace = (
    'asmbl_1'  => 'TCONS_00000046',
    'asmbl_2'  => 'TCONS_00000014',
    'asmbl_16' => 'MELO3C000012',
);

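# Escape each key and put the longest keys first, so 'asmbl_16' is tried before 'asmbl_1'.
my $search = join( "|", map {quotemeta} sort { length ($b) <=> length ($a) } keys %replace );
# Capture the matched key in $1; \b restricts matches to whole words.
$search = qr/\b($search)\b/;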

while (<>) {
    s/$search/$replace{$1}/g;
    print;
}

Something like that produces the desired output. (The diamond operator reads the content from STDIN or from the files named on the command line, e.g. myscript.pl <some_File_To_process>.)
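The sort by key length matters most when the pattern has no \b anchors. The following sketch illustrates this with two overlapping keys; the helper names build_pattern and substitute are hypothetical and exist only for this demonstration.

```perl
use strict;
use warnings;

my %replace = (
    asmbl_1  => 'TCONS_00000046',
    asmbl_16 => 'MELO3C000012',
);

# Build an alternation from the keys; the flag selects longest-first ordering.
sub build_pattern {
    my ($longest_first) = @_;
    my @keys = sort { length($a) <=> length($b) } keys %replace;   # shortest first
    @keys = reverse @keys if $longest_first;
    my $alt = join '|', map { quotemeta($_) } @keys;
    return qr/($alt)/;    # deliberately no \b anchors
}

# Apply the substitution to a copy of the text and return the result.
sub substitute {
    my ($text, $pattern) = @_;
    $text =~ s/$pattern/$replace{$1}/g;
    return $text;
}

print substitute('x|asmbl_16"', build_pattern(0)), "\n";   # x|TCONS_000000466"
print substitute('x|asmbl_16"', build_pattern(1)), "\n";   # x|MELO3C000012"
```

With the shortest key first, asmbl_1 matches inside asmbl_16 and leaves a stray 6 behind. With \b on both sides, as in the answer, the regex engine backtracks to the longer alternative anyway, so the sort then acts as a safeguard rather than a requirement.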

Answer 2

This is all that is necessary:

use strict;
use warnings;

my %map = (
    asmbl_1  => 'TCONS_00000046',
    asmbl_2  => 'TCONS_00000014',
    asmbl_16 => 'MELO3C000012',
);

# Alternation of all keys, each escaped with quotemeta
my $re = join '|', map quotemeta, keys %map;

while ( <DATA> ) {
    s/\b($re)\b/$map{$1}/g;
    print;
}

__DATA__
CM3.6.1_CONTIG30890 assembler   transcript  187 1568    .   -   .   gene_id "PASA_cluster_1"; transcript_id "align_id:184317|asmbl_1";
CM3.6.1_CONTIG30890 assembler   exon    187 251 .   -   .   gene_id "PASA_cluster_1"; transcript_id "align_id:184317|asmbl_1";
CM3.6.1_CONTIG30898 assembler   exon    1339    2793    .   -   .   gene_id "PASA_cluster_2"; transcript_id "align_id:184318|asmbl_2";

Output:

CM3.6.1_CONTIG30890 assembler   transcript  187 1568    .   -   .   gene_id "PASA_cluster_1"; transcript_id "align_id:184317|TCONS_00000046";
CM3.6.1_CONTIG30890 assembler   exon    187 251 .   -   .   gene_id "PASA_cluster_1"; transcript_id "align_id:184317|TCONS_00000046";
CM3.6.1_CONTIG30898 assembler   exon    1339    2793    .   -   .   gene_id "PASA_cluster_2"; transcript_id "align_id:184318|TCONS_00000014";
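The example above reads from the DATA handle for demonstration. To rewrite a real file from within Perl, instead of shelling out to sed -i, one option is the built-in in-place editing variable $^I together with the diamond operator. This is a minimal sketch, not part of the original answer; the sub name replace_in_place and the .bak backup suffix are arbitrary choices.

```perl
use strict;
use warnings;

my %map = (
    asmbl_1  => 'TCONS_00000046',
    asmbl_2  => 'TCONS_00000014',
    asmbl_16 => 'MELO3C000012',
);

my $re = join '|', map { quotemeta($_) } keys %map;

# Rewrite each named file in place, keeping the original with a .bak suffix.
sub replace_in_place {
    my (@files) = @_;
    local $^I   = '.bak';    # turn on in-place editing for <>
    local @ARGV = @files;    # files the diamond operator will read
    while (<>) {
        s/\b($re)\b/$map{$1}/g;
        print;               # written to the file being edited, not to STDOUT
    }
    return;
}

# Process any files named on the command line.
replace_in_place(@ARGV) if @ARGV;
```

Called as, for example, perl myscript.pl annotations.gtf (a hypothetical file name), this would leave the edited text in annotations.gtf and the untouched original in annotations.gtf.bak.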

2 answers

Answer 1
3 2015-07-02 20:45:32

Answer 2
3 accepted 2015-07-03 13:52:04
