
Want to add a random string to the identifier line in a FASTA file

I want to add a random string to the existing identifier line in a FASTA file, so that I get:

MMETSP0259|AmphidiniumcarteCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

The sequence should then follow on the next lines as normal. I think my problem is in how I format the output. This is what I get instead:

MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
TCTGGGAAAGGTTGCTATCATGAGTCATAGAATaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac

The string is added to every line. (I shortened the lines to fit here.) I only want to add it to the identifier line.

This is what I have so far:

use strict;
use warnings;
my $currentId = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";

my $header_line;
my $seq;
my $uniqueID;

open (my $fh,"$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">$ARGV[0]_longer_ID_MMETSP.fasta");

while( <$fh> ){
    if ($_ =~ m/^(\S+)\s+(.*)/) {
        $header_line = $1;
        $seq = $2;
        $uniqueID = $currentId++;
        print $out_fh "$header_line$uniqueID\n$seq";
    } # if
} # while

close $fh;
close $out_fh;
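The ...ab, ...ac suffixes in the output above come from $currentId++. When ++ is applied to a string of letters, Perl performs a "magic" string increment instead of numeric addition. A short demonstration, using a shorter starting string than the question's purely for readability:

```perl
use strict;
use warnings;

# Magic string increment: applies to strings matching /^[a-zA-Z]*[0-9]*$/.
my $id = 'aaaa';
my @seen;
for (1 .. 4) {
    push @seen, $id++;    # post-increment returns the old value, then bumps it
}
print "@seen\n";          # aaaa aaab aaac aaad

# The carry works like an odometer: 'z' rolls over to 'a' and bumps the letter to its left.
my $wrap = 'az';
$wrap++;
print "$wrap\n";          # ba
```

So the increment itself was working; the problem was that it ran once for every line.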

Thanks very much; any ideas will be greatly appreciated.

Your program isn't working because the regex ^(\S+)\s+(.*) matches every line in the input file. For instance, \S+ matches CTTCATCGCACATGGATAACTGTGTACCTGACT; the newline at the end of the line matches \s+; and .* matches the empty string that is left. So every line, sequence lines included, is printed with an ID appended.
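A quick way to see this is to run the question's pattern against lines as they come out of <$fh>, trailing newline included. The sample lines are taken from the question; the loop around them is only for illustration:

```perl
use strict;
use warnings;

# Pattern from the question: a non-space run, whitespace, then anything.
my $re = qr/^(\S+)\s+(.*)/;

# Lines read with <$fh> keep their trailing newline.
my @lines = (
    "MMETSP0259|AmphidiniumCMP1314\n",
    "CTTCATCGCACATGGATAACTGTGTACCTGACT\n",
    "TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT\n",
);

my @matched;
for my $line (@lines) {
    next unless $line =~ $re;
    # \S+ takes the whole line body, the newline satisfies \s+,
    # and (.*) matches the empty string that remains.
    push @matched, [ $1, $2 ];
    print "matched: '$1' + '$2'\n";
}
print scalar(@matched), " of ", scalar(@lines), " lines matched\n";    # 3 of 3 lines matched
```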

Here's how I would code your solution. It simply appends $current_id to the end of any line that contains a pipe (|) character.

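use strict;
use warnings;
use 5.010;
use autodie;    # open and close die with a useful message on failure

my ($filename) = @ARGV;

# Perl's magic string increment turns this into ...aab, ...aac, and so on
my $current_id = 'a' x 57;

open my $in_fh,  '<', $filename;
open my $out_fh, '>', "${filename}_longer_ID_MMETSP.fasta";

while ( my $line = <$in_fh> ) {
    chomp $line;
    # Only the identifier line contains a pipe, so only it gets the suffix;
    # post-increment gives each header a different suffix
    $line .= $current_id++ if $line =~ tr/|//;
    print $out_fh $line, "\n";    # write to the output file, not STDOUT
}

close $in_fh;
close $out_fh;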

Output:

MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACT
TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT
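If the identifier lines in your file begin with >, as they do in standard FASTA, matching ^> ties the test to the header marker rather than to a character that merely happens to appear in these particular IDs. The sample lines above don't show a >, so check your own file first. The sketch below assumes that marker and also bumps the suffix for each record, as $currentId++ did in the question. add_suffix_to_headers is a hypothetical helper name, the second record is made up for the demo, and the in-memory filehandles stand in for real files.

```perl
use strict;
use warnings;

# Hypothetical helper: copy FASTA from $in_fh to $out_fh, appending a
# suffix only to header lines (lines that start with '>').
sub add_suffix_to_headers {
    my ($in_fh, $out_fh, $start_id) = @_;
    my $id = $start_id;
    while (my $line = <$in_fh>) {
        chomp $line;
        # Post-increment: the first header gets $start_id, the next one
        # gets the magically incremented string, and so on.
        $line .= $id++ if $line =~ /^>/;
        print {$out_fh} $line, "\n";
    }
    return $id;    # next unused suffix
}

# Demo input; the second record is invented for illustration.
my $fasta = <<'END';
>MMETSP0259|AmphidiniumCMP1314
CTTCATCGCACATGGATAACTGTGTACCTGACT
>MMETSP0259|AmphidiniumCMP1315
TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT
END

# In-memory filehandles instead of files on disk.
open my $in_fh,  '<', \$fasta  or die "Cannot open input: $!";
my $result = '';
open my $out_fh, '>', \$result or die "Cannot open output: $!";
my $next_id = add_suffix_to_headers($in_fh, $out_fh, 'aaa');
close $in_fh;
close $out_fh;
print $result;
```

If you really want a random rather than sequential suffix, one option is to replace $id++ with something like join '', map { ('a'..'z')[rand 26] } 1 .. 57, at the cost of a small chance of duplicate IDs.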
