
Add a random string to the identifier line in a FASTA file

I want to add a random string to the existing identifier line in a FASTA file, so that I get:

MMETSP0259|AmphidiniumcarteCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Then the sequence should follow on the next lines as normal. I think the problem is in how I format the output. This is what I get:

MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
TCTGGGAAAGGTTGCTATCATGAGTCATAGAATaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac

The string is added to every line. (I shortened it to fit here.) I only want to add it to the identifier line.

This is what I have so far:

use strict;
use warnings;
my $currentId = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";

my $header_line;
my $seq;
my $uniqueID;

open (my $fh,"$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">$ARGV[0]_longer_ID_MMETSP.fasta");

while( <$fh> ){
    if ($_ =~ m/^(\S+)\s+(.*)/) {
        $header_line = $1;
        $seq = $2;
        $uniqueID = $currentId++;
        print $out_fh "$header_line$uniqueID\n$seq";
    } # if
} # while

close $fh;
close $out_fh;

Thanks very much, any ideas will be greatly appreciated.

Your program isn't working because the regex ^(\S+)\s+(.*) matches every line in the input file, not just the identifier lines. For instance, on a sequence line \S+ matches CTTCATCGCACATGGATAACTGTGTACCTGACT; the newline at the end of the line matches \s+; and .* matches the empty string that is left.
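
This is easy to check by running the question's pattern against one identifier line and one sequence line from the sample. The snippet below is only a sketch; the helper name split_with_question_regex is made up for illustration and is not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the question's regex to one line and
# returns the two captures, or an empty list when there is no match.
sub split_with_question_regex {
    my ($line) = @_;
    return $line =~ m/^(\S+)\s+(.*)/ ? ( $1, $2 ) : ();
}

# One identifier line and one sequence line, each with its trailing newline.
for my $line ( "MMETSP0259|AmphidiniumCMP1314\n",
               "CTTCATCGCACATGGATAACTGTGTACCTGACT\n" ) {
    my @captures = split_with_question_regex($line);
    if (@captures) {
        print "matched: first='$captures[0]' second='$captures[1]'\n";
    }
    else {
        print "no match\n";
    }
}
```

Both lines match, and the second capture is empty in each case, so the pattern cannot be used to tell identifier lines from sequence lines.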

Here's how I would write your solution. It simply appends $current_id to the end of any line that contains a pipe (|) character.

use strict;
use warnings;
use 5.010;
use autodie;

my ($filename) = @ARGV;

my $current_id = 'a' x 57;

open my $in_fh,  '<', $filename;
open my $out_fh, '>', "${filename}_longer_ID_MMETSP.fasta";

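while ( my $line = <$in_fh> ) {
    chomp $line;
    # tr/|// counts the pipe characters; only identifier lines contain one
    $line .= $current_id if $line =~ tr/|//;
    # Write to the output file, not to STDOUT
    print $out_fh $line, "\n";
}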

close $out_fh;

output

MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACT
TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT
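
If each identifier line should get a different suffix, as the $currentId++ in the question suggests, Perl's string increment can generate one per header. The following is a minimal sketch under the same assumption as the code above, namely that only identifier lines contain a | character. The sub name tag_headers and the short sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a starting suffix and a list of chomped lines,
# and returns a copy in which every line containing '|' has a fresh suffix
# appended. Lines without '|' are returned unchanged.
sub tag_headers {
    my ( $suffix, @lines ) = @_;
    for my $line (@lines) {
        # Post-increment yields the current suffix, then advances it with
        # Perl's string increment ('aaaa' -> 'aaab' -> 'aaac' ...).
        $line .= $suffix++ if $line =~ tr/|//;
    }
    return @lines;
}

print "$_\n" for tag_headers( 'aaaa', 'ID1|first', 'ACGT', 'ID2|second', 'TTGA' );
```

Because the suffix only advances when a header is seen, sequence lines neither change nor use up a value.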
