Want to add a random string to the identifier line in a FASTA file
I want to add a random string to the existing identifier line in a FASTA file, so that I get:
MMETSP0259|AmphidiniumcarteCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
The sequence should then follow on the next lines as normal. I think my problem is in the output formatting. This is what I get:
MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACTaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
TCTGGGAAAGGTTGCTATCATGAGTCATAGAATaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
The string is added to every line. (I altered the length to fit here.) I want to add it only to the identifier line.
This is what I have so far:
use strict;
use warnings;
my $currentId = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
my $header_line;
my $seq;
my $uniqueID;
open (my $fh,"$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">$ARGV[0]_longer_ID_MMETSP.fasta");
while( <$fh> ){
if ($_ =~ m/^(\S+)\s+(.*)/) {
$header_line = $1;
$seq = $2;
$uniqueID = $currentId++;
print $out_fh "$header_line$uniqueID\n$seq";
} # if
} # while
close $fh;
close $out_fh;
Thanks very much; any ideas will be greatly appreciated.
Your program isn't working because the regex ^(\S+)\s+(.*) matches every line in the input file. For instance, \S+ matches CTTCATCGCACATGGATAACTGTGTACCTGACT; the newline at the end of the line matches \s+; and .* matches the empty string. So every sequence line is treated as a header line with an empty sequence, and gets the ID appended.
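To see this concretely, here is a minimal sketch that applies the same regex to a sequence line with its trailing newline still attached, as the line arrives from the file handle. The helper name header_regex_captures is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the question's regex to one line and
# return the two captures, or an empty list if it does not match.
sub header_regex_captures {
    my ($line) = @_;
    return $line =~ m/^(\S+)\s+(.*)/ ? ( $1, $2 ) : ();
}

# A sequence line exactly as <$fh> returns it, newline included.
my $sequence_line = "CTTCATCGCACATGGATAACTGTGTACCTGACT\n";

if ( my ( $first, $second ) = header_regex_captures($sequence_line) ) {
    # \S+ takes the bases, \s+ takes the newline, .* is left with nothing.
    print "matched: first='$first' second='$second'\n";
}
else {
    print "no match\n";
}
```

The sequence line matches, with the whole sequence in the first capture and an empty second capture, which is why the original loop treats it like a header.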
Here's how I would write your solution. It simply appends $current_id to the end of any line that contains a pipe character (|).
use strict;
use warnings;
use 5.010;
use autodie;
my ($filename) = @ARGV;
my $current_id = 'a' x 57;
open my $in_fh, '<', $filename;
open my $out_fh, '>', "${filename}_longer_ID_MMETSP.fasta";
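while ( my $line = <$in_fh> ) {
    chomp $line;
    # tr/|// counts the pipe characters; a non-zero count marks an identifier line
    $line .= $current_id if $line =~ tr/|//;
    # write the line to the output file
    print $out_fh $line, "\n";
}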
close $out_fh;
output
MMETSP0259|AmphidiniumCMP1314aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
CTTCATCGCACATGGATAACTGTGTACCTGACT
TCTGGGAAAGGTTGCTATCATGAGTCATAGAAT
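The question's code used $currentId++ to give each identifier a different suffix, whereas the program above appends the same string every time. If a distinct suffix per identifier line is wanted, Perl's string auto-increment can be combined with the pipe test. The following is only a sketch under the assumption that identifier lines are the only lines containing |; the sub name tag_headers and the short sample records are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: append an ID to every line that contains a pipe,
# stepping the ID after each use. Takes the starting ID and a list of lines.
sub tag_headers {
    my ( $id, @lines ) = @_;
    my @tagged;
    for my $line (@lines) {
        chomp $line;
        # Post-increment: append the current ID, then advance it ('aa' -> 'ab').
        $line .= $id++ if $line =~ tr/|//;
        push @tagged, $line;
    }
    return @tagged;
}

# Made-up sample input: two records, each an identifier line plus one sequence line.
my @sample = ( "id1|x\n", "ACGT\n", "id2|y\n", "TTGA\n" );
print "$_\n" for tag_headers( 'aa', @sample );
```

With the starting ID 'aa', the first identifier line receives aa and the second receives ab, while the sequence lines pass through unchanged.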