How to count the number of specific characters in each line of a file?
I'm trying to count the number of 'N's in a FASTA file, which looks like this:
>Header
AGGTTGGNNNTNNGNNTNGN
>Header2
AGNNNNNNNGNNGNNGNNGN
So in the end I want the count of 'N's for each read (each header is one read) so that I can make a histogram. The output would look something like this:
# of N's    # of Reads
0           300
1           240
etc...
That is, there are 300 sequences (reads) that contain 0 'N's. Here is my code so far:
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
my $line;
my $sequence;
my $length;
my $char_N_count = 0;
my @array;
my $count = 0;
if (!defined ($output_file)) {
    die "USAGE: Input FASTA file\n";
}
open (IFH, "$file") or die "Cannot open input file $!\n";
open (OFH, ">$output_file") or die "Cannot open output file $!\n";
while($line = <IFH>) {
    chomp $line;
    next if $line =~ /^>/;
    $sequence = $line;
    @array = split ('', $sequence);
    foreach my $element (@array) {
        if ($element eq 'N') {
            $char_N_count++;
        }
    }
    print "$char_N_count\n";
}
Try this. I changed a few things, such as using lexical (scalar) filehandles. There are many ways to do this in Perl, so other people will have other ideas. Here I used an array indexed by the number of N's; it may have gaps (N counts that no read has). Another option is to store the results in a hash keyed by the count.
Edit: I just realised I'm not using $output_file, because I have no idea what you want to do with it :) If you intend to write to it, just change the 'print' statements at the end to 'print $out_fh ...'.
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
if (!defined ($output_file)) {
    die "USAGE: $0 <input_file> <output_file>\n";
}
open (my $in_fh, '<', $file) or die "Cannot open input file '$file': $!\n";
open (my $out_fh, '>', $output_file) or die "Cannot open output file '$output_file': $!\n";
my @results = ();
while (my $line = <$in_fh>) {
    next if $line =~ /^>/;
    # tr/N// returns the number of 'N' characters in the line
    my $num_n = ($line =~ tr/N//);
    $results[$num_n]++;
}
print "# of N's\t# of Reads\n";
for (my $i = 0; $i < scalar(@results) ; $i++) {
    unless (defined($results[$i])) {
        $results[$i] = 0;
        # another option is to 'next' if you don't want to show the zero totals
    }
    print "$i\t\t$results[$i]\n";
}
close($in_fh);
close($out_fh);
exit;
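A minimal sketch of the hash-based alternative mentioned above, under a few assumptions that go beyond the original answer. It writes the histogram to $out_fh instead of STDOUT. It also adds up the N's per record (from one '>' header to the next) rather than per line, so a read whose sequence is wrapped over several lines is still counted as one read; the script above assumes each sequence sits on a single line, as in the example. Like the original, it counts only uppercase 'N'. The sub name n_histogram is hypothetical, and sequence lines appearing before the first header are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: build a histogram (N count => number of reads)
# from a FASTA filehandle. Sequence lines are summed per record, so a
# sequence wrapped over several lines still counts as one read.
sub n_histogram {
    my ($fh) = @_;
    my %hist;        # key: number of N's in a read, value: number of reads
    my $n_in_read;   # undef until the first header has been seen
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) {
            # A new header closes the previous record, if there was one
            $hist{$n_in_read}++ if defined $n_in_read;
            $n_in_read = 0;
            next;
        }
        # Assumption: ignore any sequence lines before the first header
        next unless defined $n_in_read;
        # tr/N// returns the number of uppercase 'N' characters
        $n_in_read += ($line =~ tr/N//);
    }
    # Count the last record in the file
    $hist{$n_in_read}++ if defined $n_in_read;
    return \%hist;
}

# Only act as a script when given arguments, so the sub can be reused
if (@ARGV) {
    my ($file, $output_file) = @ARGV;
    die "USAGE: $0 <input_file> <output_file>\n" unless defined $output_file;
    open(my $in_fh,  '<', $file)        or die "Cannot open input file '$file': $!\n";
    open(my $out_fh, '>', $output_file) or die "Cannot open output file '$output_file': $!\n";
    my $hist = n_histogram($in_fh);
    print $out_fh "# of N's\t# of Reads\n";
    # Hash keys are unordered, so sort them numerically. Unlike the array
    # version, N counts that no read has are simply absent (no zero rows).
    for my $n (sort { $a <=> $b } keys %$hist) {
        print $out_fh "$n\t$hist->{$n}\n";
    }
    close($in_fh);
    close($out_fh) or die "Cannot close output file '$output_file': $!\n";
}
```

Usage would be the same as for the script above, e.g. `perl count_n.pl reads.fa histogram.txt` (file names are placeholders). Using a hash avoids the undefined gaps in the array; the trade-off is that zero-count rows are not printed unless you fill them in explicitly.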