Read file and store in array in Perl?

I would like to read a file and produce a number of arrays, depending on how many chains (M, N, O, ...) it contains.

The following is part of the file:

SEQRES   1 M  312  ALA ALA ASP PRO LYS LEU LEU LYS ALA ALA ALA GLU ALA
SEQRES   2 M  312  SER TYR ALA PHE ALA LYS GLU VAL ASP TRP ASN ASN GLY
SEQRES   3 M  312  ILE PHE LEU GLN ALA PRO GLY LYS LEU GLN PRO LEU GLU
SEQRES   4 M  312  ALA LEU LYS ALA ILE ASP LYS MET ILE VAL MET GLY ALA
SEQRES   5 M  213  SER PHE ASN ARG ASN

SEQRES   1 N  312  ASP GLU ILE GLY ASP ALA ALA LYS LYS LEU GLY ASP ALA
SEQRES   2 N  312  SER TYR ALA PHE ALA LYS GLU VAL ASP TRP ASN ASN GLY
SEQRES   3 N  312  ILE PHE LEU GLN ALA PRO GLY LYS LEU GLN PRO LEU GLU
SEQRES   4 N  312  ALA LEU LYS ALA ILE ASP LYS MET ILE VAL MET GLY ALA
SEQRES   5 N  312  ALA ALA ASP PRO LYS LEU LEU LYS ALA ALA ALA GLU ALA
SEQRES   6 N  312  VAL THR SER ARG ALA ASP TRP ASP ASN VAL

SEQRES   1 O  312  HIS HIS LYS ALA ILE GLY SER ILE SER GLY PRO ASN GLY
SEQRES   2 O  312  SER TYR ALA PHE ALA LYS GLU VAL ASP TRP ASN ASN GLY
SEQRES   3 O  312  ILE PHE LEU GLN ALA PRO GLY LYS LEU GLN PRO LEU GLU
SEQRES   4 O  312  ALA LEU LYS ALA ILE ASP LYS MET ILE VAL

This is my code:

my @seq;
my $string="";
my @seqFile;
my $file=<>;
open(FILE, "$file");
while (my $line=<FILE>){
    if ($line =~ /^SEQRES/) {
        chomp $line;
        push @seq, [split (/\s+/, $line)] ;
    }
}
close(FILE);
for my $i (0..$#seq) {
    my $ob =$seq[$i][2];
    if ($seq[$i][2] eq $ob ){
        for (my $j=4;$j<=$#{$seq[$i]};$j++) {
            my $temp= $seq[$i][$j];
            $string .= $temp;
        }
        $ob = $seq[$i][2];
        last;
    }
    push @seqFile, $ob;
    push @seqFile, $string;
    $string = ''; #string needs to be empty to store new lines
}

With the above sample I expect 3 arrays: M(:)ALAALAASP..., N(:)ASPGLU..., O(:)HISHISLYS...

I managed to put all the SEQRES records into one string, but that is not what I want.

Somewhere I need to add an if () {} that checks whether the chain ID has changed (M to N, N to O), then saves the string and starts a new string and array. But it keeps accumulating the same string as many times as $#seq. If I move one of the closing braces, it does not store anything. Either way I get error messages. How can I do this?

Do you not see a problem here?

my $ob =$seq[$i][2];
if ($seq[$i][2] ne $ob ){

This is analogous to:

my $x = "this";
if ($x ne "this") {

How could the if condition ever be true?
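
One way to make that comparison meaningful is to remember the chain ID of the previous record in a variable declared outside the loop, and compare the current record against it. The following is only a minimal sketch, assuming the records of each chain are contiguous in the file; the inline $sample data and the names $prev and @f are illustrative and not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical inline sample standing in for the real input file.
my $sample = <<'END';
SEQRES   1 M  312  ALA ALA ASP
SEQRES   2 M  312  SER TYR
SEQRES   1 N  312  ASP GLU ILE
SEQRES   1 O  312  HIS HIS
END

# Open the string as an in-memory file so the sketch is self-contained.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";

my @seqFile;          # flat list: chain ID, sequence, chain ID, sequence, ...
my ($prev, $string);  # chain ID of the previous record and its residues so far

while (my $line = <$fh>) {
    next unless $line =~ /^SEQRES/;
    my @f = split ' ', $line;

    # Compare against the PREVIOUS record's chain ID, not the current one.
    if (defined $prev and $f[2] ne $prev) {
        push @seqFile, $prev, $string;   # the chain changed: save what we have
        $string = '';                    # and start a new string
    }
    $prev    = $f[2];
    $string .= join '', @f[4 .. $#f];    # append residues (fields 4 onward)
}
push @seqFile, $prev, $string if defined $prev;   # flush the last chain
close $fh;

print "$_\n" for @seqFile;
```

The final push after the loop matters: without it the last chain is never saved, because no later record triggers the "chain changed" branch.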

A better approach would be to use a hash of arrays, keyed on M, N, or O (the value you are assigning to $ob):

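open(my $fh, '<', $file) or die "Cannot open $file: $!";   # lexical filehandles are preferred over barewords like FILE
my %hash_of_arrays;
while (<$fh>) {
    next unless /^SEQRES/;   # skip blank lines and any non-SEQRES records
    my @data = split;
    # join the residues (fields 4 onward) and append them to this chain's array
    push @{$hash_of_arrays{$data[2]}}, join('', @data[4..$#data]);
}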

I am fairly sure that is close to what you are trying to do; the second argument to push uses an array slice.

Note that if @{$hash_of_arrays{$data[2]}} does not exist yet, it will be created via autovivification: http://en.wikipedia.org/wiki/Autovivification
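
The hash of arrays holds one string per SEQRES line, so if a single string per chain is wanted, the pieces still have to be joined. A small sketch of that last step, and of autovivification itself, might look like this; the hard-coded contents of %hash_of_arrays and the name %sequence_for are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical contents, as if two chains had already been read from the file.
my %hash_of_arrays = (
    M => ['ALAALAASP', 'SERTYR'],
    N => ['ASPGLUILE'],
);

# Autovivification: key 'O' does not exist yet, so push creates its array.
push @{ $hash_of_arrays{O} }, 'HISHIS';

# Collapse each chain's per-line pieces into one string per chain.
my %sequence_for;
for my $chain (keys %hash_of_arrays) {
    $sequence_for{$chain} = join '', @{ $hash_of_arrays{$chain} };
}

print "$_: $sequence_for{$_}\n" for sort keys %sequence_for;
```

Hash keys come back in no particular order, which is why the print sorts them; if the order of chains in the file matters, a separate array of chain IDs would be needed alongside the hash.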

I think this program does what you need.

Instead of watching for changes in the value of the third field I have written it so that a blank line or the end of the file marks the end of a chain.

use strict;
use warnings;

my $file = 'seq.txt';

open my $fh, '<', $file or die $!;

my @seqFile;
my $string;
my $ob;

while (<$fh>) {
  if (/^SEQRES/) {           
    my @data = split;
    $string .= join '', @data[4..$#data];
    $ob = $data[2];
  }
  if (eof($fh) or not /\S/) {
    push @seqFile, $ob, $string;
    $ob = $string = undef;
  }
}

use Data::Dumper;
print Dumper \@seqFile;

Output:

$VAR1 = [
          'M',
          'ALAALAASPPROLYSLEULEULYSALAALAALAGLUALASERTYRALAPHEALALYSGLUVALASPTRPASNASNGLYILEPHELEUGLNALAPROGLYLYSLEUGLNPROLEUGLUALALEULYSALAILEASPLYSMETILEVALMETGLYALASERPHEASNARGASN',
          'N',
          'ASPGLUILEGLYASPALAALALYSLYSLEUGLYASPALASERTYRALAPHEALALYSGLUVALASPTRPASNASNGLYILEPHELEUGLNALAPROGLYLYSLEUGLNPROLEUGLUALALEULYSALAILEASPLYSMETILEVALMETGLYALAALAALAASPPROLYSLEULEULYSALAALAALAGLUALAVALTHRSERARGALAASPTRPASPASNVAL',
          'O',
          'HISHISLYSALAILEGLYSERILESERGLYPROASNGLYSERTYRALAPHEALALYSGLUVALASPTRPASNASNGLYILEPHELEUGLNALAPROGLYLYSLEUGLNPROLEUGLUALALEULYSALAILEASPLYSMETILEVAL'
        ];

Edit

Now that I know the data file has no blank lines to delineate the chains, my original solution won't work.

This alternative checks the sequence number in the second field of the records, and starts a new chain when that number is 1. The accumulated chain must also be saved whenever a new chain starts and also at the end of the file after the read loop exits.

The output from this program is identical to that shown above.

use strict;
use warnings;

my $file = 'seq.txt';

open my $fh, '<', $file or die $!;

my @seqFile;
my $chain;
my $ob;

while (<$fh>) {

  next unless /^SEQRES/;

  my @data = split;
  if ($data[1] == 1) {
    push @seqFile, $ob, $chain if $chain;
    $ob = $chain = undef;
  }
  $chain .= join '', @data[4..$#data];
  $ob = $data[2];
}

push @seqFile, $ob, $chain if $chain;

use Data::Dumper;
print Dumper \@seqFile;


 