I am having a small issue with comparing array elements by index and displaying the result. I have a tab-delimited file with 9 columns and more than 100 rows. I want to compare the 8th column element of row i with the 7th column element of row i+1. If it is smaller than that 7th column element, print the entire row; otherwise, if it is greater, compare the 6th column element of both rows and print only the row whose 6th element is bigger.
Sample File
Recep_L_domain PF01030.22 112 sp|P00533|EGFR_HUMAN 2.50E-30 104.7 57 167 Receptor
Furin-like PF00757.18 149 sp|P00533|EGFR_HUMAN 4.10E-29 101.3 185 338 Furin-like
Recep_L_domain PF01030.22 112 sp|P00533|EGFR_HUMAN 3.60E-28 97.8 361 480 Receptor
GF_recep_IV PF14843.4 132 sp|P00533|EGFR_HUMAN 1.60E-46 157.2 505 636 Growth
Pkinase PF00069.23 264 sp|P00533|EGFR_HUMAN 2.70E-39 135 712 964 Protein
Pkinase_Tyr PF07714.15 260 sp|P00533|EGFR_HUMAN 8.40E-88 293.9 714 965 Protein
For example, if we compare the last two rows, the 8th column element of the first is bigger than the next row's 7th column element, so in this case it should compare the two 6th column elements and print only the row with the bigger one. So from these two rows it should print only the last row. The code below only prints the values if it is smaller. How can I compare the 6th elements and print the result when the 8th column is bigger?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
open(IN,"<samplecode.txt");
my @Alifrom;
my @Alito;
my @data; ## multidimensional array
while(<IN>){
    chomp $_;
    #next if $_=undef;
    my @line = split("\t", $_);
    ##my($a, $b, $c, $d, $e, $f, $g, $h, $i) = split(/\t/,$_); // catch data and storing into multiple scalar variable
    push @data, [@line];
}
for (my $i = 0; $i < @data ; $i++){
    if ($data[$i][7] gt $data[$i][6]){
        for (my $j = 0; $j < @{$data[$i]}; $j++){
            #@Alifrom = map $data[$i][$j+6], @data;
            print "$data[$i][$j]\t";
        }
    }
    #else
    print "\n";
}
The description in your question is not entirely clear, but I'm taking an educated guess.
First, you should not read the whole file into an array. If your file really only has 100 rows, it's not a problem, but if there are more rows this will consume a lot of memory.
You say you want to compare values in every row i to values in row i+1 , so essentially in every row you want to look at values in the next row. That means you need to keep a maximum of two rows in memory at one time. Since that's linear, you can just read the first row, then read the second row, compare, and when you're done make the second row the new first row.
In your loop, you always read the second row, and keep around the first row from when you read it as the second row in the iteration before.
For that, it makes sense to turn the reading and splitting into a function. You can pass it a file handle. In my example below, I've used DATA with the __DATA__ section, but you can just open my $fh, '<', 'samplecode.txt' and pass $fh around.
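As a minimal sketch of passing a lexical file handle to a function: next_line is a hypothetical helper used only for this illustration, and an in-memory file stands in for samplecode.txt so the snippet runs without any file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: returns the next line from whichever
# filehandle it is given, or nothing once the end of the data is reached.
sub next_line {
    my $fh   = shift;
    my $line = <$fh>;
    return unless defined $line;
    return $line;
}

# An in-memory file stands in for samplecode.txt (an assumption made so the
# sketch is self-contained). With a real file this would be:
#   open my $fh, '<', 'samplecode.txt' or die "Cannot open file: $!";
my $content = "first row\nsecond row\n";
open my $fh, '<', \$content or die "Cannot open in-memory file: $!";

my @lines;
while ( defined( my $line = next_line($fh) ) ) {
    push @lines, $line;
}
close $fh;

print @lines;
```

The function does not care where the handle came from, so the same code works for DATA, a real file, or STDIN.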
Because you want to print the whole row in some cases, you should not just chomp and split it in a destructive manner, but rather keep around the actual full row including the line break. We therefore make the function to read and split return two values: the full row as a scalar string, and an array reference of the cleaned up columns.
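If there are no more lines to read, the bare return gives back nothing (an empty list in list context), which makes the while loop stop. Note that the last row of the file is only ever used as row i+1: it has no following row to compare against, so it is never printed as row i.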
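To illustrate why the loop stops, here is a small sketch that is separate from the file-reading code. next_pair and @queue are hypothetical names for this example only. A list assignment evaluated as a condition yields the number of values on its right-hand side, so an empty return makes it false.

```perl
use strict;
use warnings;

my @queue = ( [ 'a', 1 ], [ 'b', 2 ] );

# Hypothetical stand-in for a reader: returns two values while data remains,
# and a bare "return" (an empty list in list context) once it runs out.
sub next_pair {
    my $item = shift @queue;
    return unless defined $item;
    return @$item;
}

my @seen;
# The list assignment counts the values returned by next_pair, so the
# condition becomes false as soon as next_pair returns nothing.
while ( my ( $name, $value ) = next_pair() ) {
    push @seen, "$name=$value";
}

print join( ', ', @seen ), "\n";    # prints "a=1, b=2"
```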
When comparing, note that array indexes in Perl start at zero, so column 7 is index [6].
Here's an example implementation.
use strict;
use warnings;
# this function reads a line from the filehandle that's passed in and returns
# the row as a string and an array ref of all columns, or nothing (an empty
# list) if there are no more lines to read
sub read_and_split {
    my $fh = shift;
    # read one line and return nothing if there is no more data
    my $row = <$fh>;
    return unless defined $row;
    # split into columns
    my @cols = split /\s+/, $row; # the sample here has spaces; use /\t/ for a real tab-delimited file
    # only chomp after splitting so we retain the original line for printing
    chomp $cols[-1];
    # return both things
    return $row, \@cols;
}
# read the first line
my ( $row_i, $cols_i ) = read_and_split( \*DATA );
# read subsequent lines
while ( my ( $row_i_plus_one, $cols_i_plus_one ) = read_and_split( \*DATA ) ) {
    # 8th col of i is smaller than 7th col of i+1
    if ( $cols_i->[7] < $cols_i_plus_one->[6] ) {
        print $row_i;
    }
    else {
        # otherwise compare the 6th col of both rows and only print
        # row i if its 6th col is bigger
        if ( $cols_i->[5] > $cols_i_plus_one->[5] ) {
            print $row_i;
        }
    }
    # turn the current i+1 into i for the next iteration
    $row_i = $row_i_plus_one;
    $cols_i = $cols_i_plus_one;
}
__DATA__
Recep_L_domain PF01030.22 112 sp|P00533|EGFR_HUMAN 2.50E-30 104.7 57 167 Receptor
Furin-like PF00757.18 149 sp|P00533|EGFR_HUMAN 4.10E-29 101.3 185 338 Furin-like
Recep_L_domain PF01030.22 112 sp|P00533|EGFR_HUMAN 3.60E-28 97.8 361 480 Receptor
GF_recep_IV PF14843.4 132 sp|P00533|EGFR_HUMAN 1.60E-46 157.2 505 636 Growth
Pkinase PF00069.23 264 sp|P00533|EGFR_HUMAN 2.70E-39 135 712 964 Protein
Pkinase_Tyr PF07714.15 260 sp|P00533|EGFR_HUMAN 8.40E-88 293.9 714 965 Protein
It outputs these lines:
Recep_L_domain PF01030.22 112 sp|P00533|EGFR_HUMAN 2.50E-30 104.7 57 167 Receptor
Furin-like PF00757.18 149 sp|P00533|EGFR_HUMAN 4.10E-29 101.3 185 338 Furin-like
Recep_L_domain PF01030.22 112 sp|P00533|EGFR_HUMAN 3.60E-28 97.8 361 480 Receptor
GF_recep_IV PF14843.4 132 sp|P00533|EGFR_HUMAN 1.60E-46 157.2 505 636 Growth
Note that the part about comparing column six was not very clear in your question. I assumed we compare column six of both rows and print row i if its value is the bigger one. If we were to print row i+1 instead, we might end up printing that line twice.
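If the intent was instead that, of two overlapping rows, only the one with the bigger column six survives (which is how I read the "print only the last row" example in the question), one possible sketch keeps a single candidate row and replaces it when the next row wins. This is a guess at the requirement, not something the question confirms. filter_rows is a hypothetical name, and only the last three sample rows are used to keep it short.

```perl
use strict;
use warnings;

# Sketch under an assumed reading of the question: of two overlapping rows,
# keep whichever has the larger 6th column, and never print a row twice.
# Takes array refs of columns and returns the rows that survive.
sub filter_rows {
    my @rows = @_;
    return unless @rows;

    my @kept;
    my $candidate = shift @rows;
    for my $next (@rows) {
        if ( $candidate->[7] < $next->[6] ) {
            # no overlap: the candidate is final, move on to the next row
            push @kept, $candidate;
            $candidate = $next;
        }
        elsif ( $next->[5] > $candidate->[5] ) {
            # overlap and the next row has the bigger 6th column: it takes over
            $candidate = $next;
        }
        # otherwise the candidate stays and the next row is dropped
    }
    push @kept, $candidate;    # the last surviving candidate is always kept
    return @kept;
}

my @lines = split /\n/, <<'END';
GF_recep_IV PF14843.4 132 sp|P00533|EGFR_HUMAN 1.60E-46 157.2 505 636 Growth
Pkinase PF00069.23 264 sp|P00533|EGFR_HUMAN 2.70E-39 135 712 964 Protein
Pkinase_Tyr PF07714.15 260 sp|P00533|EGFR_HUMAN 8.40E-88 293.9 714 965 Protein
END

my @kept = filter_rows( map { [ split ' ', $_ ] } @lines );
print join( "\t", @$_ ), "\n" for @kept;
```

For these three rows it prints the GF_recep_IV and Pkinase_Tyr rows. Because only one candidate is held at a time, the same logic could be moved into the line-by-line loop instead of taking all rows as arguments.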