Delete rows based on an ordered column using Perl

I have a comma-separated table. I want to sort it and then, for each row, test whether a value from its num+1, num+2, num+3, or num+4 field exists in the num column; if it does, the row holding that num value should be deleted.

My script is:

#!"C:\perl\bin\perl.exe"

use strict;
use warnings;

my $file_name = shift @ARGV;

die "Usage: $0 <file_to_be_processed> > <output_file>\n" unless defined $file_name;

my $dic; # This is going to hold all values to be excluded.

open IN, "<", $file_name or die "Could not open $file_name $!\n";

while(<IN>) {
        chomp;
        @_ = split /,/;
        shift @_;
        map{$dic->{$_}++} @_;
}

close IN;

open IN, "<", $file_name or die "Could not open $file_name $!\n";

while(<IN>) {
        chomp;
        @_ = split /,/;
        print $_."\n" unless defined $dic->{$_[0]};
}

close IN;

Here is my table:

num,num+1,num+2,num+3,num+4
1014,1015,1016,1017,1018
1015,1016,1017,1018,1019
1019,1020,1021,1022,1023
1025,1026,1027,1028,1029
1030,1031,1032,1033,1034

Here is the expected result:

num,num+1,num+2,num+3,num+4
1014,1015,1016,1017,1018
1019,1020,1021,1022,1023
1025,1026,1027,1028,1029
1030,1031,1032,1033,1034

My script works, but it also excludes the 1019 row from the result. Here is the actual output of the script:

num,num+1,num+2,num+3,num+4
1014,1015,1016,1017,1018
1025,1026,1027,1028,1029
1030,1031,1032,1033,1034

Looks like a misuse of map to me - if you're not using the result of map then you should probably be using a for loop instead.

But that aside, what you're doing here is building a hash of every value in columns 1, 2, 3, and 4 (counting from 0) and excluding a line if its first value is in that hash?

Your sample data includes 1019 as the last value on the previous line (the 1015 line), and that's why the 1019 line is being excluded.

Your $dic looks like:

$VAR1 = {
          '1016' => 2,
          '1021' => 1,
          '1022' => 1,
          '1028' => 1,
          '1034' => 1,
          'num+3' => 1,
          '1017' => 2,
          'num+1' => 1,
          '1015' => 1,
          '1020' => 1,
          'num+2' => 1,
          '1023' => 1,
          '1026' => 1,
          '1019' => 1,
          '1031' => 1,
          '1027' => 1,
          '1032' => 1,
          '1033' => 1,
          '1018' => 2,
          '1029' => 1,
          'num+4' => 1
        };

As 1019 is in it, the 1019 line gets skipped.
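A dump like the one above can be reproduced with Data::Dumper. The following is only a minimal sketch: `build_dic` is a hypothetical helper name, the rows are a shortened copy of the sample data, and `$Data::Dumper::Sortkeys` is set so the keys print in a stable order.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order

# build_dic is a hypothetical helper: it counts every value found in
# the columns after the first one, as the question's first loop does.
sub build_dic {
    my @lines = @_;
    my %dic;
    for my $line (@lines) {
        my ( undef, @rest ) = split /,/, $line;    # discard the first column
        $dic{$_}++ for @rest;
    }
    return \%dic;
}

# Shortened copy of the sample data (header line omitted).
my $dic = build_dic(
    '1014,1015,1016,1017,1018',
    '1015,1016,1017,1018,1019',
    '1019,1020,1021,1022,1023',
);

print Dumper($dic);
```

The key 1019 shows up in the dump even though the only row that contributes it (the 1015 row) is itself excluded later.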

Also:

  • Data::Dumper is useful for seeing what's in a data structure.

  • Don't use map like that. Use a for loop, something like:

        while (<IN>) {
            chomp;
            my ( undef, @rest ) = split /,/;    # skip the first column, as your shift did
            $dic->{$_}++ for @rest;
        }

  • Don't use @_ like that. It's a special variable with a specific meaning (the argument list of a subroutine). Call your array something else.

  • Current good practice is to use a lexical filehandle with open, e.g. open ( my $in, '<', $filename ) or die $!; because it keeps the handle's scope small.

  • If you just want to check the first column, you can assign like this: my ( $col, @rest ) = split /,/; and just test $col. Or you can skip the chomp entirely and just do:

     print unless defined $dic->{(split /,/)[0]};
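Putting the style points above together might look like the sketch below. It is a style-only rewrite: the exclusion logic is deliberately unchanged, so the 1019 row is still dropped. `filter_rows` is a hypothetical name, and the table is read through an in-memory filehandle (a reference to a string) only so that the example needs no input file.

```perl
use strict;
use warnings;

# filter_rows is a hypothetical helper. It takes the whole table as a
# string and returns the rows the question's script would print.
sub filter_rows {
    my ($text) = @_;
    my %dic;

    # First pass: count every value in the columns after the first.
    open( my $in, '<', \$text ) or die "Could not open in-memory data: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my ( undef, @rest ) = split /,/, $line;
        $dic{$_}++ for @rest;    # for loop instead of map in void context
    }
    close $in;

    # Second pass: keep rows whose first column was never counted.
    my @kept;
    open( $in, '<', \$text ) or die "Could not open in-memory data: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my ($col) = split /,/, $line;    # only the first column is needed
        push @kept, $line unless defined $dic{$col};
    }
    close $in;

    return @kept;
}

my $table = join "\n",
    'num,num+1,num+2,num+3,num+4',
    '1014,1015,1016,1017,1018',
    '1015,1016,1017,1018,1019',
    '1019,1020,1021,1022,1023',
    '1025,1026,1027,1028,1029',
    '1030,1031,1032,1033,1034';

print "$_\n" for filter_rows($table);
```

With a real file, the same code would open the file name instead of `\$text`.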

You have to dynamically change your hash when you skip a line:

#!/usr/bin/perl
use warnings;
use strict;

my %dic;
my $pos = tell DATA; # Remember where the data start.

while (<DATA>) {
    chomp;
    my @ar = split /,/;
    # Fix SO syntax highlighting error: /
    shift @ar;
    $dic{$_}++ for @ar;
}

seek DATA, $pos, 0; # Back to the data start.

while (<DATA>) {
    chomp;
    my @ar = split /,/;
    if ($dic{ $ar[0] }) {
        delete $dic{ $_ } for @ar[1 .. $#ar]; # <-- this was missing!
    } else {
        print "$_\n";
    }
}

__DATA__
1014,1015,1016,1017,1018
1015,1016,1017,1018,1019
1019,1020,1021,1022,1023
1025,1026,1027,1028,1029
1030,1031,1032,1033,1034

1019 is being skipped because even though the 1015 row will be skipped in the second loop, the 1019 key was already defined for $dic in your first loop.

map{$dic->{$_}++} @_;

When the first loop reaches the 1015 row, that line increments the keys 1016, 1017, and 1018 and creates the key 1019 (with a value of 1). Then in your second loop:

print $_."\n" unless defined $dic->{$_[0]};

your unless skips the 1015 row, but nothing removes the keys that the 1015 row added, so the 1019 key is still defined and the 1019 row gets removed as well.
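One way to see this for yourself is to record, for every key, which rows put it into the hash. This is only an illustrative sketch; `key_sources` is a hypothetical helper and the rows are a shortened copy of the sample data.

```perl
use strict;
use warnings;

# key_sources is a hypothetical helper: for each value in the columns
# after the first, it records the first-column values of the rows that
# contributed it.
sub key_sources {
    my @lines = @_;
    my %source;
    for my $line (@lines) {
        my ( $first, @rest ) = split /,/, $line;
        push @{ $source{$_} }, $first for @rest;
    }
    return \%source;
}

my $source = key_sources(
    '1014,1015,1016,1017,1018',
    '1015,1016,1017,1018,1019',
    '1019,1020,1021,1022,1023',
);

print "1015 comes from row(s): @{ $source->{1015} }\n";
print "1019 comes from row(s): @{ $source->{1019} }\n";
```

Here 1019 is contributed only by the 1015 row, which is itself excluded, so the 1019 key should not count against the 1019 row.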

If I understand you correctly, then this is all you need:

use strict;
use warnings;

my %seen;

while ( <DATA> ) {
    chomp;
    my @fields = split /,/;
    if ( not $seen{ shift @fields } ) {
        $seen{$_} = 1 for @fields;
        print "$_\n";
    }
}

__DATA__
1014,1015,1016,1017,1018
1015,1016,1017,1018,1019
1019,1020,1021,1022,1023
1025,1026,1027,1028,1029
1030,1031,1032,1033,1034

output

1014,1015,1016,1017,1018
1019,1020,1021,1022,1023
1025,1026,1027,1028,1029
1030,1031,1032,1033,1034
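The single pass above relies on the rows already being in ascending order of the first column. Since the question mentions ordering the table, here is a hedged variant that sorts first and keeps the header line. `dedupe_sorted` is a hypothetical name, and the sketch assumes that the first line is a header and that every other first column is numeric.

```perl
use strict;
use warnings;

# dedupe_sorted is a hypothetical helper. Assumptions: the first argument
# is the header line, and all other rows start with a numeric column.
sub dedupe_sorted {
    my ( $header, @rows ) = @_;
    my %seen;
    my @kept = ($header);

    # Sort the data rows numerically by their first column.
    my @sorted =
        sort { ( split /,/, $a )[0] <=> ( split /,/, $b )[0] } @rows;

    for my $row (@sorted) {
        my ( $first, @rest ) = split /,/, $row;
        next if $seen{$first};      # already covered by a kept row
        $seen{$_} = 1 for @rest;    # only kept rows mark their values
        push @kept, $row;
    }
    return @kept;
}

# Sample data from the question, deliberately out of order.
my @lines = (
    'num,num+1,num+2,num+3,num+4',
    '1019,1020,1021,1022,1023',
    '1014,1015,1016,1017,1018',
    '1030,1031,1032,1033,1034',
    '1015,1016,1017,1018,1019',
    '1025,1026,1027,1028,1029',
);

print "$_\n" for dedupe_sorted(@lines);
```

Because only kept rows mark their values as seen, a skipped row such as 1015 cannot cause a later row such as 1019 to be dropped.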
