简体   繁体   中英

Perl remove characters from string matrix according to pattern

I have multiple strings with the same length stored in a hash data structure. Example:

$VAR1 = {
          'first' => 'abcXY',
          'second' => 'XYXYa',
          'third' => '*abXZ'
        };

From this 'matrix' of characters, I would like to remove the 'columns' which exclusively contain the characters X or Y . In the above example, this would be the fourth character of each string (4th 'column'). The desired result would be:

$VAR1 = {
          'first' => 'abcY',
          'second' => 'XYXa',
          'third' => '*abZ'
        };

Following code does this by creating a transpose of the values of my hash structure and then determines which indices to keep:

# data structure
my %h = ('first'=>'abcXY', 'second'=>'XYXYa', 'third'=>'*abXZ' );

# get length of all values in hash
my $nchar = length $h{(keys(%h))[0]};

# transpose values of hash
my @transposed = map { my $idx=$_; [map {substr ($_, $idx, 1) } values(%h)]  } 0..$nchar-1; 

# determine indices which I want to keep
my @indices;
for my $i (0..$#transposed){
        my @a = @{$transposed[$i]};

        # do not keep index if column consists of X and Y
        if ( scalar(grep {/X|Y/} @a) < scalar(@a) ) {
                push @indices, $i;
        }   
}

# only keep letters with indices
for my $k (keys %h){
        my $str = $h{$k};
        my $reduced = join "", map{ substr ($str, $_, 1) } @indices;
        $h{$k} = $reduced;
}

This is a terrible amount of code for such a simple operation. How could I do this more elegantly (preferably not with some matrix library, but with standard perl)?

Edit

Here another example: From the following strings, the first and last characters should be removed, because in both strings, the first and last position is either X or Y :

$VAR1 = {
          '1' => 'Xsome_strX',
          '2' => 'YsomeXstrY'
        };

Desired result:

$VAR1 = {
          '1' => 'some_str',
          '2' => 'someXstr'
        };
my $total = values %hash;
my %ind;
for my $v (values %hash) {

  $ind{ pos($v) -1 }++ while $v =~ /[XY]/g;
}
my @to_remove = sort {$b <=> $a} grep { $ind{$_} == $total } keys %ind;

for my $v (values %hash) {

  substr($v, $_, 1, "") for @to_remove;
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM