Distinguishing and substituting decimals in Perl

I want to change the decimal separator in a file from commas to full stops, and I would like to do this in Perl. An example of my dataset looks something like this:

Species_1:0,12, Species_2:0,23, Species_3:2,53

I want to replace only the decimal commas, not every comma, so that the result is:

Species_1:0.12, Species_2:0.23, Species_3:2.53

I was thinking it might work using a match followed by a replacement, like this:

$comma_file= "Species_1:0,12 , Species_2:0,23, Species_3:2,53"

    $comma = "(:\d+/,\d)";
#match a colon, any digits after the colon, the wanted comma and digits preceding it
       if ($comma_file =~ m/$comma/g) {
           $comma_file =~ tr/,/./;
        }
print "$comma_file\n"; 

However, when I tried this, all of my commas changed into full stops, not just the ones I was targeting. Is it an issue with the regex, or am I just not doing the match and substitution correctly?

Thanks!

This:

use strict;
use warnings;
my $comma_file = "Species_1:0,12, Species_2:0,23, Species_3:2,53";
$comma_file =~ s/(\d+),(\d+)/$1.$2/g;
print $comma_file, "\n";

Yields:

Species_1:0.12, Species_2:0.23, Species_3:2.53

The regex searches for commas having at least one digit on both sides and replaces them with a dot.

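Your code doesn't work because the match and the replacement are two separate steps: you first check whether the string contains a comma surrounded by digits and, if it does, tr/,/./ then replaces ALL commas in the string with dots, not only the one that matched.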
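To make the difference concrete, here is a minimal sketch that contrasts the two operators on a shortened version of the sample line. The sub names with_tr and with_subst are made up for this illustration, and the match in with_tr is a simplified stand-in for the one in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: the match only decides whether tr/// runs at all.
# tr/// then transliterates every comma in the string.
sub with_tr {
    my ($str) = @_;
    if ($str =~ /\d,\d/) {
        $str =~ tr/,/./;
    }
    return $str;
}

# Hypothetical helper: s///g rewrites only the text the pattern matched.
sub with_subst {
    my ($str) = @_;
    $str =~ s/(\d+),(\d+)/$1.$2/g;
    return $str;
}

my $line = "Species_1:0,12, Species_2:0,23";
print with_tr($line), "\n";     # Species_1:0.12. Species_2:0.23
print with_subst($line), "\n";  # Species_1:0.12, Species_2:0.23
```

Note that the separator comma after 0,12 also becomes a dot in the tr/// version, which is the behaviour described in the question.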

From the data shown, it appears that a comma to be replaced always has a digit on each side, and that every such occurrence needs to be replaced. There is a fine answer by GMB.

Another way to approach this kind of problem is to use lookarounds:

$comma_file =~ s/(?<=[0-9]),(?=[0-9])/./g;

which should be more efficient, as there is no copying into $1 and $2 and no quantifiers.
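Apart from speed, the two forms can differ in behaviour, although not on the data shown in the question. Lookarounds do not consume the digits around the comma, so in a run like 1,2,3 every comma is replaced, whereas the capturing version consumes the 2 as part of the first match and leaves the second comma alone. A small sketch of this, with sub names chosen only for illustration:

```perl
use strict;
use warnings;

# Capturing version: the digits on both sides are part of the match,
# so they cannot take part in the next match.
sub with_captures {
    my ($str) = @_;
    $str =~ s/(\d+),(\d+)/$1.$2/g;
    return $str;
}

# Lookaround version: only the comma itself is matched and replaced.
sub with_lookarounds {
    my ($str) = @_;
    $str =~ s/(?<=[0-9]),(?=[0-9])/./g;
    return $str;
}

print with_captures("1,2,3"), "\n";     # 1.2,3
print with_lookarounds("1,2,3"), "\n";  # 1.2.3
```

Which result is wanted depends on the data; for input like the sample line, where each number contains a single decimal comma, both give the same output.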

My benchmark

use warnings;
use strict;
use feature 'say';

use Benchmark qw(cmpthese);

my $str = q(Species_1:0,12, Species_2:0,23, Species_3:2,53);

sub subs {
    my ($str) = @_; 
    $str =~ s/(\d+),(\d+)/$1.$2/g;
    return $str;
}

sub look {
    my ($str) = @_; 
    $str =~ s/(?<=\d),(?=\d)/./g;
    return $str;
}

die "Output not equal" if subs($str) ne look($str);

cmpthese(-3, {
    subs => sub { my $res = subs($str) },
    look => sub { my $res = look($str) },
});

with output

         Rate subs look
subs 256126/s   -- -46%
look 472677/s  85%   --

This is only one particular string, but the efficiency advantage should only increase with the length of the string, while longer matched text (the numbers here) should reduce it a little.
