Manipulating backreferences for substitution in Perl

As part of an attempt to replace numbers in scientific notation with decimal numbers, I want to save a backreference into a string variable, but it doesn't work.

My input file is:

,8E-6,
,-11.78E-16,
,-17e+7,

I then run the following:

open FILE, "+<C:/Perl/input.txt" or die $!;
open(OUTPUT, "+>C:/Perl/output.txt") or die;

while (my $lines = <FILE>){

  $find = "(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)";
  $noofzeroesbeforecomma = eval("$7-length($4)");
  $replace = '"foo $noofzeroesbeforecomma bar"';

  $lines =~ s/$find/$replace/eeg;
  print (OUTPUT $lines);
}

close(FILE);

I get

foo  bar
foo  bar
foo  bar

where I would have expected

foo 6 bar
foo 14 bar
foo 7 bar

$noofzeroesbeforecomma seems to be empty or nonexistent.

Even with the following adjustment I get an empty result

$noofzeroesbeforecomma = $2;

Only inserting $2 directly in the replace string gives me something (which is then, unfortunately, not what I want).

Can anyone help?

I'm running Strawberry Perl (5.16.1.1-64bit) on a 64-bit Windows 7 machine, and I am quite inexperienced with Perl.

Your main problem is not using

use strict;
use warnings;

warnings would have told you

Use of uninitialized value $7 in concatenation (.) or string at ...
Use of uninitialized value $4 in concatenation (.) or string at ...

I would recommend you try and find a module that can handle scientific notation, rather than trying to hack your own.

Your code, in working order, might look something like this. As you can see, I have put a q() around your eval string so that it is not evaluated before $7 and $4 exist. I also removed the eval itself, since a double eval (/ee) on top of an eval is somewhat excessive.

use strict;
use warnings;

while (my $lines = <DATA>) {
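    # qr// keeps \. as a literal dot; inside double quotes it would collapse to a bare "."
    my $find = qr/(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)/;
    # single-quoted (q||) so that $7 and $4 are not interpolated before the match
    my $noof = q|$7-length($4)|;
    $lines =~ s/$find/$noof/eeg;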
    print $lines;
}


__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,

Output:

6
14
7
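
As a hedged variation on the above: the string eval can probably be dropped entirely by writing the replacement as Perl code under a single /e. The sketch below reuses the pattern from the question; the helper name replace_exponents is made up for illustration and is not part of the original code.

```perl
use strict;
use warnings;

# Same pattern as in the question, compiled with qr// so \. stays a literal dot.
my $find = qr/(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)/;

# Hypothetical helper: returns the line with each match replaced.
sub replace_exponents {
    my ($line) = @_;
    # With a single /e the replacement is Perl code that runs after each
    # match, so $7 and $4 are already set when it is evaluated.
    $line =~ s/$find/'foo ' . ($7 - length($4)) . ' bar'/eg;
    return $line;
}

print replace_exponents($_), "\n" for ',8E-6,', ',-11.78E-16,', ',-17e+7,';
```

For the three sample lines this should print foo 6 bar, foo 14 bar and foo 7 bar, which is what the question expected.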

As a side note, not using strict is asking for trouble. Doing it while using a variable name such as $noofzeroesbeforecomma is asking for twice the trouble, as it is rather easy to make typos.

This is not about backreferences but the original problem, transforming numbers from scientific notation. I'm sure there are some cases in which this fails:

#!/usr/bin/env perl

use strict;
use warnings;
use bignum;

for (<DATA>) {
    next unless /([+-]?\d+(?:\.\d+)?)[Ee]([+-]\d+)/;
    print $1 * 10 ** $2 . "\n";
}

__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,

Output:

0.000008
-0.000000000000001178
-170000000

I suggest you use the Regexp::Common::number plugin for the Regexp::Common module, which will find all real numbers for you and allow you to replace those that have an exponent marker.

This code shows the idea. Using the -keep option makes the module put each component into one of the $N variables. The exponent marker (e or E) is in $7, so the number can be transformed depending on whether it was present.

use strict;
use warnings;

use Regexp::Common;

my $real_re = $RE{num}{real}{-keep};

while (<>) {
  s/$real_re/ $7 ? sprintf '%.20f', $1 : $1 /eg;
  print;
}

output

Given your example input, this code produces the following. The values can be tidied up further using additional code in the substitution

,0.00000800000000000000,
,-0.00000000000000117800,
,-170000000.00000000000000000000,
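
One possible way to do that tidying is to strip the zero padding after formatting. This is only a sketch: to keep it free of non-core modules it uses a simple hand-written pattern in place of $RE{num}{real}, and the helper name tidy_decimal is invented for the example.

```perl
use strict;
use warnings;

# Assumption: a simple hand-written pattern for numbers with an exponent,
# used here instead of $RE{num}{real} to keep the sketch dependency-free.
my $sci_re = qr/[+-]?\d+(?:\.\d+)?[eE][+-]?\d+/;

# Hypothetical helper: expand to 20 decimals, then trim the padding.
sub tidy_decimal {
    my ($number) = @_;
    my $fixed = sprintf '%.20f', $number;
    $fixed =~ s/0+\z//;    # drop trailing zeros after the decimal point
    $fixed =~ s/\.\z//;    # drop a decimal point left dangling
    return $fixed;
}

for my $line (',8E-6,', ',-11.78E-16,', ',-17e+7,') {
    (my $copy = $line) =~ s/($sci_re)/tidy_decimal($1)/eg;
    print "$copy\n";
}
```

Because '%.20f' always emits a decimal point, trimming trailing zeros cannot eat into the integer part. Values smaller than 1e-20 would still collapse to 0, so the precision would need adjusting for such input.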

The thing is that Perl itself can already read all of these notations as numbers. And since the standard item of data in Perl is the string, you only need to capture the expression to use it. So, take this expression:

/(-?\d+(?:\.\d+)?[Ee][+-]?\d+)/

to extract it from the surrounding text and use sprintf to format it, like Borodin showed.

However, if it helps you to see a better version of what you tried to do, this works better:

my ( $whole, $frac, $expon )
    = $line =~ m/(?:,)-?(0|[1-9]\d*)(?:\.(\d*))?[eE]([+\-]?\d+)(?:,)/
    ;
# $frac is undef when the number has no fractional part
my $num = $expon - length( $frac // '' );

  • Why not capture the sign with the exponent anyway, if you're going to do arithmetic with it?

  • It's better to name your captures and eschew eval when it's not necessary.

  • The substitution--as is--doesn't make much sense.

  • Really, since neither the symbols nor the digits can be case sensitive, just put a (?i) at the beginning, and avoid the "character class" [Ee] for the E:

     /((?i)-?\d+(?:\.\d+)?e[+-]?\d+)/
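
To illustrate the points about named captures and avoiding eval, here is a minimal sketch. The capture names (sign, whole, frac, expon) and the helper exponent_shift are chosen for this example and do not come from the original code.

```perl
use strict;
use warnings;

# Named captures instead of $1..$N; /i lets a plain "e" cover both e and E.
my $sci_re = qr/,(?<sign>-?)(?<whole>0|[1-9]\d*)(?:\.(?<frac>\d*))?e(?<expon>[+-]?\d+),/i;

# Hypothetical helper: signed exponent minus the number of fraction digits.
sub exponent_shift {
    my ($line) = @_;
    return unless $line =~ $sci_re;
    my $frac = $+{frac} // '';    # undef when there is no fractional part
    return $+{expon} - length $frac;
}

print exponent_shift($_), "\n" for ',8E-6,', ',-11.78E-16,', ',-17e+7,';
```

Since the exponent's sign is captured along with its digits, the result is a signed shift: the sample lines give -6, -18 and 7 rather than the unsigned 6, 14 and 7 from the question.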
