Regular Expression Substitution with Conditional Replacement in Perl

My Perl skills are pretty rudimentary. I'm trying to convert the dates in a data file, which is loaded into a scalar variable, to four-digit years using a regular expression substitution (among other things).

I've got the following working; it prefixes every year with 20.

$data00 =~ s/^D(\d{2})\/(\d{2})\/(\d{2})\n/D$1\/$2\/20$3\n/gm;

However, the dates include those before 2000.

While searching for a solution I ran across the /e option, which is said to evaluate the replacement as Perl code. However, I can't find it in all of the documentation I've come across, and I'm not sure what the syntax would be.

Is there a way to evaluate the $3 match and output 20 if $3 is less than 50 to make 2000 and 19 if not, to make 1997? I selected 50 because it seemed to be a safe middle ground.

For illustration, though I know it's incorrect:

$data00 =~ s/^D(\d{2})\/(\d{2})\/(\d{2})\n/D$1\/$2\/(if($3<50)20 else 19)$3\n/eg;

Is the /e even appropriate in this case?

Example lines extracted from the huge text file:

D04/07/97
D04/14/98
D10/06/99
D10/13/05
D03/04/10
D12/09/10
D01/20/11
D12/22/11

When using /e, the replacement must be a valid Perl expression (i.e., something you could write after $x = ).

You can use the conditional operator ( ?: ) to evaluate an expression differently based on a condition:

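s/^D(\d{2})\/(\d{2})\/(\d{2})\n/ "D$1\/$2\/".( $3 < 50 ? 20 : 19 )."$3\n" /egm

The /m flag is kept from your original substitution so that ^ matches at the start of every line in the scalar, not just at the start of the string.

Note that choosing a different delimiter makes things far more readable when many / are involved.

s{^D(\d{2})/(\d{2})/(\d{2})\n}{ "D$1/$2/".( $3 < 50 ? 20 : 19 )."$3\n" }egm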
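As a minimal, self-contained sketch of that substitution applied to a multi-line scalar: the variable name $data and the three sample lines are assumptions for illustration, and /m is included so that ^ matches at the start of each line, as in the question's first substitution.

```perl
use strict;
use warnings;

# Sample data held in a single scalar (an assumed subset of the question's lines).
my $data = "D04/07/97\nD10/13/05\nD12/22/11\n";

# /e evaluates the replacement as a Perl expression,
# /g replaces every match, and /m lets ^ match at each line start.
$data =~ s{^D(\d{2})/(\d{2})/(\d{2})\n}
          { "D$1/$2/" . ( $3 < 50 ? 20 : 19 ) . "$3\n" }egm;

print $data;
```

The two-digit year is compared numerically, so "05" is treated as 5 and gets the 20 prefix, while "97" gets 19.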

I'd use Time::Piece to do this. Use the strptime() class method to parse the date into an object, and then strftime() to format it.

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';
use Time::Piece;

while (<DATA>) {
  chomp;

  my $date = Time::Piece->strptime($_, 'D%m/%d/%y');

  say $date->strftime('D%m/%d/%Y');
}

__DATA__
D04/07/97
D04/14/98
D10/06/99
D10/13/05
D03/04/10
D12/09/10
D01/20/11
D12/22/11

Output:

D04/07/1997
D04/14/1998
D10/06/1999
D10/13/2005
D03/04/2010
D12/09/2010
D01/20/2011
D12/22/2011

The regex solution can be simplified by a) choosing a different delimiter and b) using the ternary operator. If you use /e then the replacement text needs to be syntactically valid Perl.

while (<DATA>) {
  chomp;

  s|D(\d{2}/\d{2}/)(\d{2})|"D$1" . ($2 < 50 ? '20' : '19') . $2|e;

  say;
}

Update: There's one (possibly important) difference between the two solutions: the cut-off used to decide between the 20th and 21st centuries when converting two-digit years to four-digit ones. The regex solution uses 50 (as mentioned in the original question), so 50 through 99 become 1950 through 1999. The Time::Piece solution uses 69, so only 69 through 99 become 19xx, and that limit is hard-coded, so there's no way of changing it. For the data in the original question, that makes no difference. But it matters if you have data with two-digit years from 50 through 68: the regex turns them into 1950 through 1968, while Time::Piece turns them into 2050 through 2068.
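To make the effect of the cut-off concrete, here is a small sketch. The helper expand_year is hypothetical (it is not part of Time::Piece or of either answer); it just applies a configurable pivot, so the pivot of 69 only models the rule described above and does not call Time::Piece itself.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a two-digit year using a chosen pivot.
# Years below the pivot become 20xx; all others become 19xx.
sub expand_year {
    my ($yy, $pivot) = @_;
    return ( $yy < $pivot ? 2000 : 1900 ) + $yy;
}

# Compare the two cut-offs around the range where they disagree.
for my $yy (qw(49 50 68 69)) {
    printf "%s -> pivot 50: %d, pivot 69: %d\n",
        $yy, expand_year($yy, 50), expand_year($yy, 69);
}
```

Only the inputs 50 and 68 produce different results under the two pivots; 49 and 69 come out the same either way.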
