
In Perl, why does push cause a regex created by qr to be changed when not put into double quotes?

I am defining regexes inside a script, using qr, and pushing them onto an array. But it now appears that if I do not put the regex inside double quotes, pushing it onto the array changes it. Example:

#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper qw(Dumper);
use Data::Dumper::Concise;
my @regexes;
my $rgx = 'dog'; my $mdf = 'i';
$rgx = join ( '', '(?', $mdf, ')', $rgx ) if ($mdf); # in production, $mdf could be empty
eval { $rgx = qr/$rgx/ };
if ($@) # catch illegitimate regex modifier, such as 'g'
{
   die "rgx==$rgx; mdf==$mdf; qr throws an error";
}
push @regexes, $rgx;
push @regexes, "$rgx";
print "first try just printing \$rgx\n";
print " no double quotes:";
print $rgx; print "\n";
print "yes double quotes:";
print "$rgx"; print "\n";
print "but now see what happens when I push it onto an array\n";
print Dumper \@regexes;

What this produces:

first try just printing $rgx
 no double quotes:(?^:(?i)dog)
yes double quotes:(?^:(?i)dog)
but now see what happens when I push it onto an array
[
  qr/(?i)dog/i,
  "(?^:(?i)dog)"
]

I thought that (?^:(?i)dog) was a finished product, ready for a regex match, such as

if ( /$rgx/ )

and, in fact, that is why I run the prospective regex through qr.

Why does push change it?

And why does it produce the particular syntax qr/(?i)dog/i?

You are effectively asking the difference between the values returned by

my $rgx = qr/$rgx/; $rgx

and

my $rgx = qr/$rgx/; "$rgx"

qr// compiles the provided regex pattern and returns an object representing the compiled form. This is the value stored in the variable $rgx, and it is what the expression $rgx returns.

"" builds a string, so "$rgx" yields the stringification of $rgx. Thankfully, that string is itself a valid regex pattern equivalent to the one the compiled object represents. However, by writing "$rgx", you are effectively undoing the work done by qr/$rgx/.
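The distinction can be checked directly. The following is a minimal sketch, not part of the original answer; the variable names and the test string 'Hot DOG' are made up for illustration. It uses ref to tell the compiled object from its stringified copy, and shows that both still work as patterns.

```perl
#!/usr/bin/perl
use strict; use warnings;

my $rgx = qr/(?i)dog/;   # compiled regex object
my $str = "$rgx";        # its stringification, a plain string

# ref() reports 'Regexp' for the object and '' for the plain string.
my $obj_type = ref $rgx;
my $str_type = ref $str;
print "object type: '$obj_type'\n";
print "string type: '$str_type'\n";

# Both still work as patterns; the string has to be compiled again when used.
my $obj_matches = ( 'Hot DOG' =~ $rgx )   ? 1 : 0;
my $str_matches = ( 'Hot DOG' =~ /$str/ ) ? 1 : 0;
print "object matches: $obj_matches\n";
print "string matches: $str_matches\n";
```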

Data::Dumper represents regex objects using qr// literals and strings using "" literals.
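A small sketch of that behavior, assuming the core Data::Dumper module on its own rather than Data::Dumper::Concise, so the layout and quote style will differ from the output in the question. The exact text after qr/ also varies between Data::Dumper versions, so the sketch only checks whether each dump starts with qr.

```perl
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;

$Data::Dumper::Terse = 1;   # print bare values, without the "$VAR1 = " prefix

my $rgx = qr/(?i)dog/;

my $dumped_object = Dumper($rgx);     # the regex object itself
my $dumped_string = Dumper("$rgx");   # its stringified copy

print "object: $dumped_object";
print "string: $dumped_string";

# Only the object is rendered as a qr// literal.
my $object_is_qr = ( $dumped_object =~ /^qr/ ) ? 1 : 0;
my $string_is_qr = ( $dumped_string =~ /^qr/ ) ? 1 : 0;
print "object dumped as qr literal: $object_is_qr\n";
print "string dumped as qr literal: $string_is_qr\n";
```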

It's the same pattern in different representations. Data::Dumper makes a particular string representation of an object, and the regex object itself creates a different representation when you interpolate it.

Perhaps my article from The Effective Perl can help: Let perl create your regex stringification

Since I am answering my own question, I can be slightly disrespectful toward the poster (myself). I asked two questions.

The answer to the first is: "Un-ask the question. push does not change the regex."
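One way to confirm this without involving Data::Dumper at all is sketched below. It assumes refaddr from the core Scalar::Util module; the variable names are illustrative. The element stored by push refers to the very same regex object and stringifies identically.

```perl
#!/usr/bin/perl
use strict; use warnings;
use Scalar::Util qw(refaddr);

my $rgx = qr/(?i)dog/;
my @regexes;
push @regexes, $rgx;      # stores a copy of the reference to the regex object
push @regexes, "$rgx";    # stores a plain string

# The first element points at the same object and has the same text.
my $same_object = ( refaddr( $regexes[0] ) == refaddr($rgx) ) ? 1 : 0;
my $same_text   = ( "$regexes[0]" eq "$rgx" )                 ? 1 : 0;

# The second element is just a string; it was already one before the push.
my $second_is_string = ( ref( $regexes[1] ) eq '' ) ? 1 : 0;

print "same object: $same_object\n";
print "same text: $same_text\n";
print "second element is a plain string: $second_is_string\n";
```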

The answer to the second is: Again, push does not produce that particular syntax. The Data::Dumper package is what produces the puzzling syntax. The following code demonstrates this.

#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper qw(Dumper); use Data::Dumper::Concise;
my $rgx = 'dog';
my $mdf = 'is';
$rgx = join ( '', '(?', $mdf, ')', $rgx ) if ($mdf); # in production, $mdf could be empty
print "   no quotes rgx=="; print $rgx; print ";\n"; print "      quotes rgx=="; print "$rgx"; print ";\n";
print "DD no quotes rgx=="; print Dumper $rgx; print "DD    quotes rgx=="; print Dumper "$rgx"; 
$rgx = qr/$rgx/;
print "\nNow, after qr:\n";
print "   no quotes rgx=="; print $rgx; print ";\n"; print "      quotes rgx=="; print "$rgx"; print ";\n";
print "DD no quotes rgx=="; print Dumper $rgx; print "DD    quotes rgx=="; print Dumper "$rgx"; 

and what it prints:

   no quotes rgx==(?is)dog;
      quotes rgx==(?is)dog;
DD no quotes rgx=="(?is)dog"
DD    quotes rgx=="(?is)dog"

Now, after qr:
   no quotes rgx==(?^:(?is)dog);
      quotes rgx==(?^:(?is)dog);
DD no quotes rgx==qr/(?is)dog/si
DD    quotes rgx=="(?^:(?is)dog)"

It has been stated that qr "compiles" a regex. Because of my past experience as a student writing code in compiled languages (FORTRAN, Pascal), I think I misunderstood that term. From https://perldoc.perl.org/perldata#Scalar-values:

Scalars aren't necessarily one thing or another. There's no place to declare a scalar variable to be of type "string", type "number", type "reference", or anything else. Because of the automatic conversion of scalars, operations that return scalars don't need to care (and in fact, cannot care) whether their caller is looking for a string, a number, or a reference. Perl is a contextually polymorphic language whose scalars can be strings, numbers, or references (which includes objects).

If I read this correctly, the output of qr is not "binary" or something analogous to Pascal object code. It is an ordinary Perl scalar: a reference to a compiled regex object (ref reports it as Regexp). When that scalar is printed or interpolated, it stringifies to the pattern text that print shows in the sample code.

So I was way off-base in thinking that push was the guilty party. Dumper treats a regex object differently from a plain string: it prints the object as a qr// literal, with the pattern between forward slashes, whereas the stringified copy is printed as an ordinary quoted string. Dumper also shows the regex flags after the second forward slash, and not in the order I wrote them (note that is became si).
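The flags a compiled regex carries can also be inspected directly, independently of Data::Dumper, with regexp_pattern from the core re module. This is a hedged sketch rather than a description of how Dumper works internally. It deliberately gives the flags explicitly after the closing slash, which differs from the inline (?is) form used above, and it makes no claim about the order in which the flags come back.

```perl
#!/usr/bin/perl
use strict; use warnings;
use re qw(regexp_pattern);

my $rgx = qr/dog/is;   # flags given explicitly after the closing slash

# In list context, regexp_pattern returns the pattern text and the flags.
my ( $pattern, $flags ) = regexp_pattern($rgx);

print "pattern: $pattern\n";
print "flags:   $flags\n";

# The flags are reported by the regex object, not copied from the source text,
# so their order need not match the order in which they were written.
my $has_i = ( $flags =~ /i/ ) ? 1 : 0;
my $has_s = ( $flags =~ /s/ ) ? 1 : 0;
print "has i: $has_i, has s: $has_s\n";
```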

