Hidden features of Perl?

What are some really useful but esoteric language features in Perl that you've actually been able to employ to do useful work?

Guidelines:

  • Try to limit answers to the Perl core and not CPAN
  • Please give an example and a short description

Hidden Features also found in other languages' Hidden Features:

(These are all from Corion's answer.)

  • C
    • Duff's Device
    • Portability and Standardness
  • C#
    • Quotes for whitespace delimited lists and strings
    • Aliasable namespaces
  • Java
    • Static Initializers
  • JavaScript
    • Functions are First Class citizens
    • Block scope and closure
    • Calling methods and accessors indirectly through a variable
  • Ruby
    • Defining methods through code
  • PHP
    • Pervasive online documentation
    • Magic methods
    • Symbolic references
  • Python
    • One line value swapping
    • Ability to replace even core functions with your own functionality

The flip-flop operator is useful for skipping the first iteration when looping through the records (usually lines) returned by a file handle, without using a flag variable:

while(<$fh>)
{
  next if 1..1; # skip first record
  ...
}

Run perldoc perlop and search for "flip-flop" for more information and examples.
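For a sense of the more general form, here is a minimal sketch where the flip-flop's operands are regexes instead of line numbers. The BEGIN/END marker strings and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input; in real code these would come from a file handle.
my @lines = ("header", "BEGIN", "alpha", "beta", "END", "footer");

my @inside;
for (@lines) {
    # The flip-flop turns true when the left pattern matches and stays
    # true up to and including the line where the right pattern matches.
    push @inside, $_ if /^BEGIN$/ .. /^END$/;
}

print "$_\n" for @inside;   # BEGIN, alpha, beta, END
```

The same shape works inside while(<$fh>) to pull out a delimited block of a file.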

There are many non-obvious features in Perl.

For example, did you know that there can be a space after a sigil?

 $ perl -wle 'my $x = 3; print $ x'
 3

Or that you can give subs numeric names if you use symbolic references?

$ perl -lwe '*4 = sub { print "yes" }; 4->()' 
yes

There's also the "bool" quasi operator, which returns 1 for true expressions and the empty string for false:

$ perl -wle 'print !!4'
1
$ perl -wle 'print !!"0 but true"'
1
$ perl -wle 'print !!0'
(empty line)

Other interesting stuff: with use overload you can overload string literals and numbers (and for example make them BigInts or whatever).

Many of these things are actually documented somewhere, or follow logically from the documented features, but nonetheless some are not very well known.

Update: Another nice one. The q{...} quoting constructs were mentioned below, but did you know that you can use letters as delimiters?

$ perl -Mstrict  -wle 'print q bJet another perl hacker.b'
Jet another perl hacker.

Likewise you can write regular expressions:

m xabcx
# same as m/abc/

Add support for compressed files via magic ARGV:

s{ 
    ^            # make sure to get whole filename
    ( 
      [^'] +     # at least one non-quote
      \.         # extension dot
      (?:        # now either suffix
          gz
        | Z 
       )
    )
    \z           # through the end
}{gzcat '$1' |}xs for @ARGV;

(The quotes around $1 are necessary to handle filenames with shell metacharacters in them.)

Now the <> feature will decompress any @ARGV files that end with ".gz" or ".Z":

while (<>) {
    print;
}

One of my favourite features in Perl is using the boolean || operator to select between a set of choices.

 $x = $a || $b;

 # $x = $a, if $a is true.
 # $x = $b, otherwise

This means one can write:

 $x = $a || $b || $c || 0;

to take the first true value from $a , $b , and $c , or a default of 0 otherwise.

In Perl 5.10, there's also the // operator, which returns the left hand side if it's defined, and the right hand side otherwise. The following selects the first defined value from $a , $b , $c , or 0 otherwise:

$x = $a // $b // $c // 0;

These can also be used with their short-hand forms, which are very useful for providing defaults:

$x ||= 0;   # If $x was false, it now has a value of 0.

$x //= 0;   # If $x was undefined, it now has a value of zero.
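The difference between the two operators shows up when a value is defined but false. A minimal sketch, with variable names made up for illustration:

```perl
use strict;
use warnings;

my $retries = 0;   # a legitimate value that happens to be false

my $with_or         = $retries || 3;   # 0 is false, so the default wins
my $with_defined_or = $retries // 3;   # 0 is defined, so it is kept

print "$with_or\n";           # 3
print "$with_defined_or\n";   # 0
```

So // is usually the safer choice for defaults whenever 0 or the empty string is a valid value.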

Cheerio,

Paul

The operators ++ and unary - don't only work on numbers, but also on strings.

$_ = "a";
print -$_;

prints -a

print ++$_;

prints b

$_ = 'z';
print ++$_;

prints aa

As Perl has almost all "esoteric" parts from the other lists, I'll tell you the one thing that Perl can't:

The one thing Perl can't do is have bare arbitrary URLs in your code, because the // operator is used for regular expressions.

Just in case it wasn't obvious to you what features Perl offers, here's a selective list of the maybe not totally obvious entries:

Duff's Device - in Perl

Portability and Standardness - There are likely more computers with Perl than with a C compiler

A file/path manipulation class - File::Find works on even more operating systems than .Net does

Quotes for whitespace delimited lists and strings - Perl allows you to choose almost arbitrary quotes for your list and string delimiters

Aliasable namespaces - Perl has these through glob assignments:

*My::Namespace:: = \%Your::Namespace::;

Static initializers - Perl can run code in almost every phase of compilation and object instantiation, from BEGIN (code parse) to CHECK (after code parse) to import (at module import) to new (object instantiation) to DESTROY (object destruction) to END (program exit)

Functions are First Class citizens - just like in Perl

Block scope and closure - Perl has both

Calling methods and accessors indirectly through a variable - Perl does that too:

my $method = 'foo';
my $obj = My::Class->new();
$obj->$method( 'baz' ); # calls $obj->foo( 'baz' )

Defining methods through code - Perl allows that too:

*foo = sub { print "Hello world" };

Pervasive online documentation - Perl documentation is online and likely on your system too

Magic methods that get called whenever you call a "nonexisting" function - Perl implements that in the AUTOLOAD function

Symbolic references - you are well advised to stay away from these. They will eat your children. But of course, Perl allows you to offer your children to blood-thirsty demons.

One line value swapping - Perl allows list assignment

Ability to replace even core functions with your own functionality

use subs 'unlink'; 
sub unlink { print 'No.' }

or

BEGIN{
    *CORE::GLOBAL::unlink = sub {print 'no'}
};

unlink($_) for @ARGV

Autovivification. AFAIK no other language has it.
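For readers who have not met the term, a minimal sketch of what autovivification does. The %tree structure and its keys are invented for the example.

```perl
use strict;
use warnings;

my %tree;   # starts out empty

# Assigning through a path that does not exist yet creates every
# intermediate hash and array reference on the way.
$tree{fruit}{apple}[2] = 'red';

print ref($tree{fruit}), "\n";                 # HASH
print ref($tree{fruit}{apple}), "\n";          # ARRAY
print scalar(@{ $tree{fruit}{apple} }), "\n";  # 3

# Caveat: merely reading a nested path vivifies the intermediate level too.
my $missing = $tree{veg}{carrot};
print exists $tree{veg} ? "veg exists\n" : "veg missing\n";   # veg exists
```

The caveat at the end is the usual surprise: a read-only lookup two levels deep leaves an empty hash behind at the first level.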

It's simple to quote almost any kind of strange string in Perl.

my $url = q{http://my.url.com/any/arbitrary/path/in/the/url.html};

In fact, the various quoting mechanisms in Perl are quite interesting. The Perl regex-like quoting mechanisms allow you to quote anything, specifying the delimiters. You can use almost any special character like #, /, or open/close characters like (), [], or {}. Examples:

my $var  = q#some string where the pound is the final escape.#;
my $var2 = q{A more pleasant way of escaping.};
my $var3 = q(Others prefer parens as the quote mechanism.);

Quoting mechanisms:

q : literal quote; the only character that needs to be escaped is the end character.

qq : an interpreted quote; processes variables and escape characters. Great for strings that you need to quote:

my $var4 = qq{This "$mechanism" is broken.  Please inform "$user" at "$email" about it.};

qx : Works like qq, but then executes it as a system command, non interactively. Returns all the text generated from the standard out. (Redirection, if supported in the OS, also comes out) Also done with back quotes (the ` character).

my $output  = qx{type "$path"};      # get just the output
my $moreout = qx{type "$path" 2>&1}; # get stuff on stderr too

qr : Interprets like qq, but then compiles it as a regular expression. Works with the various options on the regex as well. You can now pass the regex around as a variable:

sub MyRegexCheck {
    my ($string, $regex) = @_;
    if ($string)
    {
       return ($string =~ $regex);
    }
    return; # returns 'null' or 'empty' in every context
}

my $regex = qr{http://\w+\.com/(\w+/)+};
my @results = MyRegexCheck(q{http://myurl.com/subpath1/subpath2/}, $regex);

qw : A very, very useful quote operator. Turns a quoted set of whitespace separated words into a list. Great for filling in data in a unit test.


   my @allowed = qw(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z { });
   my @badwords = qw(WORD1 word2 word3 word4);
   my @numbers = qw(one two three four 5 six seven); # works with numbers too
   my @list = ('string with space', qw(eight nine), "a $var"); # works in other lists
   my $arrayref = [ qw(and it works in arrays too) ]; 

They're great to use whenever they make things clearer. For qx, qq, and q, I most likely use the {} delimiters. The most common habit of people using qw is usually the () delimiters, but sometimes you also see qw//.

The "for" statement can be used the same way "with" is used in Pascal:

for ($item)
{
    s/&nbsp;/ /g;
    s/<.*?>/ /g;
    $_ = join(" ", split(" ", $_));
}

You can apply a sequence of s/// operations, etc. to the same variable without having to repeat the variable name.

Not really hidden, but many everyday Perl programmers don't know about CPAN. This especially applies to people who aren't full time programmers or don't program in Perl full time.

The quoteword operator is one of my favourite things. Compare:

my @list = ('abc', 'def', 'ghi', 'jkl');

and

my @list = qw(abc def ghi jkl);

Much less noise, easier on the eye. Another really nice thing about Perl, that one really misses when writing SQL, is that a trailing comma is legal:

print 1, 2, 3, ;

That looks odd, but not if you indent the code another way:

print
    results_of_foo(),
    results_of_xyzzy(),
    results_of_quux(),
    ;

Adding an additional argument to the function call does not require you to fiddle around with commas on previous or trailing lines. The single line change has no impact on its surrounding lines.

This makes it very pleasant to work with variadic functions. This is perhaps one of the most under-rated features of Perl.

The ability to parse data directly pasted into a DATA block. No need to save to a test file to be opened in the program or similar. For example:

my @lines = <DATA>;
for (@lines) {
    print if /bad/;
}

__DATA__
some good data
some bad data
more good data 
more good data 

Binary "x" is the repetition operator:

print '-' x 80;     # print row of dashes

It also works with lists:

print for (1, 4, 9) x 3; # print 149149149

New Block Operations

I'd say the ability to expand the language, creating pseudo block operations is one.

  1. You declare the prototype for a sub indicating that it takes a code reference first:

     sub do_stuff_with_a_hash (&\%) {
         my ( $block_of_code, $hash_ref ) = @_;
         while ( my ( $k, $v ) = each %$hash_ref ) {
             $block_of_code->( $k, $v );
         }
     }

  2. You can then call it in the body like so:

     use feature 'say';
     use Data::Dumper;
     do_stuff_with_a_hash {
         local $Data::Dumper::Terse = 1;
         my ( $k, $v ) = @_;
         say qq(Hey, the key is "$k"!);
         say sprintf qq(Hey, the value is "%s"!), Dumper( $v );
     } %stuff_for;

(Data::Dumper::Dumper is another semi-hidden gem.) Notice how you don't need the sub keyword in front of the block, or the comma before the hash. It ends up looking a lot like: map { } @list

Source Filters

Also, there are source filters, where Perl passes you the code so you can manipulate it. Both this and the block operations are pretty much don't-try-this-at-home type of things.

I have done some neat things with source filters, for example like creating a very simple language to check the time, allowing short Perl one-liners for some decision making:

perl -MLib::DB -MLib::TL -e 'run_expensive_database_delete() if $hour_of_day < AM_7';

Lib::TL would just scan for both the "variables" and the constants, create them and substitute them as needed.

Again, source filters can be messy, but are powerful. But they can mess debuggers up something terrible--and even warnings can be printed with the wrong line numbers. I stopped using Damian's Switch because the debugger would lose all ability to tell me where I really was. But I've found that you can minimize the damage by modifying small sections of code, keeping them on the same line.

Signal Hooks

It's often enough done, but it's not all that obvious. Here's a die handler that piggy backs on the old one.

my $old_die_handler = $SIG{__DIE__};
$SIG{__DIE__}       
    = sub { say q(Hey! I'm DYIN' over here!); goto &$old_die_handler; }
    ;

That means whenever some other module in the code wants to die, they gotta come to you (unless someone else does a destructive overwrite on $SIG{__DIE__}). And you can be notified that somebody thinks something is an error.

Of course, for enough things you can just use an END { } block, if all you want to do is clean up.

overload::constant

You can inspect literals of a certain type in packages that include your module. For example, if you use this in your import sub:

overload::constant 
    integer => sub { 
        my $lit = shift;
        return $lit > 2_000_000_000 ? Math::BigInt->new( $lit ) : $lit 
    };

it will mean that every integer greater than 2 billion in the calling packages will get changed to a Math::BigInt object. (See overload::constant.)

Grouped Integer Literals

While we're at it. Perl allows you to break up large numbers into groups of three digits and still get a parsable integer out of it. Note 2_000_000_000 above for 2 billion.

Taint checking. With taint checking enabled, perl will die (or warn, with -t ) if you try to pass tainted data (roughly speaking, data from outside the program) to an unsafe function (opening a file, running an external command, etc.). It is very helpful when writing setuid scripts or CGIs or anything where the script has greater privileges than the person feeding it data.

Magic goto. goto &sub does an optimized tail call.
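A minimal sketch of the goto &sub form, assuming two made-up subs named report and wrapper:

```perl
use strict;
use warnings;

sub report { return "report got: @_" }

sub wrapper {
    # Adjust the arguments, then hand control to report() entirely.
    # goto &NAME replaces the current call frame and passes along the
    # current @_, so caller() inside report() sees wrapper()'s caller.
    unshift @_, 'extra';
    goto &report;
}

print wrapper('a', 'b'), "\n";   # report got: extra a b
```

This is the form typically seen in AUTOLOAD routines, where the stub should vanish from the call stack once the real sub is found.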

The debugger.

use strict and use warnings. These can save you from a bunch of typos.

Based on the way the "-n" and "-p" switches are implemented in Perl 5, you can write a seemingly incorrect program including }{ :

ls |perl -lne 'print $_; }{ print "$. Files"'

which is converted internally to this code:

LINE: while (defined($_ = <ARGV>)) {
    print $_; }{ print "$. Files";
}

Let's start easy with the Spaceship Operator.

$a = 5 <=> 7;  # $a is set to -1
$a = 7 <=> 5;  # $a is set to 1
$a = 6 <=> 6;  # $a is set to 0

This is a meta-answer, but the Perl Tips archives contain all sorts of interesting tricks that can be done with Perl. The archive of previous tips is on-line for browsing, and can be subscribed to via mailing list or atom feed.

Some of my favourite tips include building executables with PAR , using autodie to throw exceptions automatically , and the use of the switch and smart-match constructs in Perl 5.10.

Disclosure: I'm one of the authors and maintainers of Perl Tips, so I obviously think very highly of them. ;)

map - not only because it makes one's code more expressive, but also because it gave me the urge to read more about this "functional programming".
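As a small illustration of that expressiveness, here is a sketch using a made-up word list:

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry);

# map evaluates the block once per element, with $_ aliased to it,
# and returns the list of results.
my @lengths = map { length($_) } @words;

# Returning two values per element builds a hash in one expression.
my %length_of = map { $_ => length($_) } @words;

print "@lengths\n";            # 5 6 6
print "$length_of{banana}\n";  # 6
```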

My vote would go for the (?{}) and (??{}) groups in Perl's regular expressions. The first executes Perl code, ignoring the return value, the second executes code, using the return value as a regular expression.

The continue clause on loops. It will be executed at the bottom of every loop, even those which are next'ed.

while( <> ){
  print "top of loop\n";
  chomp;

  next if /next/i;
  last if /last/i;

  print "bottom of loop\n";
}continue{
  print "continue\n";
}

The m// operator has some obscure special cases:

  • If you use ? as the delimiter it only matches once unless you call reset.
  • If you use ' as the delimiter the pattern is not interpolated.
  • If the pattern is empty it uses the pattern from the last successful match.

while(/\G(\b\w*\b)/g) {
     print "$1\n";
}

The \G anchor. It's hot.

The null filehandle diamond operator <> has its place in building command line tools. It acts like <FH> to read from a handle, except that it magically selects whichever is found first: command line filenames or STDIN. Taken from perlop:

while (<>) {
...         # code for each line
}

Special code blocks such as BEGIN, CHECK and END. They come from Awk, but work differently in Perl, because it is not record-based.

The BEGIN block can be used to specify some code for the parsing phase; it is also executed when you do the syntax-and-variable-check perl -c. For example, to load in configuration variables:

BEGIN {
    eval {
        require 'config.local.pl';
    };
    if ($@) {
        require 'config.default.pl';
    }
}

rename("$_.part", $_) for "data.txt";

renames data.txt.part to data.txt without having to repeat yourself.

A bit obscure is the tilde-tilde "operator" which forces scalar context.

print ~~ localtime;

is the same as

print scalar localtime;

and different from

print localtime;

The input record separator can be set to a reference to a number to read fixed-length records:

$/ = \3; print $_,"\n" while <>; # output three chars on each line

tie, the variable tying interface.
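A minimal sketch of what tying looks like. CountingScalar is a hypothetical class invented for this example; the method names TIESCALAR, FETCH and STORE are the ones the tie interface expects for scalars.

```perl
use strict;
use warnings;

# Hypothetical class: a scalar that counts how many times it is read.
package CountingScalar;

sub TIESCALAR { my ($class, $value) = @_; return bless { value => $value, reads => 0 }, $class }
sub FETCH     { my ($self) = @_; $self->{reads}++; return $self->{value} }
sub STORE     { my ($self, $value) = @_; $self->{value} = $value }

package main;

# tie returns the underlying object, which we keep to inspect the counter.
my $object = tie my $watched, 'CountingScalar', 'hello';

my $copy = $watched;     # goes through FETCH
$watched = 'goodbye';    # goes through STORE
my $again = $watched;    # goes through FETCH again

print "$copy, then $again\n";              # hello, then goodbye
print "reads so far: $object->{reads}\n";
```

The code using $watched never sees the class; it just reads and assigns an ordinary-looking scalar.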

The "desperation mode" of Perl's loop control constructs which causes them to look up the stack to find a matching label allows some curious behaviors which Test::More takes advantage of, for better or worse.

SKIP: {
    skip() if $something;

    print "Never printed";
}

sub skip {
    no warnings "exiting";
    last SKIP;
}

There's the little known .pmc file. "use Foo" will look for Foo.pmc in @INC before Foo.pm. This was intended to allow compiled bytecode to be loaded first, but Module::Compile takes advantage of this to cache source filtered modules for faster load times and easier debugging.

The ability to turn warnings into errors.

local $SIG{__WARN__} = sub { die @_ };
$num = "two";
$sum = 1 + $num;
print "Never reached";

That's what I can think of off the top of my head that hasn't been mentioned.

The goatse operator*:

$_ = "foo bar";
my $count =()= /[aeiou]/g; #3

or

sub foo {
    return @_;
}

$count =()= foo(qw/a b c d/); #4

It works because list assignment in scalar context yields the number of elements in the list being assigned.

* Note, not really an operator

I don't know how esoteric it is, but one of my favorites is the hash slice. I use it for all kinds of things. For example to merge two hashes:

my %number_for = (one => 1, two => 2, three => 3);
my %your_numbers = (two => 2, four => 4, six => 6);
@number_for{keys %your_numbers} = values %your_numbers;
print sort values %number_for; # 12346

This one isn't particularly useful, but it's extremely esoteric. I stumbled on this while digging around in the Perl parser.

Before there was POD, perl4 had a trick to allow you to embed the man page, as nroff, straight into your program so it wouldn't get lost. perl4 used a program called wrapman (see Pink Camel page 319 for some details) to cleverly embed an nroff man page into your script.

It worked by telling nroff to ignore all the code, and then put the meat of the man page after an __END__ tag which tells Perl to stop processing code. Looked something like this:

#!/usr/bin/perl
'di';
'ig00';

...Perl code goes here, ignored by nroff...

.00;        # finish .ig

'di         \" finish the diversion
.nr nl 0-1  \" fake up transition to first page
.nr % 0     \" start at page 1
'; __END__

...man page goes here, ignored by Perl...

The details of the roff magic escape me, but you'll notice that the roff commands are strings or numbers in void context. Normally a constant in void context produces a warning. There are special exceptions in op.c to allow void context strings which start with certain roff commands.

              /* perl4's way of mixing documentation and code
                 (before the invention of POD) was based on a
                 trick to mix nroff and perl code. The trick was
                 built upon these three nroff macros being used in
                 void context. The pink camel has the details in
                 the script wrapman near page 319. */
                const char * const maybe_macro = SvPVX_const(sv);
                if (strnEQ(maybe_macro, "di", 2) ||
                    strnEQ(maybe_macro, "ds", 2) ||
                    strnEQ(maybe_macro, "ig", 2))
                        useless = NULL;

This means that 'di'; doesn't produce a warning, but neither does 'die'; 'did you get that thing I sentcha?'; or 'ignore this line';.

In addition, there are exceptions for the numeric constants 0 and 1, which allow the bare .00; statement. The code claims this was for more general purposes.

            /* the constants 0 and 1 are permitted as they are
               conventionally used as dummies in constructs like
                    1 while some_condition_with_side_effects;  */
            else if (SvNIOK(sv) && (SvNV(sv) == 0.0 || SvNV(sv) == 1.0))
                useless = NULL;

And what do you know, 2 while condition does warn!

You can use @{[...]} to get an interpolated result of complex perl expressions

$a = 3;
$b = 4;

print "$a * $b = @{[$a * $b]}";

prints: 3 * 4 = 12

sub load_file
{
    local(@ARGV, $/) = shift;
    <>;
}

and a version that returns an array as appropriate:

sub load_file
{
    local @ARGV = shift;
    local $/ = wantarray? $/: undef;
    <>;
}

use diagnostics;

If you are starting to work with Perl and have never done so before, this module will save you tons of time and hassle. For almost every basic error message you can get, this module will give you a lengthy explanation as to why your code is breaking, including some helpful hints as to how to fix it. For example:

use strict;
use diagnostics;

$var = "foo";

gives you this helpful message:

Global symbol "$var" requires explicit package name at - line 4.
Execution of - aborted due to compilation errors (#1)
    (F) You've said "use strict vars", which indicates that all variables
    must either be lexically scoped (using "my"), declared beforehand using
    "our", or explicitly qualified to say which package the global variable
    is in (using "::").

Uncaught exception from user code:
        Global symbol "$var" requires explicit package name at - line 4.
Execution of - aborted due to compilation errors.
 at - line 5

And with this code:

use diagnostics;
use strict;

sub myname {
    print { " Some Error " };
};

you get this large, helpful chunk of text:

syntax error at - line 5, near "};"
Execution of - aborted due to compilation errors (#1)
(F) Probably means you had a syntax error.  Common reasons include:

    A keyword is misspelled.
    A semicolon is missing.
    A comma is missing.
    An opening or closing parenthesis is missing.
    An opening or closing brace is missing.
    A closing quote is missing.

Often there will be another error message associated with the syntax
error giving more information.  (Sometimes it helps to turn on -w.)
The error message itself often tells you where it was in the line when
it decided to give up.  Sometimes the actual error is several tokens
before this, because Perl is good at understanding random input.
Occasionally the line number may be misleading, and once in a blue moon
the only way to figure out what's triggering the error is to call
perl -c repeatedly, chopping away half the program each time to see
if the error went away.  Sort of the cybernetic version of 20 questions.

Uncaught exception from user code:
    syntax error at - line 5, near "};"
Execution of - aborted due to compilation errors.
at - line 7

From there you can go about deducing what might be wrong with your program (in this case, print is formatted entirely wrong). diagnostics has explanations for a large number of known errors. Now, while this would not be a good thing to use in production, it can serve as a great learning aid for those who are new to Perl.

There is also $[, the variable that decides at which index an array starts. The default is 0, so arrays start at index 0. By setting

$[=1;

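you can make Perl behave more like AWK (or Fortran) if you really want to. Note that $[ has long been deprecated, and since Perl 5.30 assigning anything other than 0 to it is a fatal error.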

($x, $y) = ($y, $x) is what made me want to learn Perl.

The list constructor 1..99 or 'a'..'zz' is also very nice.
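A short sketch of how the range behaves with numbers and with strings (the variable names are illustrative):

```perl
use strict;
use warnings;

my @numbers = (1 .. 5);
my @letters = ('a' .. 'e');

# With strings, the range uses the magical string increment,
# so 'z' rolls over to 'aa'.
my @columns = ('x' .. 'ab');

print "@numbers\n";   # 1 2 3 4 5
print "@letters\n";   # a b c d e
print "@columns\n";   # x y z aa ab
```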

@Schwern mentioned turning warnings into errors by localizing $SIG{__WARN__}. You can also do this (lexically) with use warnings FATAL => "all"; see perldoc lexwarn.

On that note, since Perl 5.12, you've been able to say perldoc foo instead of the full perldoc perlfoo . Finally! :)

The Schwartzian Transform is a technique that allows you to efficiently sort by a computed, secondary index. Let's say that you wanted to sort a list of strings by their md5 sum. The comments below are best read backwards (that's the order I always end up writing these anyways):

my @strings = ('one', 'two', 'three', 'four');

my @md5sorted_strings =
    map { $_->[0] }               # 4) map back to the original value
    sort { $a->[1] cmp $b->[1] }  # 3) sort by the correct element of the list
    map { [$_, md5sum_func($_)] } # 2) create a list of anonymous arrays
    @strings;                     # 1) take strings

This way, you only have to do the expensive md5 computation N times, rather than N log N times.
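For a version that runs as-is, here is a sketch that assumes md5_hex from the core Digest::MD5 module as a stand-in for md5sum_func; any expensive key function would do.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);   # assumption: stands in for md5sum_func

my @strings = ('one', 'two', 'three', 'four');

my @md5sorted_strings =
    map  { $_->[0] }                # 3) unwrap the original string
    sort { $a->[1] cmp $b->[1] }    # 2) sort on the cached key
    map  { [ $_, md5_hex($_) ] }    # 1) pair each string with its key
    @strings;

print "$_\n" for @md5sorted_strings;
```

The result is the same as sort { md5_hex($a) cmp md5_hex($b) } @strings, but the digest is computed once per string instead of once per comparison.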

Safe compartments.

With the Safe module you can build your own sandbox-style environment using nothing but perl. You would then be able to load perl scripts into the sandbox.

Best regards,

Core IO::Handle module. Most important thing for me is that it allows autoflush on filehandles. Example:

use IO::Handle;    
$log->autoflush(1);

One useful composite operator for conditionally adding strings or lists into other lists is the x!! operator:

 print 'the meaning of ', join ' ' =>  
     'life,'                x!! $self->alive,
     'the universe,'        x!! ($location ~~ Universe),
     ('and', 'everything.') x!! 42; # this is added as a list

this operator allows for a reversed syntax similar to

 do_something() if test();

How about the ability to use

my @symbols = map { +{ 'key' => $_ } } @things;

to generate an array of hashrefs from an array -- the + in front of the hashref disambiguates the block so the interpreter knows that it's a hashref and not a code block. Awesome.

(Thanks to Dave Doyle for explaining this to me at the last Toronto Perlmongers meeting.)

All right. Here is another: dynamic scoping. It was talked about a little in a different post, but I didn't see it here on the hidden features.

Dynamic scoping, like autovivification, is used by very few languages. Perl and Common Lisp are the only two I know of that use dynamic scoping.
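In Perl, dynamic scoping is what local gives you. A minimal sketch, with the variable and sub names made up for the example:

```perl
use strict;
use warnings;

our $level = 'outer';   # a package variable; local does not apply to "my" variables

sub show { return "level is $level" }

sub with_inner {
    # local saves the current value and restores it when this sub returns.
    # Everything called from here sees the temporary value.
    local $level = 'inner';
    return show();
}

print with_inner(), "\n";   # level is inner
print show(), "\n";         # level is outer
```

Note that show() never receives $level as an argument; it picks up whatever value is in effect in its caller's dynamic scope.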

Use lvalues to make your code really confusing:

my $foo = undef ;
sub bar:lvalue{ return $foo ;}

# Then later

bar = 5 ;
print bar ;

This one-liner illustrates how to use glob to generate all the word combinations of an alphabet (A, T, C, and G -> DNA) for words of a specified length (4):

perl -MData::Dumper -e '@CONV = glob( "{A,T,C,G}" x 4 ); print Dumper( \@CONV )'

Quantum::Superpositions

use Quantum::Superpositions;

if ($x == any($a, $b, $c)) { ...  }

My favorite semi-hidden feature of Perl is the eof function. Here's an example pretty much directly from perldoc -f eof that shows how you can use it to reset the file name and $. (the current line number) easily across multiple files loaded up at the command line:

while (<>) {
  print "$ARGV:$.\t$_";
} 
continue {
  close ARGV if eof
}

You can replace the delimiter in regexes and strings with just about anything else. This is particularly useful for "leaning toothpick syndrome", exemplified here:

$url =~ /http:\/\/www\.stackoverflow\.com\//;

You can eliminate most of the back-whacking by changing the delimiter. /bar/ is shorthand for m/bar/ which is the same as m!bar! .

$url =~ m!http://www\.stackoverflow\.com/!;

You can even use balanced delimiters like {} and []. I personally love these. q{foo} is the same as 'foo' .

$code = q{
    if( this is awesome ) {
        print "Look ma, no escaping!";
    }
};

To confuse your friends (and your syntax highlighter) try this:

$string = qq'You owe me $1,000 dollars!';

There is a more powerful way to check a program for syntax errors:

perl -w -MO=Lint,no-context myscript.pl

The most important thing it can do is report 'nonexistent subroutine' errors.

use re debug
Doc on use re debug

and

perl -MO=Concise[,OPTIONS]
Doc on Concise

Besides being exquisitely flexible, expressive and amenable to programming in the style of C, Pascal, Python and other languages, there are several pragmas and command switches that make Perl my 'goto' language for initial kanoodling on an algorithm, regex, or quick problem that needs to be solved. These two are unique to Perl, I believe, and are among my favorites.

use re debug : Most modern flavors of regular expressions owe their current form and function to Perl. While there are many Perl forms of regex that cannot be expressed in other languages, there are almost no forms of other languages' flavor of regex that cannot be expressed in Perl. Additionally, Perl has a wonderful regex debugger built in to show how the regex engine is interpreting your regex and matching against the target string.

Example: I recently was trying to write a simple CSV routine. (Yes, yes, I know, I should have been using Text::CSV... ) but the CSV values were not quoted and simple.

My first take was /^(?:(.*?),){$i}/ to extract the i-th record out of n CSV records. That works fine -- except for the last record, n of n. I could see that without the debugger.

Next I tried /^(?:(.*?),|$){$i}/. This did not work, and I could not see immediately why. I thought I was saying (.*?) followed by a comma or EOL. Then I added use re debug at the top of a small test script. Ahh yes, the alternation between ,|$ was not being interpreted that way; it was being interpreted as ((.*?),) | ($) -- not what I wanted.

A new grouping was needed. So I arrived at the working /^(?:(.*?)(?:,|$)){$i}/. While I was in the regex debugger, I was surprised how many loops it took for a match towards the end of the string. It is the .*? term that is quite ambiguous and requires excessive backtracking to satisfy. So I tried /^(?:(?:^|,)([^,]*)){$i}/. This does two things: 1) reduces backtracking because of the greedy match of all but a comma 2) allows the regex optimizer to only use the alternation once on the first field. Using Benchmark, this is 35% faster than the first regex. The regex debugger is wonderful and few use it.

perl -MO=Concise[,OPTIONS]: The B and Concise frameworks are tremendous tools to see how Perl is interpreting your masterpiece. Using -MO=Concise prints the result of the Perl interpreter's translation of your source code. There are many options to Concise, and in B you can write your own presentation of the OP codes.

As in this post, you can use Concise to compare different code structures. You can interleave your source lines with the OP codes those lines generate. Check it out.

You can use different quotes on HEREDOCS to get different behaviors.

my $interpolation = "We will interpolate variables";
print <<"END";
With double quotes, $interpolation, just like normal HEREDOCS.
END

print <<'END';
With single quotes, the variable $foo will *not* be interpolated.
(You have probably seen this in other languages.)
END

## this is the fun and "hidden" one
my $shell_output = <<`END`;
echo With backticks, these commands will be executed in shell.
echo The output is returned.
ls | wc -l
END

print "shell output: $shell_output\n";

Very late to the party, but: attributes.

Attributes essentially let you define arbitrary code to be associated with the declaration of a variable or subroutine. The best way to use these is with Attribute::Handlers; this makes it easy to define attributes (in terms of, what else, attributes!).

I did a presentation on using them to declaratively assemble a pluggable class and its plugins at YAPC::2006, online here. This is a pretty unique feature.

I personally love the /e modifier to the s/// operation:

while(<>) {
  s/\b(\w{1,4})\b/scalar reverse($1)/ge; # reverses every word of 1 to 4 letters
  print;
}

Input:

This is a test of regular expressions
^D

Output (I think):

sihT si a tset fo regular expressions

The following are just as short but more meaningful than "~~" since they indicate what is returned, and there's no confusion with the smart match operator:

print "".localtime;   # Request a string

print 0+@array;       # Request a number

Axeman reminded me of how easy it is to wrap some of the built-in functions.

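Before Perl 5.10, Perl didn't have a print that appends a newline (say), like Python's print.

The built-in print cannot itself be overridden, so in your own program you could define a wrapper under a different name (println and debug_print are just illustrative names):

sub println {
     print @_, "\n";
}

or add in some debugging:

use Data::Dumper;

sub debug_print {
    exists $ENV{DEVELOPER} ?
    print Dumper(@_) :
    print @_;
}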

Two things that work well together: IO handles on in-core strings, and using function prototypes to enable you to write your own functions with grep/map-like syntax.

sub with_output_to_string(&) {           # allows compiler to accept "yoursub {}" syntax.
  my $function = shift;
  my $string   = '';
  open(my $handle, '>', \$string) or die $!; # in-memory filehandle that writes into $string
  my $old_handle = select $handle;
  eval { $function->() };
  select $old_handle;
  die $@ if $@;
  return $string;
}

my $greeting = with_output_to_string {
  print "Hello, world!";
};

print $greeting, "\n";

The ability to use a hash as a seen filter in a loop. I have yet to see something quite as nice in a different language. For example, I have not been able to duplicate this in Python.

For example, I want to print a line if it has not been seen before.

my %seen;

while (<LINE>) {   # reads one line at a time instead of slurping the whole file
  print $_ unless $seen{$_}++;
}

The new -E option on the command line:

> perl -e "say 'hello'" # does not work 

String found where operator expected at -e line 1, near "say 'hello'"
        (Do you need to predeclare say?)
syntax error at -e line 1, near "say 'hello'"
Execution of -e aborted due to compilation errors.

> perl -E "say 'hello'" 
hello

You can expand function calls in a string, for example:

print my $foo = "foo @{[scalar(localtime)]} bar";

foo Wed May 26 15:50:30 2010 bar

The feature I like the best is statement modifiers.

Don't know how many times I've wanted to do:

say 'This will output' if 1;
say 'This will not output' unless 1;
say 'Will say this 3 times. The first Time: '.$_ for 1..3;

in other languages. etc...

The 'etc' reminded me of another 5.12 feature, the Yada Yada operator.

This is great for the times when you just want a placeholder.

sub something_really_important_to_implement_later {
    ...
} 

Check it out: Perl Docs on Yada Yada Operator .
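A minimal sketch of what happens when such a stub is actually called; the sub name is hypothetical. The file compiles fine, but executing the ... statement throws an "Unimplemented" exception, so forgotten stubs fail loudly instead of silently returning nothing.

```perl
use strict;
use warnings;
use 5.012;   # the ... statement needs Perl 5.12 or later

# Hypothetical stub that has not been written yet.
sub not_written_yet {
    ...
}

# Compiles without complaint; calling the stub dies at run time.
my $ok = eval { not_written_yet(); 1 };
print "stub died: $@" unless $ok;
```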

The expression defined &DB::DB returns true if the program is running under the debugger.
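A small sketch of how that check might be used to branch on whether the script was started with perl -d; the variable name is made up for illustration.

```perl
use strict;
use warnings;

# True only when the script was started under the debugger (perl -d).
my $in_debugger = defined(&DB::DB) ? 1 : 0;

print $in_debugger ? "running under the debugger\n" : "running normally\n";
```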

Interpolation of match regular expressions. A useful application of this is when matching on a blacklist. Without using interpolation it is written like so:

#detecting blacklist words in the current line
/foo|bar|baz/;

Can instead be written

@blacklistWords = ("foo", "bar", "baz");
$anyOfBlacklist = join "|", (@blacklistWords);
/$anyOfBlacklist/;

This is more verbose, but allows for population from a data file. Also, if the list is maintained in the source for whatever reason, the array is easier to maintain than the regex.
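One caveat when the words come from a data file: any regex metacharacters in them are interpolated as regex syntax. A hedged sketch of one way to guard against that, using quotemeta and a precompiled qr// pattern (the sample words and variable names are invented for illustration):

```perl
use strict;
use warnings;

my @blacklist_words = ('foo', 'bar', 'a.b');   # hypothetical sample data

# quotemeta escapes metacharacters, so 'a.b' only matches a literal dot.
my $alternation  = join '|', map { quotemeta($_) } @blacklist_words;
my $blacklist_re = qr/$alternation/;           # compile once, reuse many times

for my $line ('nothing here', 'contains bar', 'axb', 'a.b') {
    printf "%-14s %s\n", $line, $line =~ $blacklist_re ? 'blocked' : 'ok';
}
```

Without quotemeta, the third word would also block 'axb', because the dot would match any character.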

Using hashes (where keys are unique) to obtain the unique elements of a list:

my %unique = map { $_ => 1 } @list;
my @unique = keys %unique;
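Note that keys returns the elements in no particular order. If the original order matters, a common variant combines grep with a "seen" hash; this is a sketch with made-up sample data:

```perl
use strict;
use warnings;

my @list = qw(pear apple pear fig apple);   # hypothetical sample data

my %seen;
# grep keeps an element only the first time it is encountered.
my @unique = grep { !$seen{$_}++ } @list;

print "@unique\n";   # pear apple fig
```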

I'm a bit late to the party, but a vote for the built-in tied-hash function dbmopen() -- it's helped me a lot. It's not exactly a database, but if you need to save data to disk it takes away a lot of the problems and Just Works. It helped me get started when I didn't have a database, didn't understand Storable.pm, but I knew I wanted to progress beyond reading and writing to text files.

Add one for the unpack() and pack() functions, which are great if you need to import and/or export data in a format which is used by other programs.

Of course these days most programs will allow you to export data in XML, and many commonly used proprietary document formats have associated Perl modules written for them. But this is one of those features that is incredibly useful when you need it, and pack()/unpack() are probably the reason that people have been able to write CPAN modules for so many proprietary data formats.
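As a rough illustration, here is a round trip through a small fixed-width binary record. The record layout is entirely made up; a real format would dictate its own template.

```perl
use strict;
use warnings;

# Hypothetical record layout: 2-byte big-endian id, 4-byte big-endian
# timestamp, then an 8-byte space-padded name.
my $template = 'n N A8';

my $record = pack $template, 513, 1_000_000, 'perl';
printf "%d bytes: %s\n", length $record, unpack 'H*', $record;

# unpack with the same template recovers the fields; 'A' strips the padding.
my ($id, $stamp, $name) = unpack $template, $record;
print "$id $stamp $name\n";   # 513 1000000 perl
```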

Next time you're at a geek party pull out this one-liner in a bash shell and the women will swarm you and your friends will worship you:

find . -name "*.txt"|xargs perl -pi -e 's/1:(\S+)/uc($1)/ge'

Process all *.txt files and do an in-place find and replace using perl's regex. This one converts text after a '1:' to upper case and removes the '1:'. Uses Perl's 'e' modifier to treat the second part of the find/replace regex as executable code. Instant one-line template system. Using xargs lets you process a huge number of files without running into bash's command line length limit.

You might think you can do this to save memory:

@is_month{qw(jan feb mar apr may jun jul aug sep oct nov dec)} = undef;

print "It's a month" if exists $is_month{lc $mon};

but it doesn't do that. Perl still assigns a different scalar value to each key. Devel::Peek shows this. PVHV is the hash. Elt is a key and the SV that follows is its value. Note that each SV has a different memory address indicating they're not being shared.

Dump \%is_month, 12;

SV = RV(0x81c1bc) at 0x81c1b0
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0x812480
  SV = PVHV(0x80917c) at 0x812480
    REFCNT = 2
    FLAGS = (SHAREKEYS)
    ARRAY = 0x206f20  (0:8, 1:4, 2:4)
    hash quality = 101.2%
    KEYS = 12
    FILL = 8
    MAX = 15
    RITER = -1
    EITER = 0x0
    Elt "feb" HASH = 0xeb0d8580
    SV = NULL(0x0) at 0x804b40
      REFCNT = 1
      FLAGS = ()
    Elt "may" HASH = 0xf2290c53
    SV = NULL(0x0) at 0x812420
      REFCNT = 1
      FLAGS = ()

An undef scalar takes as much memory as an integer scalar, so you might as well just set them all to 1 and avoid the trap of forgetting to check with exists .

my %is_month = map { $_ => 1 } qw(jan feb mar apr may jun jul aug sep oct nov dec);

print "It's a month" if $is_month{lc $mon};

$0 is the name of the perl script being executed. It can be used to get the context from which a module is being run.

# MyUsefulRoutines.pl

sub doSomethingUseful {
  my @args = @_;
  # ...
}

if ($0 =~ /MyUsefulRoutines\.pl$/) {
  # someone is running  perl MyUsefulRoutines.pl [args]  from the command line
  doSomethingUseful(@ARGV);
} else {
  # someone is calling  require "MyUsefulRoutines.pl"  from another script
  1;
}

This idiom is helpful for turning a standalone script with some useful subroutines into a library that can be imported into other scripts. Python has similar functionality with the __name__ == "__main__" idiom.

@Corion - Bare URLs in Perl? Of course you can, even in interpolated strings. The only time it would matter is in a string that you were actually USING as a regular expression.

Showing progress in the script by printing on the same line:

$| = 1; # turn on autoflush so each print appears immediately

for my $i (1..100) {
    print "Progress $i %\r";
}

Using bare blocks with redo or other control words to create custom looping constructs.

Traverse a linked list of objects, returning the first ->can('print') method:

sub get_printer {
    my $self = shift;
    {$self->can('print') or $self = $self->next and redo}
}
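The same construct can be written out more explicitly. This is only a sketch: the hash-based list and the name first_even are invented for illustration, with explicit returns replacing the or/and chain.

```perl
use strict;
use warnings;

# Hypothetical linked list: each node is a hash with 'value' and 'next'.
my $list = { value => 1, next => { value => 4, next => { value => 9, next => undef } } };

sub first_even {
    my $node = shift;
    {
        return undef unless $node;                          # ran off the end
        return $node->{value} if $node->{value} % 2 == 0;   # found one
        $node = $node->{next};
        redo;                                               # restart the bare block
    }
}

print first_even($list), "\n";   # 4
```

A bare block counts as a loop that runs once, which is why redo (and last or next) work inside it.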

Perl is great as a flexible awk/sed.

For example, let's use a simple replacement for ls | xargs stat , naively done like:

$ ls | perl -pe 'print "stat "' | sh 

This doesn't work well when the input (filenames) has spaces or shell special characters like |$\ . So single quotes are frequently required in the Perl output.

One complication with calling perl via the command line -ne is that the shell gets first nibble at your one-liner. This often leads to torturous escaping to satisfy it.

One 'hidden' feature that I use all the time is \x27 to include a single quote instead of trying to use the shell escape '\''

So:

$ ls | perl -nle 'chomp; print "stat '\''$_'\''"' | sh

can be more safely written:

$ ls | perl -pe 's/(.*)/stat \x27$1\x27/' | sh

That won't work with funny characters in the filenames, even quoted like that. But this will:

$ ls | perl -pe 's/\n/\0/' | xargs -0 stat

"now"

sub _now { 
        my ($now) = localtime() =~ /([:\d]{8})/;
        return $now;
}

print _now(), "\n"; #  15:10:33

One more...

Perl cache:

my $processed_input = $records || process_inputs($records_file);
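A related sketch, not the poster's code: the same short-circuit idea applied per key with a hash, so an expensive call runs only once for each input. It uses //= (Perl 5.10 or later) rather than ||, so that cached values such as 0 or the empty string are not recomputed. All names here are made up for illustration.

```perl
use strict;
use warnings;

my %cache;
my $calls = 0;

# Hypothetical stand-in for an expensive computation.
sub slow_square {
    my $n = shift;
    $calls++;
    return $n * $n;
}

sub cached_square {
    my $n = shift;
    # //= only calls slow_square when the cache slot is still undefined.
    return $cache{$n} //= slow_square($n);
}

cached_square(7) for 1 .. 3;
print cached_square(7), " computed with $calls call(s)\n";   # 49 computed with 1 call(s)
```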

On Elpeleg Open Source, Perl CMS http://www.web-app.net/

B::Deparse - Perl compiler backend to produce perl code. Not something you'd use in your daily Perl coding, but could be useful in special circumstances.

If you come across some piece of code that is obfuscated, or a complex expression, pass it through Deparse . It is useful for figuring out a JAPH or golfed Perl code.

$ perl -e '$"=$,;*{;qq{@{[(A..Z)[qq[0020191411140003]=~m[..]g]]}}}=*_=sub{print/::(.*)/};$\=$/;q<Just another Perl Hacker>->();'
Just another Perl Hacker

$ perl -MO=Deparse -e '$"=$,;*{;qq{@{[(A..Z)[qq[0020191411140003]=~m[..]g]]}}}=*_=sub{print/::(.*)/};$\=$/;q<Just another Perl Hacker>->();'
$" = $,;
*{"@{[('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z')['0020191411140003' =~ /../g]];}";} = *_ = sub {
    print /::(.*)/;
}
;
$\ = $/;
'Just another Perl Hacker'->();
-e syntax OK

A more useful example is to use deparse to find out the code behind a coderef, that you might have received from another module, or

use B::Deparse;
my $deparse = B::Deparse->new;
my $code = $deparse->coderef2text($coderef);   # $coderef is any code reference you hold
print $code;
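A self-contained version of the same idea, with a locally defined anonymous sub standing in for a coderef that would normally come from elsewhere. The exact text Deparse prints varies between Perl versions, so no output is shown.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical coderef standing in for one handed over by another module.
my $coderef = sub { my ($x, $y) = @_; return $x + $y };

my $deparse = B::Deparse->new;
my $code    = $deparse->coderef2text($coderef);   # returns the body as a { ... } block

print "sub $code\n";
```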

I like the way we can insert an element at any place in the array, such as

=> Insert $x in position $i in array @a

@a = ( 11, 22, 33, 44, 55, 66, 77 );
$x = 10;
$i = 3;

@a = ( @a[0..$i-1], $x, @a[$i..$#a] );
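For comparison, the built-in splice performs the same insertion in place, without building a new list. A short sketch using the same values:

```perl
use strict;
use warnings;

my @a = (11, 22, 33, 44, 55, 66, 77);
my ($x, $i) = (10, 3);

# Remove 0 elements at offset $i and put $x there, modifying @a in place.
splice @a, $i, 0, $x;

print "@a\n";   # 11 22 33 10 44 55 66 77
```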
