
Using the length of the matched group inside regex

Assume this

char=l
string="Hello, World!"

Now I want to replace every run of consecutive occurrences of char in string with the length of that run (a kind of run-length encoding), reading both values from STDIN.

I tried this:

$c=<>;$_=<>;print s/($c)\1*/length($&)/grse;

When the input is given as

l
Hello, World!

it returns Hello, World! unchanged. But when I ran this

$c=<>;$_=<>;print s/(l)\1*/length($&)/grse;

it returned He2o, Wor1d!

So, since the input is given on separate lines, $c contained \n (checked with $c=~/\n/). So, I tried

$c=<>.chomp;$_=<>;print s/($c)\1*/length($&)/grse;

and

$c=<>;$_=<>;print s/($c.chomp)\1*/length($&)/grse;

Neither worked. Could anyone please say why?

In Perl, . is used to concatenate strings, not to call methods (unlike in some other languages, Ruby for instance). Have a look at the documentation of chomp to see how it should be used. You should be doing

chomp($c=<>)

Rather than

$c=<>.chomp

Your full code should thus simply be:

chomp($c=<>);$_=<>;print s/($c)\1*/length($&)/grse;

If $c is always a single character, then the regex can be simplified to s/$c+/length($&)/grse . Also, if $c can be a regex meta-character (e.g., + , * , ( , [ , etc.), then you should escape it (and it makes sense to escape it just in case). To do so, you can use \Q..\E (or quotemeta , although it is more verbose and thus maybe less suited to a one-liner):

s/\Q$c\E+/length($&)/grse

If you don't escape $c one way or another, and your one-liner is run with ( as first input for instance, you'll get the following error:

Quantifier follows nothing in regex; marked by <-- HERE in m/(+ <-- HERE / at -e line 1, <> line 2

Regarding what $c=<>.chomp actually means in Perl (since this is valid Perl code that can make sense in some contexts):

$c=<>.chomp means <> concatenated to chomp , where chomp without arguments is understood as chomp($_) . chomp returns the total number of characters removed, and since $_ is undefined at that point, no characters are removed, so this chomp returns 0 . You are therefore effectively writing $c=<>.0 , which means that if your input is l\n , you end up with l\n0 instead of l .

Two ways to debug this kind of thing yourself:

  • Enable warnings with the -w flag. In that case, it would have printed

    Use of uninitialized value $_ in scalar chomp at -e line 1, <> line 1.

    This is arguably not the most helpful warning ever, but it would have helped you get an idea of where your mistake was.

  • Print variables to be sure that they contain what you expect. For instance, you could run perl -wE '$c=<>.chomp;print"|$c|"' , which, given l as input, would print:

     |l
     0|

    which should give you an idea of what went wrong.
