Matching multiline strings in a file using a Perl regex

I am reading in another Perl file and trying to find all strings surrounded by quotation marks within the file, whether they span one line or several. I've matched all the single-line strings fine, but I can't match the multiline ones without printing the entire line out, when I just want the string itself. For example, here's a snippet of what I'm reading in:

#!/usr/bin/env perl
use warnings;
use strict;

# assign variable

my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun 
    multiple line string, please match";

so the output I'd like is

'Hello World!';
"chmod";
"This is a fun multiple line string, please match";

but I am getting:

'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun 
    multiple line string, please match";

This is the code I am using to find the strings - all file content is stored in @contents:

my @strings_found = ();
my $line; 
for(@contents) {
    $line .= $_;
}

if($line =~ /(['"](.?)*["'])/s) {
    push @strings_found,$1;
}

print @strings_found;

I am guessing I am only getting 'Hello World!'; correctly because I am using $1, but I am not sure how else to find the others without looping line by line. I would think that looping would make it hard to find the multiline string, since the loop doesn't know what the next line is.

I know my regex is fairly basic and doesn't account for some caveats, but I just wanted to get a basic regex that catches most cases working before moving on to more complex situations.

Any pointers as to where I am going wrong?

A couple of big things: you need to search in a while loop with the /g modifier on your regex, so that every match is visited rather than just the first one. You also need to turn off greedy matching for what's inside the quotes by using .*? instead of (.?)*.

use strict;
use warnings;

my $contents = do {local $/; <DATA>};

my @strings_found = ();

while ($contents =~ /(['"](.*?)["'])/sg) {
    push @strings_found, $1;
}

print "$_\n" for @strings_found;

__DATA__
#!/usr/bin/env perl
use warnings;
use strict;

# assign variable

my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun 
    multiple line string, please match";

Outputs

'Hello World!'
"chmod"
"This is a fun
    multiple line string, please match"
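
The output above keeps the line break inside the multiline string, while the question asks for it on one line. A minimal sketch of one way to get there, assuming it is acceptable to fold each line break and the whitespace around it into a single space. The helper name extract_flat_strings is made up for illustration, and the sample is inlined with a heredoc instead of __DATA__:

```perl
use strict;
use warnings;

# Sample input, inlined with a heredoc so the sketch is self-contained.
my $contents = <<'END_SAMPLE';
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
    multiple line string, please match";
END_SAMPLE

# Hypothetical helper: find quoted strings with the same regex as above,
# then fold every line break (plus surrounding whitespace) into one space.
sub extract_flat_strings {
    my ($text) = @_;
    my @found;
    while ($text =~ /(['"].*?["'])/sg) {
        my $str = $1;
        $str =~ s/\s*\n\s*/ /g;
        push @found, $str;
    }
    return @found;
}

my @strings_found = extract_flat_strings($contents);
print "$_\n" for @strings_found;
```

With this sample the third string should come out as "This is a fun multiple line string, please match". Note that this changes the string's contents, so it only suits cases where the original line breaks don't matter.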

You aren't the first person to search for help with this homework problem. Here's a more detailed answer I gave to ... well ... you ;) See "finding words surround by quotations perl".

Regexp matching (in Perl and in general) is greedy by default, so your regexp will match from the first ' or " to the last one. Print the length of your @strings_found array: with the code you have, it will never hold more than one element, because the match runs only once, in an if and without the /g modifier.

Change it to be non-greedy by following the * with a ?, something like /(['"].*?["'])/s, I think.
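
To see the difference between greedy and non-greedy matching, here is a small sketch that counts the matches each variant finds. The two-line sample text is made up for illustration, and /g is added so that all matches are collected in list context:

```perl
use strict;
use warnings;

# Made-up sample: one single-quoted and one double-quoted string.
my $text = qq{my \$a = 'one';\nmy \$b = "two";\n};

# Greedy: .* runs to the last quote character, so there is one big match.
my @greedy = $text =~ /(['"].*["'])/sg;

# Non-greedy: .*? stops at the next quote character, one match per string.
my @lazy = $text =~ /(['"].*?["'])/sg;

print "greedy:     ", scalar(@greedy), " match(es)\n";
print "non-greedy: ", scalar(@lazy), " match(es)\n";
print "  $_\n" for @lazy;
```

The greedy pattern should report 1 match, spanning from the first quote to the last, and the non-greedy one 2.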

That will work in a basic way. Regexps are kind of the wrong tool if you want a robust solution; for that you would want to write parsing code instead. If one kind of quote appears inside a string delimited by the other kind, the greedy version gives you the single biggest string, while the non-greedy version gives you the smallest strings without caring whether the opening and closing quotes differ.
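
One partial remedy for the mismatched-quote problem, short of writing a parser, is a backreference that forces the closing quote to be the same character as the opening one. This is only a sketch: extract_paired_strings is a made-up name, and backslash-escaped quotes or quote characters inside comments would still confuse it.

```perl
use strict;
use warnings;

# Hypothetical helper: group 2 captures the opening quote and \2 requires
# the same character to close the string, so an apostrophe inside a
# double-quoted string no longer ends the match early.
sub extract_paired_strings {
    my ($text) = @_;
    my @found;
    while ($text =~ /((['"]).*?\2)/sg) {
        push @found, $1;
    }
    return @found;
}

# Made-up sample with each kind of quote nested inside the other.
my $sample = qq{my \$x = "it's here";\nmy \$y = 'say "hi"';\n};
print "$_\n" for extract_paired_strings($sample);
```

For this sample the two strings should come back whole, as "it's here" and 'say "hi"'.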

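Read about greedy and non-greedy matching. Also note the modifiers that affect multiline input: /s lets . match a newline, while /m makes ^ and $ match at every line boundary. http://perldoc.perl.org/perlre.html#Regular-Expressions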
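
As a small illustration of how the modifiers behave on multiline input, this sketch tries a quoted-string pattern with and without /s, then uses /m on its own. Both sample strings are made up for the example:

```perl
use strict;
use warnings;

my $quoted = qq{"first\nsecond"};
my $lines  = "alpha\nbeta\n";

# Without /s, "." never matches a newline, so the two-line string is missed.
my $without_s = $quoted =~ /".*?"/  ? "match" : "no match";
# With /s, "." matches newlines too and the whole quoted span is found.
my $with_s    = $quoted =~ /".*?"/s ? "match" : "no match";

# /m leaves "." alone; it lets ^ and $ match at each line boundary.
my @starts = $lines =~ /^(\w+)/mg;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
print "with /m:    @starts\n";
```

So for strings that span lines, /s is the modifier that matters; /m is only useful when the pattern is anchored with ^ or $.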
