
PERL Regex and appending

I have the following Perl code. I am trying to extract the path from each element of the array @links, append "\\" or "/" to the end, and push the result onto a new array. But I am not getting the desired output. What am I missing?

use strict;

my @links = (
    "incl -s projectA /. /abc/cde/efg",
    "incl -s projectA \. \hij\klm\nop",
);

my ( $path, $link, @linkpaths, $op );
my $substr = "/";

foreach $link (@links) {
    $link =~ m{incl -s projectA /. /|\\.\\(.+)};
    $path = $1;
    print "Path is $path \n";
    if ( index( $path, $substr ) != -1 ) {
        print "$link contains $substr\n";
        $op = "/";
    } else {
        print "$link doesnt contains $substr\n";
        $op = "\\";
    }
    push @linkpaths, $path . $op;
}

print "\nlinkpaths:\n";
foreach (@linkpaths) {
    print "$_\n";
}

Desired Output:

Path is abc/cde/efg
abc/cde/efg contains /
Path is \hij\klm\nop
hij\klm\nop doesnt contain /

linkpaths:
abc/cde/efg/
hij\klm\nop\

The problem is that the special characters in your strings -- both simple strings and regular expressions -- are not escaped, and you have no use warnings at the top of your program, which would have alerted you to this.

For instance, if I add use warnings and use Data::Dump to display your @links array, I get this

Unrecognized escape \h passed through at E:\Perl\source\dd.pl line 8.
Unrecognized escape \k passed through at E:\Perl\source\dd.pl line 8.
[
  "incl -s projectA /. /abc/cde/efg",
  "incl -s projectA . hijklm\nop",
]

So the backslashes in the second element have vanished: \. \h and \k lost their backslash, and \n was turned into a newline character.
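As a small illustration of the quoting issue (a sketch only; the variable names $single, $double and $count are made up for this example), single quotes leave the backslashes alone, and doubling every backslash inside double quotes produces the same string:

```perl
use strict;
use warnings;

# Single quotes: a backslash is literal unless it precedes ' or another backslash.
my $single = 'incl -s projectA \. \hij\klm\nop';

# Double quotes: every literal backslash has to be written as \\.
my $double = "incl -s projectA \\. \\hij\\klm\\nop";

print $single eq $double ? "same string\n" : "different strings\n";

# Count how many backslashes actually made it into the string.
my $count = () = $single =~ /\\/g;
print "backslashes: $count\n";
```

Both spellings should hold the four backslashes that the original double-quoted literal lost.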

Now the regex looks fine on the face of it, but I hope it is clear that your alternation extends to the full length of the pattern, so

m{incl -s projectA /. /|\\.\\(.+)}

matches either

incl -s projectA /. /

or

\\.\\(.+)

which isn't at all what you had in mind. You also need to escape the dots . which otherwise match any character other than a newline; and you have dropped a space, so you currently have either /. / (with an intermediate space) or \\.\\ (without one).
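A quick way to see the effect of that top-level alternation is to run the original pattern against both strings, written here with single quotes so the backslashes survive. This is only a sketch; $first, $second, $re and the $matched_* / $capture_first names are invented for the demonstration.

```perl
use strict;
use warnings;

my $first  = 'incl -s projectA /. /abc/cde/efg';
my $second = 'incl -s projectA \. \hij\klm\nop';

# The pattern from the question, unchanged.
my $re = qr{incl -s projectA /. /|\\.\\(.+)};

# The left-hand alternative matches, but it contains no capture group,
# so $1 is undefined after a successful match.
my $matched_first = $first =~ $re;
my $capture_first = $1;
print "first:  ", ( $matched_first ? "matched" : "no match" ),
      ", capture is ", ( defined $capture_first ? "'$capture_first'" : "undef" ), "\n";

# The right-hand alternative wants backslash, any character, backslash,
# with no space in between, so the second string does not match at all.
my $matched_second = $second =~ $re;
print "second: ", ( $matched_second ? "matched" : "no match" ), "\n";
```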

It's a little trickier to fix than you might hope because (I think) you want to capture everything after projectA, but also allow for either forward or backward slashes. That would become

m{incl -s projectA ((?:/\. /|\\\. \\).+)}

which, employing the /x modifier and replacing literal spaces with \s+, I hope you'll agree can be more clearly written

m{ incl \s+ -s \s+ projectA \s+ ( (?: /\. \s+ / | \\\. \s+ \\ ) .+ ) }x
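To check that the compact and the /x forms behave alike on the sample data, a sketch along these lines can be used. The names $compact, $expanded, @results, $from_compact and $from_expanded are illustrative only.

```perl
use strict;
use warnings;

my @links = (
    'incl -s projectA /. /abc/cde/efg',
    'incl -s projectA \. \hij\klm\nop',
);

# The two spellings of the pattern shown above.
my $compact  = qr{incl -s projectA ((?:/\. /|\\\. \\).+)};
my $expanded = qr{ incl \s+ -s \s+ projectA \s+ ( (?: /\. \s+ / | \\\. \s+ \\ ) .+ ) }x;

my @results;
for my $link (@links) {
    # A match in list context returns the captured groups.
    my ($from_compact)  = $link =~ $compact;
    my ($from_expanded) = $link =~ $expanded;
    push @results, [ $from_compact, $from_expanded ];
    print "compact:  $from_compact\n";
    print "expanded: $from_expanded\n";
}
```

The /x version is slightly more permissive, since \s+ accepts any run of whitespace where the compact form insists on exactly one space.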

Here's a fixed version of your code that includes all of the changes I have described.

use strict;
use warnings;

my @links = (
   'incl -s projectA /. /abc/cde/efg',
   'incl -s projectA \. \hij\klm\nop',
);

my ($path, $link, @linkpaths, $op);
my $substr = "/";

for my $link (@links) {

   $link =~ m{incl \s+ -s \s+ projectA \s+ ( (?: /\. \s+ / | \\\. \s+ \\) .+ )}x;
   $path = $1;
   print "Path is $path \n";
   if (index($path, $substr) >= 0) {
      print "$link contains $substr\n";
      $op = "/";
   }
   else {
      print "$link doesn't contain $substr\n";
      $op = "\\";
   }
   push @linkpaths, "$path$op";
}


print "\n";
print "linkpaths:\n";
print "$_\n" for @linkpaths;

output

Path is /. /abc/cde/efg 
incl -s projectA /. /abc/cde/efg contains /
Path is \. \hij\klm\nop 
incl -s projectA \. \hij\klm\nop doesn't contain /

linkpaths:
/. /abc/cde/efg/
\. \hij\klm\nop\

Update

To capture only the last path in each element of the input list that starts with a slash or backslash, I would replace the end of the pattern with (?: /\. \s+ | \\\. \s+ ) (.+) instead. But I believe it's far tidier to use a character class to represent either a forward or a backward slash, like [/\\].

Here is your complete program with that change applied

use strict;
use warnings;

my @links =(
   'incl -s projectA /. /abc/cde/efg',
   'incl -s projectA \. \hij\klm\nop',
);

my @linkpaths;
my $substr = '/';

for (@links) {

 next unless my ($path) = m{ incl \s+ -s \s+ projectA \s+ [/\\]\. \s+ ([/\\].+) }x;

 print "Path is $path\n";

 my $op;
 if ($path =~ /\Q$substr/) {
    printf "%s contains %s\n", $_, $substr;
    $op = '/';
 }
 else {
    printf "%s doesn't contain %s\n", $_, $substr;
    $op = '\\';
 }

 push @linkpaths, "$path$op";
}


print "\n";
print "linkpaths:\n";
print "$_\n" for @linkpaths;   

output

Path is /abc/cde/efg
incl -s projectA /. /abc/cde/efg contains /
Path is \hij\klm\nop
incl -s projectA \. \hij\klm\nop doesn't contain /

linkpaths:
/abc/cde/efg/
\hij\klm\nop\

You probably want a regex like this

 # m{incl[ ]-s[ ]projectA(?|[ ]/\.[ ](/)|[ ]\\\.[ ](\\))((?:(?!\1$).)+)$}g

 incl [ ] -s [ ] projectA
 (?|
      [ ] /\. [ ] 
      ( / )                         # (1)
   |  [ ] \\\. [ ] 
      ( \\ )                        # (1)
 )
 (                             # (2 start)
      (?:
           (?! \1 $ )
           . 
      )+
 )                             # (2 end)
 $
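The (?| ... ) construct used here is Perl's branch reset group (available from Perl 5.10): capture groups in each alternative are numbered from the same starting point, so whichever branch matches, the separator lands in $1 and the rest of the path in $2. A reduced sketch of just that part, with the hypothetical names $re and @seen, might look like this:

```perl
use strict;
use warnings;

my @links = (
    'incl -s projectA /. /abc/cde/efg',
    'incl -s projectA \. \hij\klm\nop',
);

# Both alternatives capture their separator as group 1 because of (?| ... ).
# With a plain (?: ... ) the second branch would be group 2 instead.
my $re = qr{ (?| /\. [ ] (/) | \\\. [ ] (\\) ) (.+) $ }x;

my @seen;
for my $link (@links) {
    if ( $link =~ $re ) {
        push @seen, [ $1, $2 ];
        print "separator '$1', rest '$2'\n";
    }
}
```

This sketch leaves out the negative lookahead from the full pattern, which additionally stops the path from ending in its own separator.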

Sample:

use strict;
use warnings;

my @links =(
        'incl -s projectA /. /abc/cde/efg',
        'incl -s projectA \. \hij\klm\nop'
        );

my ($path,$link,@linkpaths,$op);
my $substr="/";

for (@links) {
    if ( m{incl[ ]-s[ ]projectA(?|[ ]/\.[ ](/)|[ ]\\\.[ ](\\))((?:(?!\1$).)+)$}g )
    { 
       ($op, $path) = ($1,$2);
       print "Path is $path \n";
       if ($op eq '/' ) {
          print "$path contains /\n";
       } 
       else {
          print "$path doesnt contain /\n";
       }
       push @linkpaths, $path . $op;
    }
}
print "\nlinkpaths:\n";
for (@linkpaths) {
   print "$_\n";
}   

Output:

Path is abc/cde/efg
abc/cde/efg contains /
Path is hij\klm\nop
hij\klm\nop doesnt contain /

linkpaths:
abc/cde/efg/
hij\klm\nop\
