Regular expression to match a word at the end but not in the middle? (PHP is crashing)

I want to find the lines between a label and an unindented return. Example:

myLabel:
bla
if(no)
  return
else
  foo
return

If I replace the last return with some other word, e.g. send, it works.

$r1 = '^(\w[\w\d_]*:\s*\n((?!\nreturn).)*)(\n[^\s][^n]*\n)((((?!\nreturn).)*)\nsend)'; // working regex

But $r2 doesn't work. PHP crashes.

$r2 = '^(\w[\w\d_]*:\s*\n((?!\nreturn).)*)(\n[^\s][^n]*\n)((((?!\nreturn).)*)\nreturn)'; // regex that does not work

Here is an example in PHP for testing:

$str = '^(\w[\w\d_]*:\s*\n((?!\nreturn).)*)(\n[^\s][^n]*\n)((((?!\nreturn).)*)\nreturn)';
$actual = preg_replace('/^'.$str.'/smi', "$1" . $indentStr . "$2$3", $actual);

If this does not work, I will loop through all the source code lines instead. I want to use it to prettify AutoHotkey source code with this tool: https://github.com/sl5net/SL5_AHK_Refactor_engine

Your pattern is very complicated and uses the "famous" trick ((?!\nreturn).)*, which is slow because it runs a lookahead at every character, and which causes a lot of backtracking whenever the subpatterns after it fail.
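When PCRE gives up on a pattern like this instead of crashing outright, preg_replace returns null without raising an error. The following is only a sketch of how that case could be detected. The helper name replaceOrFail is hypothetical and not part of the original code, and a hard crash of the PHP process cannot be caught this way.

```php
<?php
// Hypothetical helper (not from the original post): run preg_replace and
// turn a silent PCRE failure into an exception.
function replaceOrFail(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants,
        // e.g. PREG_BACKTRACK_LIMIT_ERROR when pcre.backtrack_limit is hit.
        throw new RuntimeException(
            'preg_replace failed, PCRE error code ' . preg_last_error()
        );
    }
    return $result;
}
```

Calling replaceOrFail in place of the bare preg_replace makes it visible when a pattern was abandoned rather than simply not matching.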

You can write your pattern more simply:

$pattern = '~^\w+:\R(?:\N*\R)*?return$~m';

demo

details:

~            # pattern delimiter
^            # anchor for the start of the line (m option)
\w+:         # the label name
\R           # alias for any kind of newline sequences
(?:\N*\R)*?  # zero or more whole lines, as few as possible (non-greedy)
return       # "return"
$            # end of the line (remove it if not needed)
~m           # pattern delimiter, multiline option

\N matches any character except a newline, whatever the mode (singleline or not). In this case you can replace it with a dot, but that is less explicit.

\R is an alias for several newline sequences: \r\n, \n, or more exotic ones. If you already know which newline sequence your string uses, replace \R with that sequence.
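As a quick check, the pattern can be run against the sample from the question. This is a minimal sketch: the variable names and the trailing "after" line are made up for illustration.

```php
<?php
// Sample input modelled on the question; the "after" line is an assumption
// added to show that the match stops at the unindented return.
$source = "myLabel:\nbla\nif(no)\n  return\nelse\n  foo\nreturn\nafter\n";

$pattern = '~^\w+:\R(?:\N*\R)*?return$~m';

// $found is 1 on a match; $m[0] then holds the text from the label
// through the first line that consists of "return" only.
$found = preg_match($pattern, $source, $m);
$block = $found === 1 ? $m[0] : null;
```

The indented "  return" line is skipped because "return" is only tried directly after a newline, where that line still has its leading spaces.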

see this other version

In short, the pattern tests for "return" only at the start of each line, not at every position in the string.

I found an implementation that works perfectly. It indents the label body and does not disturb the surrounding content. Here it is: https://github.com/sl5net/SL5_AHK_Refactor_engine/blob/master/phpdesktop-msie-1.14-php-5.4.33/www/SL5_preg_contentFinder/examples/AutoHotKey/Reformatting_Autohotkey_Source.php#L192

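// $actual holds the AutoHotkey source and $indentStr the indentation string (both set elsewhere).
$label = '^\w[\w\d_]*:';
$pattern = '/' . "($label)(\h*\R)((?:.*\n)*?)(return\b)" . '/im';
preg_match_all($pattern, $actual, $matches, PREG_OFFSET_CAPTURE);
$labelsAr  = $matches[1]; // each entry is [text, offset]
$contentAr = $matches[3];
$returnAr  = $matches[4];
// Walk the matches from last to first so that earlier offsets stay valid while $actual is edited.
for ($k = count($labelsAr) - 1; $k >= 0; $k--) {
    // Rebuild the block: label, indented body, unindented return.
    $new = $labelsAr[$k][0]
      . "\n" . $indentStr
      . rtrim(preg_replace('/\n/', "\n" . $indentStr, $contentAr[$k][0]))
      . "\n" . ltrim($returnAr[$k][0]);
    // Splice the rebuilt block in between the label start and the end of "return".
    $actual = substr($actual, 0, $labelsAr[$k][1])
      . $new
      . substr($actual, $returnAr[$k][1] + strlen($returnAr[$k][0]));
}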
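The same idea can also be sketched with preg_replace_callback, which avoids the manual offset bookkeeping. This is an alternative technique, not the code from the linked repository. The function name indentLabelBodies and its parameters are hypothetical, and unlike the loop above it keeps blank lines in the body and always writes "\n" after the label.

```php
<?php
// Hypothetical helper (not from the linked repository): indent every line
// between a label and the next unindented "return".
function indentLabelBodies(string $source, string $indent = "    "): string
{
    // Group 1: label, group 2: body lines, group 3: the closing return.
    $pattern = '/^(\w+:)\h*\R((?:\N*\R)*?)(return\b)/m';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($indent): string {
            // Prefix each non-empty body line with the indent string.
            $body = preg_replace('/^(?=\N)/m', $indent, $m[2]);
            return $m[1] . "\n" . $body . $m[3];
        },
        $source
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $source;
}
```

For example, indentLabelBodies($actual, $indentStr) would take the place of the loop, assuming $actual and $indentStr are set as in the question.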
