Find/Replace array() with Regular Expression

I'm trying to search through my code and replace all old-style PHP array() calls with the shorthand [] syntax. However, I'm having some trouble creating a working, reliable regex...

What I currently have: (^|[\s])array\((['"](\s\S)['"]|[^)])*\) (view on Regex101)

// Match All
array('array()')

array('key' => 'value');
array(
    'key'  => 'value',
    'key2' => '(value2)'
);
    array()
  array()
array()

// Match Specific Parts
function (array $var = array()) {}
$this->in_array(array('something', 'something'));

// Don't match
toArray()
array_merge()
in_array();

I've created a Regex101 for it...

EDIT: This isn't an answer to the question, but one alternative is to use PhpStorm's "Traditional syntax array literal detected" inspection.

How to:

  • Open the Code menu
  • Click Run inspection by name... (Ctrl + Alt + Shift + I)
  • Type Traditional syntax array literal detected
  • Press <Enter>
  • Specify the scope you wish to run it on
  • Press <Enter>
  • Review/Apply the changes in the Inspection window.

It is possible but not trivial, since you need to fully describe two parts of the PHP syntax (strings and comments) so that parentheses inside them are not interpreted as code. Here is a way to do it with PHP itself:

$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<quotes> (["']) (?: [^"'\\]+ | \\. | (?!\g{-1})["'] )*+ (?:\g{-1}|\z) )
    (?<heredoc> <<< (["']?) ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*) \g{-2}\R
                (?>\N*\R)*?
                (?:\g{-1} ;? (?:\R | \z) | \N*\z)
    )
    (?<string> \g<quotes> | \g<heredoc> )

    (?<inlinecom> (?:// |\# ) \N* $ )
    (?<multicom> /\*+ (?:[^*]+|\*+(?!/))*+ (?:\*/|\z))
    (?<com> \g<multicom> | \g<inlinecom> )

    (?<nestedpar> \( (?: [^()"'<]+ | \g<com> | \g<string> | < | \g<nestedpar>)*+ \) )
)

(?:\g<com> | \g<string> ) (*SKIP)(*FAIL)
|
(?<![-$])\barray\s*\( ((?:[^"'()/\#]+|\g<com>|/|\g<string>|\g<nestedpar>)*+) \)
~xsm
EOD;

do {
    $code = preg_replace($pattern, '[${11}]', $code, -1, $count);
} while ($count);

The pattern has two parts: a definition part and the main pattern.

The definition part is enclosed in (?(DEFINE)...) and contains named subpattern definitions for the useful building blocks (in particular "string", "com" and "nestedpar"). These subpatterns are then used in the main pattern.
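To show the mechanism in isolation, here is a minimal sketch of a DEFINE block with one named subpattern called from the main pattern. The pattern and the function name "call" are made up for illustration and are not part of the answer's pattern.

```php
<?php
// Illustrative pattern only: "call" is a hypothetical function name.
$pattern = <<<'EOD'
~
(?(DEFINE)
    # named subpattern: one balanced pair of parentheses, possibly nested
    (?<nestedpar> \( (?: [^()]+ | \g<nestedpar> )*+ \) )
)
\bcall \s* \g<nestedpar>   # the main pattern reuses the definition
~x
EOD;

$subject = 'x = call(a, (b + c), d) + (e)';
preg_match($pattern, $subject, $m);

echo $m[0], PHP_EOL; // call(a, (b + c), d)
```

Nothing inside (?(DEFINE)...) matches by itself; the subpattern only consumes text where \g<nestedpar> calls it.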

The idea is to never search for a parenthesis inside a comment, inside a string, or among nested parentheses.

The first line, (?:\g<com> | \g<string> ) (*SKIP)(*FAIL), skips all comments and strings until the next array declaration (or until the end of the string).
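The skipping technique can be tried on a much smaller case. The sketch below is a simplified stand-in, not the answer's pattern: it skips quoted strings and replaces the made-up word "foo" only where it appears outside them.

```php
<?php
// Simplified illustration of (*SKIP)(*FAIL); "foo" and "bar" are placeholders.
$pattern = <<<'EOD'
~
(["']) (?: [^"'\\]+ | \\. | (?!\1)["'] )*+ \1 (*SKIP)(*FAIL)
|
\bfoo\b
~xs
EOD;

$code = "foo('foo', \"foo\") + foo";

// A quoted string is matched first, then deliberately failed; (*SKIP) makes
// the regex engine resume after the string instead of retrying inside it.
$result = preg_replace($pattern, 'bar', $code);

echo $result, PHP_EOL; // bar('foo', "foo") + bar
```

Only the second alternative can ever produce a match, so the replacement never touches text inside quotes.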

The last line describes the array declaration itself. In detail:

(?<![-$])\b        # check if "array" is not a part of a variable or function name
array \s*\(
(                   # capture group 11
    (?:             # describe the possible content
        [^"'()/\#]+ # anything that is not a quote, a round bracket, a slash or a hash sign
      |             # OR
        \g<com>     # a comment
      |
        /           # a slash that is not a part of a comment
      |
        \g<string>  # a string
      |
        \g<nestedpar> # nested round brackets
    )*+
)
\)


About nested array declarations:

When it meets a block of nested array declarations, the pattern can only match the outermost one.

The do...while loop handles nested array declarations, because a single pass cannot replace several nesting levels at once (there is a way with preg_replace_callback, but it isn't very handy). The loop stops thanks to the last parameter of preg_replace, which receives the number of replacements performed in the target string: once it is zero, nothing is left to convert.
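The preg_replace_callback alternative mentioned above could look roughly like this. It is only a sketch: the pattern is copied unchanged from the answer, while the constant ARRAY_PATTERN and the helper convertArrays() are hypothetical names introduced here. The callback recurses into the captured body (group 11), so inner declarations are converted without an outer loop.

```php
<?php
// Pattern copied unchanged from the answer; group 11 holds the array body.
const ARRAY_PATTERN = <<<'EOD'
~
(?(DEFINE)
    (?<quotes> (["']) (?: [^"'\\]+ | \\. | (?!\g{-1})["'] )*+ (?:\g{-1}|\z) )
    (?<heredoc> <<< (["']?) ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*) \g{-2}\R
                (?>\N*\R)*?
                (?:\g{-1} ;? (?:\R | \z) | \N*\z)
    )
    (?<string> \g<quotes> | \g<heredoc> )

    (?<inlinecom> (?:// |\# ) \N* $ )
    (?<multicom> /\*+ (?:[^*]+|\*+(?!/))*+ (?:\*/|\z))
    (?<com> \g<multicom> | \g<inlinecom> )

    (?<nestedpar> \( (?: [^()"'<]+ | \g<com> | \g<string> | < | \g<nestedpar>)*+ \) )
)

(?:\g<com> | \g<string> ) (*SKIP)(*FAIL)
|
(?<![-$])\barray\s*\( ((?:[^"'()/\#]+|\g<com>|/|\g<string>|\g<nestedpar>)*+) \)
~xsm
EOD;

// Hypothetical helper: converts the outermost array() declarations, then
// recurses into each captured body so inner declarations are converted too.
function convertArrays(string $code): string
{
    $result = preg_replace_callback(
        ARRAY_PATTERN,
        function (array $m): string {
            return '[' . convertArrays($m[11]) . ']';
        },
        $code
    );

    // preg_replace_callback returns null on a PCRE error; keep the input then.
    return $result ?? $code;
}

echo convertArrays("\$a = array(1, array(2, 3), 'x)');"), PHP_EOL;
// $a = [1, [2, 3], 'x)'];
```

Compared with the loop, this rescans only the captured bodies rather than the whole file on every pass, at the cost of one recursive call per nesting level.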
