
PHP Regular Expression to grab values enclosed in double quotes

This question is related to "RegEx: Grabbing values between quotation marks", which I tried to implement in my actual code without success.

What I'd like to accomplish is to parse PHP code and grab the literal double-quoted strings inside it.

Solutions using token_get_all() are not an option, as the PHP code may not parse correctly (invalid, broken, old PHP 4 code).

The regular expression should:

  1. Match only if a double quote is not preceded by a single quote
  2. Match only if a double quote is not followed by a single quote
  3. Also match backslashes inside the double-quoted string
  4. Leave the leading and trailing double quotes untouched (return them as part of the match)

As an example of what the regexp should match, consider these pieces of (ugly, old and insecure) PHP code:

header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
$sql = "UPDATE $table_name SET
password = password('$newpass'), pchange = '1'
WHERE email = '$email'";
$var = '"' . $something . '"';
$msg = "<p><a href=\"login.html\">Login</a></p>";
echo "<label for=\"whatever\">LABEL</label><select class='".$style."'>";

The regular expression should match:

  1. "Last-Modified: "
  2. "D, d M Y H:i:s"
  3. " GMT"
  4. "UPDATE $table_name SET password = password('$newpass'), pchange = '1' WHERE email = '$email'"
  5. "<p><a href=\"login.html\">Login</a></p>"
  6. "<label for=\"whatever\">LABEL</label><select class='"
  7. "'>"

The regexp will be used within preg_match() with PREG_OFFSET_CAPTURE, to restart the search where the last match occurred, like this:

$string_match = preg_match(**REGEXP_HERE**, $php_code, $text_in_double_quotes, PREG_OFFSET_CAPTURE, $last_pos);
if ($string_match) {
    list($text_in_double_quotes, $last_pos) = $text_in_double_quotes[0];
}

Thank you!

PS

For those asking why I'm bothering with this: the goal is to match unquoted array accesses inside these literal double-quoted strings and have them corrected.

For example (don't use this code, it has severe security flaws):

$sql = "SELECT * FROM table1 WHERE userid = '$_SESSION[id]'";
$sql2 = "SELECT * FROM table2 WHERE userid = '$array[key]' AND id = ".$other_array[whatever];

Will get transformed into

$sql = "SELECT * FROM table1 WHERE userid = '" . $_SESSION['id'] . "'";
$sql2 = "SELECT * FROM table2 WHERE userid = '" . $array['key'] . "' AND id = " . $other_array['whatever'];
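A minimal sketch of the in-string part of that rewrite, assuming the double-quoted literal has already been isolated (quotes included). The function name quote_array_keys() is hypothetical, and the sketch only handles simple $name[key] accesses with a bare identifier key; it does not touch numeric indexes, escaped dollar signs, or array accesses outside the literal such as $other_array[whatever].

```php
<?php
// Hypothetical helper: rewrite $name[key] inside one double-quoted literal
// into a concatenation with a quoted key, e.g. '" . $name['key'] . "'.
function quote_array_keys(string $literal): string
{
    return preg_replace_callback(
        // $name followed by [identifier]; keys starting with a digit are not matched.
        '/\$([A-Za-z_]\w*)\[([A-Za-z_]\w*)\]/',
        function (array $m): string {
            // Close the string, concatenate the array access, reopen the string.
            return '" . $' . $m[1] . "['" . $m[2] . "'] . \"";
        },
        $literal
    );
}

// Nowdoc keeps the literal exactly as it would appear in the source file.
$literal = <<<'EOT'
"SELECT * FROM table1 WHERE userid = '$_SESSION[id]'"
EOT;

echo quote_array_keys($literal), "\n";
```

The closing and reopening quotes are emitted unconditionally, so a literal that begins or ends with an array access would end up with an empty "" fragment that may need a cleanup pass.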

You could use the backtracking control verbs (*SKIP)(*F) to exclude single-quoted substrings; the pattern below writes (*F) as the equivalent (?!).

$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

See this demo at regex101. The underlying pattern is from this answer.
To extract multiple items, use this regex with preg_match_all like this:

if(preg_match_all($regex, $str, $out) > 0) {
  print_r($out[0]);    
}
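As a rough check, the pattern can be run against the sample code from the question. The sketch below assumes the sample is held in a nowdoc (so the backslashes and dollar signs stay exactly as written); the variable names $str, $out and $count are just for illustration. On this input it should yield the seven literals listed in the question, with the two '"' single-quoted strings skipped.

```php
<?php
// Pattern from the answer: a single-quoted string is consumed and then
// discarded via (*SKIP)(?!); a double-quoted string (with backslash escapes)
// is what actually gets returned.
$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

// The question's sample code, kept verbatim in a nowdoc (no interpolation).
$str = <<<'CODE'
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
$sql = "UPDATE $table_name SET
password = password('$newpass'), pchange = '1'
WHERE email = '$email'";
$var = '"' . $something . '"';
$msg = "<p><a href=\"login.html\">Login</a></p>";
echo "<label for=\"whatever\">LABEL</label><select class='".$style."'>";
CODE;

$count = preg_match_all($regex, $str, $out);
echo $count, " matches\n";
print_r($out[0]); // full matches, surrounding double quotes included
```

Note that the multi-line SQL literal is returned with its line breaks intact, since the negated character class also matches newlines.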

Here is a PHP demo at tio.run; the matches will be in $out[0] (the full-pattern matches).
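If the search has to be restarted one match at a time, as in the question, the same pattern should also work with preg_match() and PREG_OFFSET_CAPTURE. This is only a sketch: $found, $text and $start are illustrative names, and it assumes the next search should begin at the end of the previous match (start offset plus match length), because resuming at the start offset itself would find the same literal again.

```php
<?php
$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

// Two lines of the question's sample; only the second holds a double-quoted literal.
$php_code = <<<'CODE'
$var = '"' . $something . '"';
$msg = "<p><a href=\"login.html\">Login</a></p>";
CODE;

$last_pos = 0;
$found = [];
while (preg_match($regex, $php_code, $m, PREG_OFFSET_CAPTURE, $last_pos) === 1) {
    // With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset of its start].
    list($text, $start) = $m[0];
    $found[] = [$text, $start];
    // Resume right after this match so the loop always advances.
    $last_pos = $start + strlen($text);
}

foreach ($found as list($text, $start)) {
    echo $start, ': ', $text, "\n";
}
```

The offsets are byte offsets into $php_code, which is what substr() and the preg_match() offset argument both expect.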
