PHP Regular Expression to grab values enclosed in double quotes

This question is related to RegEx: Grabbing values between quotation marks, which I've tried to implement in my actual code, without success.

What I'd like to accomplish is to parse PHP code and grab the literal double-quoted strings inside it.

Solutions using token_get_all() are not an option, as the PHP code may not parse correctly (invalid, broken, or old PHP 4 code).

The regular expression should:

  1. Match only if a double quote is not preceded by a single quote
  2. Match only if a double quote is not followed by a single quote
  3. Also match backslashes inside the double-quoted string
  4. Leave the opening and closing double quotes untouched (return them as part of the match)

As an example of what the regexp should match, consider these pieces of (ugly, old and insecure) PHP code:

header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
$sql = "UPDATE $table_name SET
password = password('$newpass'), pchange = '1'
WHERE email = '$email'";
$var = '"' . $something . '"';
$msg = "<p><a href=\"login.html\">Login</a></p>";
echo "<label for=\"whatever\">LABEL</label><select class='".$style."'>";

The regular expression should match:

  1. "Last-Modified: "
  2. "D, d M Y H:i:s"
  3. " GMT"
  4. "UPDATE $table_name SET password = password('$newpass'), pchange = '1' WHERE email = '$email'"
  5. "<p><a href=\"login.html\">Login</a></p>"
  6. "<label for=\"whatever\">LABEL</label><select class='"
  7. "'>"

The regexp will be used within preg_match() with PREG_OFFSET_CAPTURE, to restart the search where the last match occurred, in this way:

$string_match = preg_match(**REGEXP_HERE**, $php_code, $text_in_double_quotes, PREG_OFFSET_CAPTURE, $last_pos);
if ($string_match) {
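    list($text_in_double_quotes, $last_pos) = $text_in_double_quotes[0];
    // Move past this match so the next search does not find it again.
    $last_pos += strlen($text_in_double_quotes);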
}

Thank you!

PS

For those asking why I'm bothering to do this: the goal is to match unquoted array accesses inside these literal double-quoted strings and have them corrected.

For example (don't use this code, it has severe security flaws):

$sql = "SELECT * FROM table1 WHERE userid = '$_SESSION[id]'";
$sql2 = "SELECT * FROM table2 WHERE userid = '$array[key]' AND id = ".$other_array[whatever];

will be transformed into:

$sql = "SELECT * FROM table1 WHERE userid = '" . $_SESSION['id'] . "'";
$sql2 = "SELECT * FROM table2 WHERE userid = '" . $array['key'] . "' AND id = " . $other_array['whatever'];

You could use the verbs (*SKIP)(*F) to exclude single-quoted substrings; the pattern below spells (*F) as the equivalent (?!).

$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

See this demo at regex101. The underlying pattern is from this answer.
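
For the one-match-at-a-time loop described in the question, a minimal sketch could look like the following. The sample input, the `$found` array and the short variable names are assumptions made for illustration; only the pattern comes from this answer. Note that the next search has to start after the end of the previous match, not at its start offset.

```php
<?php
// Pattern from the answer: a single-quoted literal is consumed and then
// rejected via (*SKIP)(?!); otherwise a double-quoted literal is matched.
$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

// Illustrative input taken from the question (nowdoc: nothing is interpolated).
$php_code = <<<'CODE'
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
$var = '"' . $something . '"';
$msg = "<p><a href=\"login.html\">Login</a></p>";
CODE;

$found = [];
$last_pos = 0;
while (preg_match($regex, $php_code, $m, PREG_OFFSET_CAPTURE, $last_pos)) {
    // With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset].
    list($text, $start) = $m[0];
    $found[] = $text;
    // Resume after the end of this match; reusing $start would find it again.
    $last_pos = $start + strlen($text);
}

print_r($found);
```

On this input the loop collects four literals ("Last-Modified: ", "D, d M Y H:i:s", " GMT" and the `<p>…</p>` string with its escaped quotes) and skips both `'"'` occurrences on the `$var` line.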
To extract multiple items, use this regex with preg_match_all like this:

if(preg_match_all($regex, $str, $out) > 0) {
  print_r($out[0]);    
}

Here is a PHP demo at tio.run; the matches will be in $out[0] (the full-pattern matches).
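
As a rough sketch of the rewrite mentioned in the question's PS, the same pattern could be passed to preg_replace_callback() so that only double-quoted literals are touched. The helper name `quote_array_keys` and the inner pattern are assumptions made for illustration, not part of the original answer. This sketch only handles bare-word keys inside double-quoted literals; it ignores escaped `\$` sequences and leaves accesses outside of strings (such as `$other_array[whatever]`) unchanged.

```php
<?php
// Pattern from the answer (skips single-quoted literals, matches double-quoted ones).
$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

// Hypothetical helper: rewrite $name[key] inside each double-quoted literal
// into a concatenation with a quoted key.
function quote_array_keys(string $php_code, string $regex): string
{
    return preg_replace_callback($regex, function (array $m): string {
        // $m[0] is one complete double-quoted literal, quotes included.
        return preg_replace_callback(
            '/\$(\w+)\[([A-Za-z_]\w*)\]/',
            function (array $k): string {
                // Close the string, concatenate the access, reopen the string.
                return '" . $' . $k[1] . "['" . $k[2] . "'] . \"";
            },
            $m[0]
        );
    }, $php_code);
}

$before = <<<'CODE'
$sql = "SELECT * FROM table1 WHERE userid = '$_SESSION[id]'";
CODE;

echo quote_array_keys($before, $regex), "\n";
// prints: $sql = "SELECT * FROM table1 WHERE userid = '" . $_SESSION['id'] . "'";
```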
