
Regex to match double-quoted strings without variables inside PHP tags

Basically, I need a regex to match all double-quoted strings inside PHP tags that do not contain a variable.

Here's what I have so far:

"([^\$\n\r]*?)"(?![\w ]*')

and replace with:

'$1'

However, this would match things outside PHP tags as well, e.g. HTML attributes.
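To make the problem concrete, here is a minimal JavaScript sketch (JavaScript being one of the environments mentioned further down) that applies the pattern and replacement above to a two-line sample. The sample text and the helper name naiveSwap are made up for illustration; only the pattern and the replacement come from the question.

```javascript
// Pattern and replacement copied from the question; the `g` flag is added
// so that every occurrence is replaced, as an editor's "replace all" would.
const pattern = /"([^\$\n\r]*?)"(?![\w ]*')/g;

// Hypothetical helper name, for illustration only.
function naiveSwap(source) {
  return source.replace(pattern, "'$1'");
}

// Made-up sample: one line of HTML, one line of PHP.
const sample = [
  '<a href="somelink" attribute="value">link</a>',
  '<?php $somevar = "someval"; ?>',
].join("\n");

console.log(naiveSwap(sample));
// <a href='somelink' attribute='value'>link</a>
// <?php $somevar = 'someval'; ?>
```

The PHP string is converted as intended, but so are both HTML attributes, because nothing in the pattern knows whether it is inside PHP tags.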

Example case:

<a href="somelink" attribute="value">Here's my "dog's website"</a>
<?php
    $somevar = "someval";
    $somevar2 = "someval's got a quote inside";
?>
<?php
    $somevar3 = "someval with a $var inside";
    $somevar4 = "someval " . $var . 'with concatenated' . $variables . "inside";
    $somevar5 = "this php tag doesn't close, as it's the end of the file...";

It should match and replace every place where the " should be replaced with a ', which means that HTML attributes should ideally be left alone.

Example output after replace:

<a href="somelink" attribute="value">Here's my "dog's website"</a>
<?php
    $somevar = 'someval';
    $somevar2 = 'someval\'s got a quote inside';
?>
<?php
    $somevar3 = "someval with a $var inside";
    $somevar4 = 'someval ' . $var . 'with concatenated' . $variables . 'inside';
    $somevar5 = 'this php tag doesn\'t close, as it\'s the end of the file...';

It would also be great to be able to match inside script tags too... but that might be pushing it for one regex replace.

I need a regex approach, not a PHP approach. Let's say I'm using regex-replace in a text editor or JavaScript to clean up the PHP source code.

tl;dr

This is really too complex to be done with regex, and especially not with a simple regex. You might have better luck with nested regexes, but you really need to lex/parse to find your strings, and then you could operate on them with a regex.

Explanation

You can probably manage to do this. You can probably even manage to do this well, maybe even perfectly. But it's not going to be easy. It's going to be very, very difficult.

Consider this:

Welcome to my php file. We're not "in" yet.

<?php
  /* Ok. now we're "in" php. */

  echo "this is \"stringa\"";
  $string = 'this is \"stringb\"';
  echo "$string";
  echo "\$string";

  echo "this is still ?> php.";

  /* This is also still ?> php. */

?> We're back <?="out"?> of php. <?php

  // Here we are again, "in" php.

  echo <<<STRING
    How do "you" want to \""deal"\" with this STRING;
STRING;

  echo <<<'STRING'
    Apparently this is \\"Nowdoc\\". I've never used it.
STRING;

  echo "And what about \\" . "this? Was that a tricky '\"' to catch?";

  // etc...

Forget matching variable names in double-quoted strings. Can you even match all of the strings in this example? It looks like a nightmare to me. SO's syntax highlighting certainly won't know what to do with it.

Did you consider that variables may appear in heredoc strings as well?

I don't want to think about the regex needed to check:

  1. Inside <?php or <?= code
  2. Not in a comment
  3. Inside a quoted string
  4. What type of quote opened it?
  5. Is this a quote of that type?
  6. Is it preceded by \ (escaped)?
  7. Is the \ itself escaped?
  8. etc...
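Checks 6 and 7 alone already need care: a quote is escaped only when it is preceded by an odd number of backslashes. A minimal JavaScript sketch of just that test follows. The function name isEscaped is hypothetical, and the two sample lines are adapted from the example above.

```javascript
// Returns true when the character at `index` is escaped, i.e. preceded by an
// odd number of consecutive backslashes. `isEscaped` is a made-up name.
function isEscaped(text, index) {
  let backslashes = 0;
  for (let i = index - 1; i >= 0 && text[i] === "\\"; i--) {
    backslashes++;
  }
  return backslashes % 2 === 1;
}

// String.raw keeps the backslashes exactly as typed.
const escapedQuote = String.raw`echo "this is \"stringa\"";`;
const escapedSlash = String.raw`echo "And what about \\" . "this?";`;

// Look at the first double quote after the opening one (which sits at index 5).
console.log(isEscaped(escapedQuote, escapedQuote.indexOf('"', 6))); // true
console.log(isEscaped(escapedSlash, escapedSlash.indexOf('"', 6))); // false
```

In the second line the backslash is itself escaped, so the quote after it really does close the string. A single regex has to encode the same parity rule, on top of every other check in the list.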

Summary

You can probably write a regex for this. You can probably manage with some backreferences and lots of time and care. It's going to be hard and you're probably going to waste a lot of time, and if you ever need to fix it, you aren't going to understand the regex you wrote.

See also

This answer. It's worth it.

Here's a function that utilizes the tokenizer extension to apply preg_replace to PHP strings only:

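// Applies preg_replace() only to string-literal tokens of the given PHP source.
function preg_replace_php_string($pattern, $replacement, $source) {
    $replaced = '';
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ; or = come back as plain strings.
        if (is_string($token)){
            $replaced .= $token;
            continue;
        }
        // Every other token is an array: token id, source text, line number.
        list($id, $text) = $token;
        // T_CONSTANT_ENCAPSED_STRING is a quoted string without interpolation.
        if ($id === T_CONSTANT_ENCAPSED_STRING) {
            $replaced .= preg_replace($pattern, $replacement, $text);
        } else {
            $replaced .= $text;
        }
    }
    return $replaced;
}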

In order to achieve what you want, you can call it like this:

<?php
    $filepath = "script.php";
    $file = file_get_contents($filepath);
    $replaced = preg_replace_php_string('/^"([^$\{\n<>\']+?)"$/', '\'$1\'', $file);
    echo $replaced;

The regular expression that's passed as the first argument is the key here. It tells the function to transform strings to their single-quoted equivalents only if they do not contain $ (embedded variable, "$a"), { (embedded variable type 2, "{$a[0]}"), a new line, or < or > (HTML tag end/open symbols). It also checks whether the string contains a single quote and, if so, skips the replacement to avoid situations where the quote would need to be escaped.
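As a rough check of what that pattern accepts, the same expression can be tried against a few literals. The sketch below uses JavaScript rather than PHP and applies the pattern to one string literal at a time, which is how the function above sees each T_CONSTANT_ENCAPSED_STRING token. The sample literals and the helper name toSingleQuoted are assumptions for illustration.

```javascript
// JavaScript transcription of the pattern used in the call above.
const pattern = /^"([^$\{\n<>']+?)"$/;

// Hypothetical helper: converts one PHP string literal if the pattern allows it.
function toSingleQuoted(literal) {
  return literal.replace(pattern, "'$1'");
}

const literals = [
  '"someval"',                        // plain text: converted
  '"someval with a $var inside"',     // contains $: left alone
  '"someval\'s got a quote inside"',  // contains ': left alone
  String.raw`"line\n"`,               // backslash escape: converted anyway
];

for (const literal of literals) {
  console.log(literal, "=>", toSingleQuoted(literal));
}
```

The last case is worth noting: the character class does not exclude backslashes, so "line\n" becomes 'line\n', and in a single-quoted PHP string \n is a literal backslash followed by n rather than a newline. Adding the backslash to the excluded characters would be one way to avoid that.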

While this is a PHP solution, it's the most accurate one. The closest you can get with any other language would require you to build your own PHP parser in that language to some degree in order for your solution to be accurate.
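To give a sense of what "to some degree" can mean, here is a deliberately incomplete JavaScript sketch of such a scanner. It is not a PHP parser. It tracks whether it is inside PHP tags, copies comments and single-quoted strings unchanged, and rewrites a double-quoted string only when its body contains no $, no backslash and no single quote. Heredoc and nowdoc strings, backtick strings and script blocks are not handled, and every name in it is hypothetical.

```javascript
// Deliberately incomplete sketch of a PHP-aware quote converter.
// All names are hypothetical; heredoc/nowdoc, backtick strings and
// <script> blocks are NOT handled.
function convertQuotes(src) {
  let out = "";
  let i = 0;
  let inPhp = false;

  // Copy src[i, end) to the output unchanged and move the cursor to `end`.
  const copyTo = (end) => {
    out += src.slice(i, end);
    i = end;
  };

  // Index of the quote that closes the string opened at `start`, or -1.
  const closingQuote = (start) => {
    let j = start + 1;
    while (j < src.length && src[j] !== src[start]) {
      j += src[j] === "\\" ? 2 : 1; // a backslash escapes the next character
    }
    return j < src.length ? j : -1;
  };

  while (i < src.length) {
    if (!inPhp) {
      // Outside PHP: copy everything up to and including the next "<?".
      const open = src.slice(i).search(/<\?(?:php|=)/);
      if (open === -1) {
        copyTo(src.length);
      } else {
        copyTo(i + open + 2);
        inPhp = true;
      }
    } else if (src.startsWith("?>", i)) {
      copyTo(i + 2);
      inPhp = false;
    } else if (src.startsWith("/*", i)) {
      // Block comment: copy to just past the closing "*/".
      const end = src.indexOf("*/", i + 2);
      copyTo(end === -1 ? src.length : end + 2);
    } else if (src.startsWith("//", i) || src[i] === "#") {
      // Line comment: ends at the newline or at a closing tag.
      const ends = [src.indexOf("\n", i), src.indexOf("?>", i), src.length];
      copyTo(Math.min(...ends.filter((n) => n !== -1)));
    } else if (src[i] === "'" || src[i] === '"') {
      const close = closingQuote(i);
      if (close === -1) {
        copyTo(src.length); // unterminated string: leave the rest alone
      } else {
        const body = src.slice(i + 1, close);
        // Convert only when no variable, escape sequence or ' is present.
        if (src[i] === '"' && /^[^$\\']*$/.test(body)) {
          out += "'" + body + "'";
          i = close + 1;
        } else {
          copyTo(close + 1);
        }
      }
    } else {
      copyTo(i + 1);
    }
  }
  return out;
}

// Made-up sample for illustration.
const sample = [
  '<a href="somelink">"dog\'s website"</a>',
  "<?php",
  '  // a "comment" stays as it is',
  '  $a = "someval";',
  '  $b = "with a $var inside";',
  '  $c = "it\'s quoted";',
  "?>",
].join("\n");

console.log(convertQuotes(sample));
```

On this sample only the literal assigned to $a is rewritten. The HTML attribute, the comment, the string with a variable and the string containing a single quote are all left as they were. Even so, the heredoc cases from the earlier example would still trip it up, which is the point about needing a real tokenizer.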

