Regex to match double-quoted strings without variables inside PHP tags

Basically, I need a regex to match all double-quoted strings inside PHP tags that don't contain a variable.

Here's what I have so far:

"([^\$\n\r]*?)"(?![\w ]*')

and replace with:

'$1'

However, this also matches things outside PHP tags, e.g. HTML attributes.
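To see the problem concretely, here is the pattern above run in JavaScript over three lines taken from the example case below. This only demonstrates the existing regex; it is not a fix. It rewrites the HTML attributes, and it converts the string that contains an apostrophe without escaping it, which produces invalid PHP.

```javascript
// The pattern from the question, with the g flag so every match is replaced.
const askerPattern = /"([^\$\n\r]*?)"(?![\w ]*')/g;

// Lines taken from the example case.
const sample = [
  '<a href="somelink" attribute="value">',
  '$somevar2 = "someval\'s got a quote inside";',
  '$somevar3 = "someval with a $var inside";',
];

const results = sample.map((line) => line.replace(askerPattern, "'$1'"));
for (const line of results) console.log(line);
// <a href='somelink' attribute='value'>          (HTML attributes changed)
// $somevar2 = 'someval's got a quote inside';   (broken PHP)
// $somevar3 = "someval with a $var inside";     (left alone, as intended)
```

The lookahead (?![\w ]*') only inspects what follows the closing quote, so it cannot notice an apostrophe inside the string itself.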

Example case:

<a href="somelink" attribute="value">Here's my "dog's website"</a>
<?php
    $somevar = "someval";
    $somevar2 = "someval's got a quote inside";
?>
<?php
    $somevar3 = "someval with a $var inside";
    $somevar4 = "someval " . $var . 'with concatenated' . $variables . "inside";
    $somevar5 = "this php tag doesn't close, as it's the end of the file...";

It should match and replace every place where " should be replaced with ', which means HTML attributes should ideally be left alone.

Example output after replace:

<a href="somelink" attribute="value">Here's my "dog's website"</a>
<?php
    $somevar = 'someval';
    $somevar2 = 'someval\'s got a quote inside';
?>
<?php
    $somevar3 = "someval with a $var inside";
    $somevar4 = 'someval ' . $var . 'with concatenated' . $variables . 'inside';
    $somevar5 = 'this php tag doesn\'t close, as it\'s the end of the file...';
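The expected output implies a simple rule for each individual literal: leave it alone if it contains a $, otherwise switch to single quotes and escape any ' inside. A minimal sketch of that rule in JavaScript, assuming the literal has already been isolated (which is the hard part). toSingleQuoted is a hypothetical name, and the helper also refuses any literal containing a backslash, because "\n" and '\n' mean different things in PHP.

```javascript
// Hypothetical helper: convert one complete double-quoted PHP literal (quotes
// included) to a single-quoted one, or return null when it should be left alone.
function toSingleQuoted(literal) {
  if (literal.length < 2 || !literal.startsWith('"') || !literal.endsWith('"')) {
    return null; // not a double-quoted literal
  }
  const body = literal.slice(1, -1);
  // "$" can start interpolation ("$var", "{$var}"), as in the $somevar3 line.
  if (body.includes("$")) return null;
  // Escapes differ between the two quote styles ("\n" is a newline, '\n' is a
  // backslash and an n), so skip anything containing a backslash.
  if (body.includes("\\")) return null;
  // In single quotes, ' must be written as \' (backslashes were excluded above).
  return "'" + body.replace(/'/g, "\\'") + "'";
}

console.log(toSingleQuoted('"someval\'s got a quote inside"'));
// 'someval\'s got a quote inside'
console.log(toSingleQuoted('"someval with a $var inside"'));
// null
```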

It would be great to match inside <script> tags as well, but that might be pushing it for a single regex replace.

I need a regex approach, not a PHP approach. Let's say I'm using regex-replace in a text editor or JavaScript to clean up the PHP source code.

tl;dr

This is really too complex to be done with regex, especially a simple regex. You might have better luck with nested regexes, but you really need to lex/parse the source to find your strings, and then you can operate on them with a regex.
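A minimal sketch of that lex-then-regex idea in JavaScript, since that is one of the environments the question mentions. findPhpDoubleQuotedStrings and convertSimpleStrings are hypothetical names. The scanner understands only enough PHP for the example case: <?php and <?= tags, ?> (which also ends a // comment in PHP), /* */, // and # comments, and single- and double-quoted strings with backslash escapes. It deliberately ignores heredoc/nowdoc, <script> tags and short <? tags, and it treats every # as a comment start (PHP 8 #[...] attributes are not comments).

```javascript
// Find double-quoted literals inside <?php ... ?> or <?= ... ?> blocks,
// skipping comments and single-quoted strings. A sketch, not a PHP lexer.
function findPhpDoubleQuotedStrings(src) {
  const found = [];
  let i = 0;
  let inPhp = false;
  while (i < src.length) {
    if (!inPhp) {
      // Outside PHP: jump to the next opening tag, or stop.
      const open = src.indexOf("<?", i);
      if (open === -1) break;
      if (src.startsWith("<?php", open)) {
        i = open + 5;
        inPhp = true;
      } else if (src.startsWith("<?=", open)) {
        i = open + 3;
        inPhp = true;
      } else {
        i = open + 2; // e.g. "<?xml": not treated as PHP in this sketch
      }
      continue;
    }
    if (src.startsWith("?>", i)) { inPhp = false; i += 2; continue; }
    if (src.startsWith("/*", i)) {
      // Block comment: a "?>" inside it does not leave PHP mode.
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 2;
      continue;
    }
    if (src.startsWith("//", i) || src[i] === "#") {
      // Line comment: ends at a newline or at "?>", which still closes PHP mode.
      while (i < src.length && src[i] !== "\n" && !src.startsWith("?>", i)) i++;
      continue;
    }
    const ch = src[i];
    if (ch === "'" || ch === '"') {
      // Quoted string: a backslash always consumes the next character,
      // so \" and \\ are both handled by the same rule.
      const start = i++;
      while (i < src.length && src[i] !== ch) i += src[i] === "\\" ? 2 : 1;
      i = Math.min(i + 1, src.length); // step past the closing quote
      if (ch === '"') found.push({ start, end: i, text: src.slice(start, i) });
      continue;
    }
    i++;
  }
  return found;
}

// Apply a regex to each literal found: no $, backslash or ' means it is safe
// to switch to single quotes. Splice from the end so offsets stay valid.
function convertSimpleStrings(src) {
  let out = src;
  const literals = findPhpDoubleQuotedStrings(src);
  for (let k = literals.length - 1; k >= 0; k--) {
    const { start, end, text } = literals[k];
    const replaced = text.replace(/^"([^$\\']*)"$/, "'$1'");
    out = out.slice(0, start) + replaced + out.slice(end);
  }
  return out;
}

const input = [
  '<a href="somelink" attribute="value">Here\'s my "dog\'s website"</a>',
  "<?php",
  '    $somevar = "someval";',
  '    $somevar3 = "someval with a $var inside";',
  '    // a "commented" string',
  '    echo "still ?> php";',
  "?>",
].join("\n");

console.log(convertSimpleStrings(input));
```

Only the regex in convertSimpleStrings touches the literals; everything else is about knowing where they are, which is exactly the part a single regex can't track.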

Explanation

You can probably manage to do this. You might even manage to do it well, maybe even perfectly. But it's not going to be easy; it's going to be very, very difficult.

Consider this:

Welcome to my php file. We're not "in" yet.

<?php
  /* Ok. now we're "in" php. */

  echo "this is \"stringa\"";
  $string = 'this is \"stringb\"';
  echo "$string";
  echo "\$string";

  echo "this is still ?> php.";

  /* This is also still ?> php. */

?> We're back <?="out"?> of php. <?php

  // Here we are again, "in" php.

  echo <<<STRING
    How do "you" want to \""deal"\" with this STRING;
STRING;

  echo <<<'STRING'
    Apparently this is \\"Nowdoc\\". I've never used it.
STRING;

  echo "And what about \\" . "this? Was that a tricky '\"' to catch?";

  // etc...

Forget matching variable names in double-quoted strings. Can you even match all of the strings in this example? It looks like a nightmare to me. SO's syntax highlighting certainly won't know what to do with it.
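The usual regex for a double-quoted string with backslash escapes, "(?:[^"\\]|\\.)*", does handle \"stringa\" correctly. But it has no idea where it is, so it also matches "in" in the prose before <?php and inside the comment. The first three lines below come from the example above; the fourth is an extra line added for illustration, showing a double quote inside a single-quoted PHP string.

```javascript
// A common pattern for a double-quoted string: a quote, then any run of
// non-quote/non-backslash characters or backslash pairs, then a quote.
const dq = /"(?:[^"\\]|\\.)*"/g;

const lines = [
  'Welcome to my php file. We\'re not "in" yet.', // outside PHP
  "  /* Ok. now we're \"in\" php. */",           // inside a comment
  '  echo "this is \\"stringa\\"";',              // a real PHP string
  "  $s = 'say \"hi\"';",                         // added line: single-quoted string
];

const results = lines.map((line) => line.match(dq));
for (const r of results) console.log(r);
```

Only the third match is a double-quoted PHP string; the other three are false positives that only context can rule out.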

Did you consider that variables may appear in heredoc strings as well?
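A regex can at least tell the two kinds apart from the opening line: a bare or double-quoted label (<<<STRING, <<<"STRING") starts a heredoc, which interpolates variables like a double-quoted string, while a single-quoted label (<<<'STRING') starts a nowdoc, which does not. The sketch below (classifyDocOpener is a hypothetical name) only classifies one opening line and assumes ASCII labels; finding where the body ends still requires tracking state across lines.

```javascript
// Hypothetical helper: classify a single heredoc/nowdoc opening line.
// Assumes ASCII labels; real PHP labels may also contain bytes 0x80-0xff.
const docOpener = /<<<[ \t]*(?:"([A-Za-z_]\w*)"|'([A-Za-z_]\w*)'|([A-Za-z_]\w*))$/;

function classifyDocOpener(line) {
  const m = docOpener.exec(line);
  if (m === null) return null;
  // Group 2 is a single-quoted label: a nowdoc, no interpolation.
  if (m[2] !== undefined) return { kind: "nowdoc", label: m[2] };
  // Bare or double-quoted label: a heredoc, which interpolates like "...".
  return { kind: "heredoc", label: m[1] || m[3] };
}

console.log(classifyDocOpener("  echo <<<STRING"));
console.log(classifyDocOpener("  echo <<<'STRING'"));
console.log(classifyDocOpener("$x = 1 << 2;"));
```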

I don't want to think about the regex to check if:

  1. Inside <?php or <?= code
  2. Not in a comment
  3. Inside a quoted quote
  4. What type of quoted quote?
  5. Is it a quote of that type?
  6. Is it preceded by \ (escaped)?
  7. Is the \ escaped??
  8. etc...
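Items 6 and 7 come down to one rule: a quote is escaped only if it is preceded by an odd number of consecutive backslashes. A small JavaScript sketch (isEscaped is a hypothetical name); note that it answers only that question, not whether the quote is inside PHP code, a comment or a string.

```javascript
// Hypothetical helper: is the character at index i escaped by backslashes?
// Count the consecutive backslashes directly before it; an odd count means
// the last one escapes the character, an even count means they escape each other.
function isEscaped(text, i) {
  let backslashes = 0;
  for (let j = i - 1; j >= 0 && text[j] === "\\"; j--) backslashes++;
  return backslashes % 2 === 1;
}

// From the example: the quote after \\ is NOT escaped, so the string ends there.
const line = String.raw`echo "And what about \\" . "this?`;
console.log(isEscaped(line, line.indexOf(String.raw`\\"`) + 2)); // false

// A single backslash does escape the quote.
const other = String.raw`"say \"hi\""`;
console.log(isEscaped(other, other.indexOf(String.raw`\"`) + 1)); // true
```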

Summary

You can probably write a regex for this. You can probably manage it with some backreferences and lots of time and care. But it's going to be hard, you're probably going to waste a lot of time, and if you ever need to fix it, you aren't going to understand the regex you wrote.

See also

This answer. It's worth it.

Here's a function that utilizes the tokenizer extension to apply preg_replace to PHP strings only:

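function preg_replace_php_string($pattern, $replacement, $source) {
    $replaced = '';
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ";" or "=" are returned as plain strings.
        if (is_string($token)) {
            $replaced .= $token;
            continue;
        }
        list($id, $text) = $token;
        // T_CONSTANT_ENCAPSED_STRING is a complete '...' or "..." literal without
        // interpolation; strings with embedded variables are split into several
        // tokens and never reach this branch. Text outside the PHP tags is
        // T_INLINE_HTML and comments are T_COMMENT, so both are copied unchanged.
        if ($id === T_CONSTANT_ENCAPSED_STRING) {
            $replaced .= preg_replace($pattern, $replacement, $text);
        } else {
            $replaced .= $text;
        }
    }
    return $replaced;
}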

In order to achieve what you want, you can call it like this:

<?php
    $filepath = "script.php";
    $file = file_get_contents($filepath);
    $replaced = preg_replace_php_string('/^"([^$\{\n<>\'\\\\]+?)"$/', '\'$1\'', $file);
    echo $replaced;

The regular expression passed as the first argument is the key here. It tells the function to transform a string into its single-quoted equivalent only if it contains none of the following: $ (an embedded variable, "$a"), { (the second embedded-variable syntax, "{$a[0]}"), a newline, < or > (the characters that open and close HTML tags), or a backslash (single-quoted strings don't interpret escape sequences, so converting "\n" to '\n' would change its meaning). It also excludes strings that contain a single quote, so the replacement never needs to escape one.
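The same filter can be tried in JavaScript on literals that have already been isolated, for example by the tokenizer as above. This port of the pattern also excludes the backslash, because single-quoted strings don't interpret escapes such as \n. Note that +? requires at least one character, so the empty string "" is left as it is.

```javascript
// Port of the answer's pattern, with the backslash added to the excluded set.
// It runs on one isolated literal at a time, so ^ and $ anchor the whole literal.
const simpleLiteral = /^"([^$\{\n<>'\\]+?)"$/;

const literals = [
  '"someval"',                       // converted
  '"someval\'s got a quote inside"', // skipped: contains '
  '"{$a[0]}"',                       // skipped: contains { and $
  '"<b>bold</b>"',                   // skipped: contains < and >
  '"line\\n"',                       // skipped: contains a backslash escape
  '""',                              // skipped: +? needs at least one character
];

for (const lit of literals) {
  console.log(lit, "->", lit.replace(simpleLiteral, "'$1'"));
}
```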

This is a PHP solution, but it's the most accurate one. To be equally accurate in any other language, you would have to build at least part of a PHP parser in that language.
