
Exclude HTML attributes when using PHP DOMXpath and wildcards

I'm trying to match multiple strings on a Joomla site using PHP DOMXPath with a query like:

$query = "//*[contains(text(),'$target')]";
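The query pastes $target directly into a single-quoted XPath string literal. A target that contains an apostrophe (for example O'Brien) would therefore produce an invalid query. XPath 1.0 has no escape sequence for quotes, so the usual workaround is to switch the quote style, or to build the string with concat() when both quote types appear. What follows is a minimal sketch. xpathLiteral() is a hypothetical helper name and is not part of DOMXPath.

```php
<?php
// Hypothetical helper: turns an arbitrary PHP string into a valid XPath 1.0
// string literal. XPath 1.0 has no escape character for quotes, so:
//  - no apostrophe   -> wrap in single quotes
//  - no double quote -> wrap in double quotes
//  - both            -> split on "'" and rebuild the string with concat()
function xpathLiteral(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    $parts = explode("'", $s);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

$target = "O'Brien";
$query = "//*[contains(text()," . xpathLiteral($target) . ")]";
echo $query, PHP_EOL;
```

With the helper in place, the query for "O'Brien" becomes //*[contains(text(),"O'Brien")], which DOMXPath can parse.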

An example of the HTML markup is along the lines of:

<ul>
  <li>
    <a href="#" title="foo bar"><span>foo bar</span></a>
  </li>
</ul>
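It helps to check what the two query forms actually select on this markup.

- In XPath 1.0, contains(., 'foo') compares the element's string-value, which is the concatenated text of all its descendant text nodes. It therefore matches the <span> and every ancestor element, including the <html> and <body> that loadHTML() adds.
- contains(text(), 'foo') only looks at the element's own text node, so here it matches the <span> alone.

Attribute values are part of neither, so the XPath query by itself never matches title="foo bar". The sketch below confirms this. matchingTags() is a hypothetical helper.

```php
<?php
// Hypothetical helper: returns the tag names of all elements matched by an
// XPath query, in document order.
function matchingTags(string $html, string $query): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $names = [];
    foreach ($xpath->query($query) as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

$html = '<ul><li><a href="#" title="foo bar"><span>foo bar</span></a></li></ul>';

// The <span> plus every ancestor element, since their string-values contain "foo"
print_r(matchingTags($html, "//*[contains(., 'foo')]"));

// Only the <span>, the one element with its own "foo" text node
print_r(matchingTags($html, "//*[contains(text(), 'foo')]"));

// Nothing: "foo" appears only in the title attribute, which is not part of
// any element's string-value
$attrOnly = '<a href="#" title="foo"><span>bar</span></a>';
print_r(matchingTags($attrOnly, "//*[contains(., 'foo')]"));
```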

The entirety of the PHP function (modified for clarity) is:

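function onAfterRender() {

    $buffer = JResponse::getBody();

    $doc = new DOMDocument;
    $doc->loadHTML($buffer);
    $xpath = new DOMXPath($doc);

    $targets = 'Foo, foo';
    $targets = explode(',', $targets);

    foreach ($targets as $target) {

        // explode() leaves a leading space on " foo", so trim once and reuse it
        $target = trim($target);

        // Every element whose text content contains the target
        $query = $xpath->evaluate("//*[contains(.,'" . $target . "')]");

        foreach ($query as $match) {

            // Serialize the matched element, markup included
            $match = $doc->saveXML($match);

            // Wrap each occurrence; $1 keeps the original case and
            // preg_quote() escapes any regex metacharacters in the target
            $replacement = preg_replace('/(' . preg_quote($target, '/') . ')/i', '<i class="notranslate">$1</i>', $match);

            $buffer = str_replace($match, $replacement, $buffer);
        }
    }

    // Write the modified body back once, after all replacements
    JResponse::setBody($buffer);

    return true;
}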

Any ideas?

Thanks!

EDIT: The issue, which I didn't state clearly before, is this. Injecting HTML such as <i class="notranslate">foo</i> with this method generates invalid markup whenever a match sits inside an attribute, e.g. title="<i class="notranslate">foo</i> bar". That invalid markup can render poorly, or even break the page for the visitor. I'd like to exclude matches in the title attribute, and possibly in other elements such as the <title> tag.
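The invalid markup comes from the string-replacement step, not from XPath. saveXML() serializes the matched element together with its attributes, and the regex then has no way to tell the title value apart from the visible text. The sketch below reproduces just that step on the sample markup; the variable names are illustrative.

```php
<?php
// Sample markup from the question; loadHTML() wraps it in <html><body>.
$doc = new DOMDocument();
$doc->loadHTML('<ul><li><a href="#" title="foo bar"><span>foo bar</span></a></li></ul>');
$xpath = new DOMXPath($doc);

// The <a> matches because its string-value (text only) is "foo bar".
$anchor = $xpath->query("//a[contains(., 'foo')]")->item(0);

// saveXML() serializes the element together with its attributes...
$serialized = $doc->saveXML($anchor);

// ...so the regex also rewrites the title value, leaving a tag inside it.
$replaced = preg_replace('/(foo)/i', '<i class="notranslate">$1</i>', $serialized);

echo $serialized, PHP_EOL;
echo $replaced, PHP_EOL;
```

The first line printed is the untouched <a> element. The second has an unescaped <i class="notranslate"> tag inside its title attribute.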

EDIT: I've updated the original question and code. Part of the solution was switching to $match = $doc->saveXML($match);, because that keeps the HTML markup of the matched element. However, I still can't exclude HTML attributes in the XPath query itself; I can only filter those matches out afterwards with an additional regex.
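One way to sidestep the attribute problem entirely is to stop doing string replacement on serialized markup and edit text nodes in the DOM instead. This is offered as a sketch, not as the approach used above.

- Text nodes never contain attribute values.
- Restricting the query to //body//text() also skips the <title> tag in <head>.

wrapTerm() is a hypothetical helper. Its exclusion list (script, style, and existing notranslate wrappers) is an assumption you may want to extend.

```php
<?php
// Hypothetical helper: wraps every case-insensitive occurrence of $term in
// <i class="notranslate">, editing text nodes only, so attribute values such
// as title="..." are never touched. Text in <head> (e.g. <title>), <script>,
// <style> and existing notranslate wrappers is skipped (assumed exclusions).
function wrapTerm(DOMDocument $doc, string $term): void
{
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        "//body//text()[not(ancestor::script or ancestor::style or ancestor::i[@class='notranslate'])]"
    );
    // One capture group, so preg_split() returns text, match, text, match, ...
    $pattern = '/(' . preg_quote($term, '/') . ')/i';

    // Copy to an array first, because the loop modifies the DOM.
    foreach (iterator_to_array($nodes) as $text) {
        $parts = preg_split($pattern, $text->data, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // no occurrence in this text node
        }
        $parent = $text->parentNode;
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                // Odd indexes are the captured matches, original case preserved.
                $wrapper = $doc->createElement('i');
                $wrapper->setAttribute('class', 'notranslate');
                $wrapper->appendChild($doc->createTextNode($part));
                $parent->insertBefore($wrapper, $text);
            } elseif ($part !== '') {
                $parent->insertBefore($doc->createTextNode($part), $text);
            }
        }
        $parent->removeChild($text);
    }
}
```

In the plugin, this would replace the str_replace() loop. You would call wrapTerm($doc, trim($target)) for each target, then pass $doc->saveHTML() to JResponse::setBody(). Note that saveHTML() re-serializes the whole document, so whitespace and some markup details may differ from the original buffer.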

The markup in the question was missing the equal sign in title="foo". Corrected, it reads:

<ul>
  <li>
    <a href="#" title="foo"><span>fooey</span></a>
  </li>
</ul>

This seems to work for me:

    $body = JResponse::getBody();

    $doc = new DOMDocument;
    $doc->loadHTML($body);
    $xpath = new DOMXPath($doc);

    $targets = 'Foo, foo';
    $targets = explode(',', $targets);

    foreach ($targets as $target) {

        // text() tests only the element's own text node(s), never attributes
        // such as title; in XPath 1.0, contains() uses the first text node only
        $query = "//*[contains(text(),'" . trim($target) . "')]";
        echo $query . '<br>';

        foreach ($xpath->query($query) as $match) {
            echo 'match: ' . $match->textContent . '<br>';
        }
    }

Outputs:

//*[contains(text(),'Foo')]
//*[contains(text(),'foo')]
match: fooey
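The output also shows that contains() is case-sensitive: the 'Foo' query found nothing. XPath 1.0 has no lower-case() function; that arrived in XPath 2.0, which DOMXPath does not support. A common workaround is to use translate() to fold A-Z to a-z before comparing. Below is a sketch. ciContainsQuery() is a hypothetical helper, and it folds ASCII letters only.

```php
<?php
// Hypothetical helper: builds a case-insensitive variant of the answer's query.
// Assumes $needle contains no apostrophe, since it is placed inside a
// single-quoted XPath literal.
function ciContainsQuery(string $needle): string
{
    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $lower = 'abcdefghijklmnopqrstuvwxyz';
    return "//*[contains(translate(text(), '$upper', '$lower'), '"
        . strtolower($needle) . "')]";
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li><a href="#" title="foo"><span>fooey</span></a></li></ul>');
$xpath = new DOMXPath($doc);

// 'Foo' now matches the <span>, just as 'foo' does
foreach ($xpath->query(ciContainsQuery('Foo')) as $match) {
    echo 'match: ' . $match->textContent . PHP_EOL;
}
```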
