Exclude HTML attributes when using PHP DOMXpath and wildcards

I'm trying to match multiple strings on a Joomla site using PHP DOMXpath with a query like:

$query = "//*[contains(text(),'$target')]";

An example of the HTML markup is along the lines of:

<ul>
  <li>
    <a href="#" title="foo bar"><span>foo bar</span></a>
  </li>
</ul>

The entire PHP function (modified for clarity) is:

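function onAfterRender() {

    $buffer = JResponse::getBody();

    $doc = new DOMDocument;
    $doc->loadHTML($buffer);
    $xpath = new DOMXPath($doc);

    // Trim once so the XPath query and the regex use the same string
    $targets = 'Foo, foo';
    $targets = array_map('trim', explode(',', $targets));

    foreach ($targets as $target) {

        // Matches every element whose string value contains the target
        $query = $xpath->evaluate("//*[contains(.,'" . $target . "')]");

        foreach ($query as $match) {

            // Serialize the matched element, attributes included
            $match = $doc->saveXML($match);

            $replacement = preg_replace('/(' . preg_quote($target, '/') . ')/i', '<i class="notranslate">' . $target . '</i>', $match);

            $buffer = str_replace($match, $replacement, $buffer);
        }
    }

    // Write the modified markup back once, after all replacements
    JResponse::setBody($buffer);

    return true;
}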

Any ideas?

Thanks!

EDIT: The issue, which I didn't state clearly before, is that injecting HTML with this method also rewrites matches inside attribute values (for example, the foo in a title attribute), which generates invalid markup. This invalid markup can render poorly, or even appear broken, to the visitor. I'd like to exclude the title attribute from matching, and possibly other elements such as the title tag.

EDIT: I've updated the original question and code. Part of the solution was changing the assignment to $match = $doc->saveXML($match); since that retains the HTML markup. However, I am still unable to exclude the HTML attributes in the query itself, although I can omit those matches with a further regex.
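To make the problem concrete, here is a reduced sketch using the sample markup from the question. It is only an illustration under the assumption that the page contains nothing else: contains(., 'foo') tests the full string value of each element, so every ancestor of the span matches, and serializing the a element with saveXML() carries the title attribute along into the regex replacement. The variable names $matched, $serialized and $broken are introduced here for illustration.

```php
<?php
$html = '<ul><li><a href="#" title="foo bar"><span>foo bar</span></a></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// contains(., 'foo') looks at the concatenated text of all descendants,
// so the span and every one of its ancestors match.
$matched = array();
foreach ($xpath->query("//*[contains(., 'foo')]") as $node) {
    $matched[] = $node->nodeName;
}
echo implode(', ', $matched), "\n"; // html, body, ul, li, a, span

// The serialized <a> element includes its attributes, so a regex over
// that string also rewrites the value of title.
$a = $xpath->query('//a')->item(0);
$serialized = $doc->saveXML($a);
$broken = preg_replace('/(foo)/i', '<i class="notranslate">$1</i>', $serialized);
echo $broken, "\n";
```

The second echo shows the injected i element sitting inside the title attribute value, which is the invalid markup described above.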

Missing equal sign in title="foo"

<ul>
  <li>
    <a href="#" title="foo"><span>fooey</span></a>
  </li>
</ul>

This seems to work for me:

    $body = JResponse::getBody();
    // test
    $doc = new DOMDocument;
    $doc->loadHTML($body);
    $xpath = new DOMXPath($doc);
    $targets = 'Foo, foo';
    $targets = explode(',', $targets);

    foreach ($targets as $target) {

        $query = "//*[contains(text(),'" . trim($target) . "')]";
        echo $query . '<br>';

        foreach ($xpath->query($query) as $match) {

            $match = $match->textContent;
            echo 'match: ' . $match . '<br>';

        }

    }

Outputs:

//*[contains(text(),'Foo')]
//*[contains(text(),'foo')]
match: fooey
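Building on this, one way to keep attributes out of the replacement entirely is to select text nodes rather than elements and wrap the matches through the DOM instead of running a regex over serialized markup. The following is a minimal sketch, not a tested Joomla plugin: the function name wrapTargetsInTextNodes is hypothetical, it assumes PHP 7 or later, and it assumes that re-serializing the whole page with saveHTML() is acceptable.

```php
<?php
// Hypothetical helper: wraps each occurrence of a target in
// <i class="notranslate">, touching text nodes only, so attribute
// values such as title="..." are never modified.
function wrapTargetsInTextNodes(string $html, array $targets): string
{
    // Drop empty entries and escape each target for use in a regex.
    $targets = array_filter(array_map('trim', $targets), 'strlen');
    if (count($targets) === 0) {
        return $html;
    }
    $quoted = array_map(function ($t) {
        return preg_quote($t, '/');
    }, $targets);
    $pattern = '/(' . implode('|', $quoted) . ')/i';

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Text nodes under <body> only: the <title> tag and all attributes are
    // outside this set; script and style contents are skipped explicitly.
    $query = '//body//text()[not(ancestor::script) and not(ancestor::style)]';

    // Copy the node list first, because the loop modifies the tree.
    foreach (iterator_to_array($xpath->query($query)) as $textNode) {
        // With DELIM_CAPTURE, odd indexes hold the matched targets.
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no target in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $wrapper = $doc->createElement('i');
                $wrapper->setAttribute('class', 'notranslate');
                $wrapper->appendChild($doc->createTextNode($part));
                $fragment->appendChild($wrapper);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    return $doc->saveHTML();
}
```

Inside onAfterRender() this could be used as JResponse::setBody(wrapTargetsInTextNodes(JResponse::getBody(), array('Foo', 'foo')));. Because the matching is case-insensitive and keeps the original casing of each match, listing both Foo and foo is not strictly needed in this variant.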
