

Highlight Search Terms in PHP without breaking anchor tags using regex

I'm searching through some database search results on a website and trying to highlight the words in the returned results that match the searched term. Below is what I have so far (in PHP):

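$highlight = trim($highlight);
// Escape regex metacharacters so the search term is matched literally
$pattern = preg_quote($highlight, '|');
if (preg_match('|\b(' . $pattern . ')\b|i', $str_content))
{
    // Wrap every match in a span; (?!["\']) only skips a match that is
    // immediately followed by a quote, so a term inside an href still matches
    $str_content = preg_replace('|\b(' . $pattern . ')(?!["\'])|i', '<span class="highlight">$1</span>', $str_content);
    break; // assumes an enclosing loop over the search results
}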

The downside of this approach is that if my search term also appears in the URL permalink, the span gets inserted into the href attribute and breaks the anchor tag. Is there any way in my regex to exclude from the matches any text that appears inside the HTML tags themselves (for example, in an href attribute)?
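To make the failure concrete, here is a small, self-contained reproduction. The sample link and the term `php` are made up for illustration; the pattern is the one from the question (with `preg_quote` added). The `(?!["\'])` lookahead only rejects a match that is directly followed by a quote, so a term in the middle of a URL is still wrapped.

```php
<?php
// Hypothetical sample: a search result whose link URL also contains the term.
$str_content = '<a href="/php-tips">PHP tips</a>';
$highlight   = 'php';

// The question's pattern: the lookahead only rejects a match followed directly by a quote.
$result = preg_replace(
    '|\b(' . preg_quote($highlight, '|') . ')(?!["\'])|i',
    '<span class="highlight">$1</span>',
    $str_content
);

echo $result, "\n";
// <a href="/<span class="highlight">php</span>-tips"><span class="highlight">PHP</span> tips</a>
```

The first span lands inside the `href` value, which is exactly the broken anchor described above.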

I know I could use the strip_tags() function and just output the results as plain text, but I'd rather not do that if I don't have to.

Do NOT try to parse HTML with regular expressions. See the Stack Overflow question "RegEx match open tags except XHTML self-contained tags".

Try something like PHP Simple HTML DOM instead.

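<?php
// PHP Simple HTML DOM ships as a single file that defines file_get_html()
require_once 'simple_html_dom.php';

// get DOM
$html = file_get_html('http://www.google.com/search?q=hello+kitty');

// $term is assumed to hold the user's search term; ensure it is properly sanitized.
$term = trim($term);

// highlight $term in all <div class="result">...</div> elements
// (->plaintext is the element's text with all markup removed, links included;
//  str_replace is case-sensitive)
foreach ($html->find('div.result') as $e) {
    echo str_replace($term, '<span class="highlight">' . $term . '</span>', $e->plaintext);
}
?>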

Note: this is not an exact solution because I don't know what your HTML looks like, but it should get you pretty close.
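If you would rather keep the links than fall back to `->plaintext`, the same parser-based idea can be applied to text nodes only. The sketch below swaps in PHP's built-in DOM extension (`DOMDocument`, `DOMXPath`) instead of PHP Simple HTML DOM. The function name `highlightTerm` and the wrapper id `hl-root` are hypothetical, and it assumes UTF-8 input whose own markup doesn't already use that id. Because only text nodes are rewritten, attributes such as `href` are never touched.

```php
<?php
// Sketch: highlight a term only inside text nodes, leaving tags and attributes intact.
function highlightTerm(string $html, string $term): string
{
    $term = trim($term);
    if ($term === '') {
        return $html;
    }

    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    // The XML hint makes libxml read the input as UTF-8; the div lets us find the fragment again.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="hl-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath   = new DOMXPath($doc);
    $root    = $xpath->query('//div[@id="hl-root"]')->item(0);
    $pattern = '/(' . preg_quote($term, '/') . ')/iu';

    // Collect the text nodes first; changing the tree while iterating the query result is unsafe.
    $textNodes = [];
    foreach ($xpath->query('.//text()', $root) as $node) {
        $textNodes[] = $node;
    }

    foreach ($textNodes as $node) {
        // With DELIM_CAPTURE, the captured matches end up at the odd indexes.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlight');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize the wrapper's children back into an HTML string.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlightTerm('<a href="/php-tips">PHP tips</a> and more php', 'php'), "\n";
```

Unlike the regex variants, this version does not use word boundaries. It matches the term anywhere in the text, much like the `str_replace` call above, but case-insensitively and without altering the matched text's case.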

I think assertions (lookahead/lookbehind) are what you're looking for.
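A minimal sketch of that idea, assuming the markup is simple enough that `<` and `>` only appear as tag delimiters. The negative lookahead `(?![^<]*>)` rejects a match when a `>` can be reached before any `<`, which is the case when the match sits inside a tag such as `<a href="...">`. The function name `highlightOutsideTags` is hypothetical. This is a heuristic, not a parser: a literal `>` inside an attribute value, a comment, or a script will confuse it.

```php
<?php
// Sketch: skip matches that sit inside a tag, using a negative lookahead.
// (?![^<]*>) fails when '>' is reachable before any '<', i.e. inside <...>.
function highlightOutsideTags(string $html, string $term): string
{
    $term = trim($term);
    if ($term === '') {
        return $html; // an empty pattern would match everywhere
    }
    $pattern = '/\b(' . preg_quote($term, '/') . ')\b(?![^<]*>)/i';
    return preg_replace($pattern, '<span class="highlight">$1</span>', $html);
}

echo highlightOutsideTags('<a href="/php-tips">PHP tips</a>', 'php'), "\n";
// <a href="/php-tips"><span class="highlight">PHP</span> tips</a>
```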

I ended up going this route, which so far works well for this specific situation.

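<?php

// Escape regex metacharacters so the search term is matched literally
$pattern = preg_quote($term, '|');

if (preg_match('|\b(' . $pattern . ')\b|i', $str_content))
{
    // Drop all markup first so a highlight can never land inside a tag
    $str_content = strip_tags($str_content);
    $str_content = preg_replace('|\b(' . $pattern . ')(?!["\'])|i', '<span class="highlight">$1</span>', $str_content);
    // Turn each run of newlines into a paragraph break
    $str_content = preg_replace('|\n+|', '</p><p>', $str_content);
    break; // assumes an enclosing loop over the search results
}

?>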

It's still HTML-encoded, but without the HTML tags it's much easier to work with now.

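Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.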

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM