PHP regular expression to match a string if NOT in an HTML tag

I'm trying to solve this bug in Drupal's Hashtags module: http://drupal.org/node/1718154

I've got this function, which matches every word in my text that is prefixed by "#", like #tag:

function hashtags_get_tags($text) {
    $tags_list = array();
    $pattern = "/#[0-9A-Za-z_]+/";
    preg_match_all($pattern, $text, $tags_list);
    $result = implode(',', $tags_list[0]);
    return $result;
}

I need to ignore internal links in pages, such as <a href="#reference">link</a>, or, more generally, any word prefixed by # that appears inside an HTML tag (that is, preceded by < and followed by >).

Any idea how I can achieve this?

Can you strip the tags first, before matching (using the strip_tags function)?

function hashtags_get_tags($text) {

    $text = strip_tags($text);

    $tags_list = array();
    $pattern = "/#[0-9A-Za-z_]+/";
    preg_match_all($pattern, $text, $tags_list);
    $result = implode(',', $tags_list[0]);
    return $result;
}
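
As a quick check of the strip_tags approach, here is a minimal sketch run against the example link from the question. The function name demo_get_tags_stripped and the sample sentence are made up for illustration; the logic is the same as in the function above.

```php
<?php
// Hypothetical demo wrapper: same logic as hashtags_get_tags() above.
function demo_get_tags_stripped($text) {
    // Remove all HTML tags (and their attributes) before matching.
    $text = strip_tags($text);
    $matches = array();
    preg_match_all('/#[0-9A-Za-z_]+/', $text, $matches);
    return implode(',', $matches[0]);
}

$html = 'See <a href="#reference">link</a> about #drupal and #php_tips.';
echo demo_get_tags_stripped($html), "\n"; // prints "#drupal,#php_tips"
```

The href="#reference" anchor disappears with the tag, so only the hashtags in the visible text are returned.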

A regular expression is going to be tricky if you want to match only hashtags that are not inside an HTML tag.

You could throw out the tags beforehand using preg_replace:

function hashtags_get_tags($text) {
    $tags_list = array();
    $pattern = "/#[0-9A-Za-z_]+/";
    // Drop everything between < and > so attribute values are never matched.
    $text = preg_replace("/<[^>]*>/", "", $text);
    preg_match_all($pattern, $text, $tags_list);
    $result = implode(',', $tags_list[0]);
    return $result;
}
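
For comparison, the "tricky" single-pattern route can be sketched with a negative lookahead. This is only an illustration, not the module's code: the name hashtags_get_tags_lookahead is hypothetical, and the pattern assumes that a ">" appears in the text only as the end of a tag, so a literal ">" in plain text (for example "#tag > 5") or inside an attribute value would give wrong results.

```php
<?php
// Hypothetical variant: reject a hashtag when the next angle bracket after it
// is ">", which means the match sits inside a tag such as <a href="#reference">.
function hashtags_get_tags_lookahead($text) {
    $tags_list = array();
    $pattern = '/#[0-9A-Za-z_]+(?![^<>]*>)/';
    preg_match_all($pattern, $text, $tags_list);
    return implode(',', $tags_list[0]);
}

echo hashtags_get_tags_lookahead('<a href="#reference">link</a> and #tag'), "\n"; // prints "#tag"
```

Stripping the tags first, as above, is easier to reason about; the lookahead version is mainly useful when the original text must stay untouched.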

I made this function using PHP DOM.

It returns all links that do not have # in the href, so any link whose href contains # is removed.

If you want it to remove only internal hash links (those whose href starts with #), replace this line:

if(strpos($link->getAttribute('href'), '#') === false) {

with this:

if(strpos($link->getAttribute('href'), '#') !== 0) {

This is the function:

function no_hashtags($text) {
    $doc = new DOMDocument();
    $doc->loadHTML($text);
    $links = $doc->getElementsByTagName('a');
    $nohashes = array();
    foreach($links as $link) {
        // Keep only links whose href contains no '#' at all.
        if(strpos($link->getAttribute('href'), '#') === false) {
            // Serialize the single <a> element through a temporary document.
            $temp = new DOMDocument();
            $elem = $temp->importNode($link->cloneNode(true), true);
            $temp->appendChild($elem);
            $nohashes[] = $temp->saveHTML();
        }
    }
    // Alternative return formats:
    // return $nohashes;
    return implode('', $nohashes);
    // return implode(',', $nohashes);
}
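
The same DOM tooling can also answer the original question more directly by scanning only text nodes, so attribute values are never examined. The following is a rough sketch under some assumptions: the helper name hashtags_from_text_nodes is hypothetical, the DOM extension is available, and the input is an HTML fragment (it is wrapped in a div before parsing). Non-ASCII input may need explicit encoding handling, which is not covered here.

```php
<?php
// Hypothetical helper: collect hashtags only from text nodes, so attribute
// values such as href="#reference" are never scanned.
function hashtags_from_text_nodes($html) {
    // Collect parser warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tags = array();
    // Skip script/style content, where "#" usually belongs to CSS or JavaScript.
    $query = '//text()[not(ancestor::script) and not(ancestor::style)]';
    foreach ($xpath->query($query) as $node) {
        if (preg_match_all('/#[0-9A-Za-z_]+/', $node->nodeValue, $m)) {
            $tags = array_merge($tags, $m[0]);
        }
    }
    return implode(',', $tags);
}

echo hashtags_from_text_nodes('See <a href="#reference">link</a> about #drupal.'), "\n"; // prints "#drupal"
```

A hashtag that is split across elements (for example "#dru<b>pal</b>") would be cut short, because each text node is matched on its own.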
