
Removing a span with a specific class from HTML, but not its content, using regular expressions only

My PHP script creates the following HTML:

<div>
    <hr class="target"/>
    Remove target class <span class="target"> only and save this text</span>
    <span class='target test1 test2 '> Remove target class with span tag not this text</span>
    <span class="target"> multi-line / multi-paragraph content</span>
    <span class='target'>content without space after span tag</span>
</div>

I want to turn the HTML above into the following, using only a PHP regular expression (this is a business-logic requirement):

<div>
    <hr/>
    Remove target class only and save this text
    Remove target class with span tag not this text
    multi-line / multi-paragraph content
    content without space after span tag
</div>

Note: (1) The target class may be wrapped in single or double quotes. (4) A span may have multiple classes.

I used the following regex in PHP:

$data = preg_replace('#<(\w+) class=["\']highlight["\']>(.*)<\/\1>#', '\2', $data);

It handles most cases but fails on the following: 1) the hr tag; 2) it leaves an extra space when it removes the span tag; 3) it fails on multi-line content.

Thanks in advance.
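The multi-line failure listed above can be reproduced in isolation. The snippet below is only a sketch with a made-up sample string (`$sample` is not from the original post): by default `.` does not match a newline in PCRE, so a pattern like the one above cannot reach the closing tag unless the `s` modifier is added.

```php
<?php
// Hypothetical sample: a span whose content spans two lines.
$sample = "<span class=\"target\">line one\nline two</span>";

// Without the s modifier, "." stops at the newline, so nothing matches
// and preg_replace() returns the subject unchanged.
$plain = preg_replace('#<span class="target">(.*?)</span>#', '$1', $sample);

// With the s modifier, "." also matches newlines and the tags are stripped.
$dotall = preg_replace('#<span class="target">(.*?)</span>#s', '$1', $sample);

var_dump($plain === $sample);               // the input came back untouched
var_dump($dotall === "line one\nline two"); // only the content is left
```

The hr failure has a different cause: the pattern requires a closing `</hr>` tag and a `>` directly after the closing quote, and a self-closing `<hr class="target"/>` offers neither.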

The way to do that is to use DOMDocument:

$html=<<<'EOD'
<div>
    <hr class="target"/>
    Remove target class <span class="target"> only and save this text</span>
    <span class='target test1 test2 '> Remove target class with span tag not this text</span>
    <span class="target"> multi-line / multi-paragraph content</span>
    <span class='target'>content without space after span tag</span>
</div>
EOD;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
$xp = new DOMXPath($dom);

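// get the span nodes whose class list contains the "target" token
// (padding with spaces avoids matching names such as "targeted")
$spanNodeList = $xp->query('//span[contains(concat(" ", normalize-space(@class), " "), " target ")]');

foreach ($spanNodeList as $spanNode) {
    // move every child in front of the span, then drop the now-empty span
    while ($spanNode->firstChild) {
        $spanNode->parentNode->insertBefore($spanNode->firstChild, $spanNode);
    }
    $spanNode->parentNode->removeChild($spanNode);
}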

// get the list of hr nodes
// (here I don't use XPath, but it can be done in the same way)
$hrNodes = $dom->getElementsByTagName('hr');

foreach ($hrNodes as $hrNode) {
    if ($hrNode->hasAttribute('class') && $hrNode->getAttribute('class') === 'target') {
        $hrNode->removeAttribute('class');
    }
}
echo $dom->saveHTML();
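If regular expressions really are a hard requirement, something along these lines might work for markup as simple as the sample. This is a sketch, not a general HTML solution. It assumes that `class` is the first and only attribute, that target spans are never nested inside each other, and that the hr carries only the single class `target`. The function name `stripTargetMarkup` is made up for illustration.

```php
<?php
$html = <<<'EOD'
<div>
    <hr class="target"/>
    Remove target class <span class="target"> only and save this text</span>
    <span class='target test1 test2 '> Remove target class with span tag not this text</span>
    <span class="target"> multi-line / multi-paragraph content</span>
    <span class='target'>content without space after span tag</span>
</div>
EOD;

// Hypothetical helper: strips "target" spans (keeping their content)
// and removes class="target" from hr tags, using preg_replace() only.
function stripTargetMarkup(string $html): string
{
    $patterns = [
        // span whose quoted class list contains "target" as a whole token;
        // \s* after ">" swallows leading whitespace, the s modifier lets
        // "." cross newlines, and .*? stops at the first closing tag
        '#<span\s+class=(["\'])(?:[^"\']*\s)?target(?:\s[^"\']*)?\1\s*>\s*(.*?)</span>#s',
        // hr with exactly class="target" (either quote style), optionally self-closing
        '#<hr\s+class=(["\'])target\1\s*(/?)>#',
    ];
    $replacements = ['$2', '<hr$2>'];

    $result = preg_replace($patterns, $replacements, $html);

    // preg_replace() returns null on a PCRE error; fall back to the input
    return $result ?? $html;
}

echo stripTargetMarkup($html);
```

On the sample above this yields the markup requested in the question. A nested `<span>` inside a target span would break it, because the lazy `.*?` stops at the first `</span>`; that is the kind of case where the DOMDocument approach is safer.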
