Removing a span with a specific class from HTML, but not its content, using regular expressions only
My PHP script creates the following HTML:
<div>
<hr class="target"/>
Remove target class <span class="target"> only and save this text</span>
<span class='target test1 test2 '> Remove target class with span tag not this text</span>
<span class="target"> multi-line / multi-paragraph content</span>
<span class='target'>content without space after span tag</span>
</div>
I want to turn the HTML above into the following, using PHP regular expressions only (a business-logic requirement):
<div>
<hr/>
Remove target class only and save this text
Remove target class with span tag not this text
multi-line / multi-paragraph content
content without space after span tag
</div>
Notes: (1) the target class may be wrapped in single or double quotes; (4) a span may have multiple classes.
I used the following regex in PHP:
$data = preg_replace('#<(\w+) class=["\']highlight["\']>(.*)<\/\1>#', '\2', $data);
It handles most cases but fails on the following:
1) the hr tag;
2) it leaves extra space when it removes the span tag;
3) it fails on multi-line content.
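The three failures above can be addressed in a regex-only sketch, offered here as an illustration under assumptions rather than a general HTML solution. It assumes class is the only attribute on the matched tags, written as in the sample, and that target spans never contain nested span elements; the function name removeTargetMarkup is hypothetical. The s modifier lets .*? cross newlines (failure 3), the lazy .*? stops each match at its own closing tag, the \s* after the opening tag drops the leading space inside the span (failure 2), and a separate pattern handles the hr (failure 1).

```php
<?php
// Regex-only sketch. Assumptions (not from the question): `class` is the only
// attribute on the matched tags, and target spans never contain nested <span>s.
// removeTargetMarkup is a hypothetical name.
function removeTargetMarkup(string $html): string
{
    // "target" as a whole class token inside a single- or double-quoted value.
    // Group 1 captures the opening quote so \1 requires the same closing quote.
    $targetClass = '(["\'])(?:[^"\']*\s)?target(?:\s[^"\']*)?\1';

    // Failure 1: <hr class="target"/> -> <hr/>. Drops the whole class attribute
    // and keeps the optional self-closing slash captured in group 2.
    $html = preg_replace('#<hr\s+class=' . $targetClass . '\s*(/?)>#i', '<hr$2>', $html);

    // Failures 2 and 3: unwrap target spans. \s* after ">" drops leading
    // whitespace inside the span, the s flag lets .*? cross newlines, and the
    // lazy .*? stops at the first </span> after the opening tag.
    return preg_replace('#<span\s+class=' . $targetClass . '\s*>\s*(.*?)</span>#is', '$2', $html);
}

$html = <<<'EOD'
<div>
<hr class="target"/>
Remove target class <span class="target"> only and save this text</span>
<span class='target test1 test2 '> Remove target class with span tag not this text</span>
<span class="target"> multi-line / multi-paragraph content</span>
<span class='target'>content without space after span tag</span>
</div>
EOD;

echo removeTargetMarkup($html), "\n";
```

On the sample input this returns the desired output shown above. Matching "target" as a whole token means a class such as "targets" is left untouched, but a regex still cannot cope with nested spans or arbitrary attribute order, which is where a DOM parser is more robust.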
Thanks in advance.
The way to do that is to use DOMDocument rather than a regular expression:
$html=<<<'EOD'
<div>
<hr class="target"/>
Remove target class <span class="target"> only and save this text</span>
<span class='target test1 test2 '> Remove target class with span tag not this text</span>
<span class="target"> multi-line / multi-paragraph content</span>
<span class='target'>content without space after span tag</span>
</div>
EOD;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
$xp = new DOMXPath($dom);
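// get the node list of span nodes whose class list contains the "target" token
// (a plain contains(@class, "target") would also match a class such as "targets")
$spanNodeList = $xp->query('//span[contains(concat(" ", normalize-space(@class), " "), " target ")]');
foreach ($spanNodeList as $spanNode) {
    // move every child, not just the first one, in front of the span, then remove the span
    while ($spanNode->firstChild) {
        $spanNode->parentNode->insertBefore($spanNode->firstChild, $spanNode);
    }
    $spanNode->parentNode->removeChild($spanNode);
}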
// get the list of hr nodes
// (here I don't use XPath, but it can be done in the same way)
$hrNodes = $dom->getElementsByTagName('hr');
foreach ($hrNodes as $hrNode) {
    if ($hrNode->hasAttribute('class') && $hrNode->getAttribute('class') === 'target') {
        $hrNode->removeAttribute('class');
    }
}
echo $dom->saveHTML();
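The DOMDocument code above still leaves the double space the question complains about (its failure 2), because the span's text " only and save this text" keeps its leading space, and saveHTML() writes the hr as <hr> rather than <hr/>. A minimal sketch that extends the same approach is shown below. The function name unwrapTargetSpans is hypothetical, the whitespace-trimming rule is an assumption of this sketch rather than part of the answer, and the input is assumed to have a single root element because LIBXML_HTML_NOIMPLIED adds no wrapper.

```php
<?php
// Sketch extending the DOMDocument answer; unwrapTargetSpans is a hypothetical
// name. Assumes the input has a single root element (LIBXML_HTML_NOIMPLIED
// does not add an <html>/<body> wrapper).
function unwrapTargetSpans(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    $xp = new DOMXPath($dom);

    // XPath predicate: "target" as a whole class token, so class="targets" is skipped.
    $hasTarget = 'contains(concat(" ", normalize-space(@class), " "), " target ")';

    foreach ($xp->query("//span[$hasTarget]") as $span) {
        $parent = $span->parentNode;
        $prev = $span->previousSibling;
        $first = $span->firstChild;

        // Assumed heuristic: if there is nothing before the span, or the text
        // before it already ends in whitespace, trim the span's leading
        // whitespace so no double space is left behind.
        $prevEndsBlank = $prev === null
            || ($prev instanceof DOMText && preg_match('/\s\z/', $prev->data) === 1);
        if ($first instanceof DOMText && $prevEndsBlank) {
            $first->data = ltrim($first->data);
        }

        // Move every child (not only the first) in front of the span, then drop it.
        while ($span->firstChild !== null) {
            $parent->insertBefore($span->firstChild, $span);
        }
        $parent->removeChild($span);
    }

    // Strip the class attribute from <hr> elements carrying the target class.
    foreach ($xp->query("//hr[$hasTarget]") as $hr) {
        $hr->removeAttribute('class');
    }

    // Serialise the root element as XML, which writes the empty hr as <hr/>.
    return $dom->saveXML($dom->documentElement);
}

$html = <<<'EOD'
<div>
<hr class="target"/>
Remove target class <span class="target"> only and save this text</span>
<span class='target test1 test2 '> Remove target class with span tag not this text</span>
<span class="target"> multi-line / multi-paragraph content</span>
<span class='target'>content without space after span tag</span>
</div>
EOD;

echo unwrapTargetSpans($html), "\n";
```

The trimming only touches the start of the span's first text node, and only when the preceding text already ends in whitespace, so markup like foo<span class="target"> bar</span> keeps the space between the two words.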