PHP - How to remove all <script> and CDATA of HTML string with DOMDocument

I need to remove all <script> elements, including everything inside their CDATA sections, from an HTML string.

I am using code like this:

$content = "
TEST1
<script type='text/javascript'>
/* <![CDATA[ */
var markers = [{'ID':3681,'post_author':'4'}]
/* ]]> */
</script>
TEST2
";

libxml_use_internal_errors(true);
$domDoc = new DOMDocument();
$domDoc->loadHTML($content);
libxml_clear_errors();

foreach($domDoc->getElementsByTagName('script') as $scripttag){
    $scripttag->parentNode->removeChild($scripttag);
}

But this doesn't work. Nothing is removed.

It works fine if I use a regular expression:

$re = '/<script\b[^>]*>.*?<\/script>/is';
$str = 'TEST1
<script type=\'text/javascript\'>
/* <![CDATA[ */
var markers = [{\'ID\':3681,\'post_author\':\'4\'}]
/* ]]> */
</script>
TEST2';

$content = preg_replace($re, '', $str, 1);

Is it possible to remove this kind of content with PHP DOMDocument instead of a regular expression?

Edit, using Hatef's answer:

$content = "
<script type='text/javascript'>
/* <![CDATA[ */
var _cf7 = {'recaptcha':{'messages':{'empty':'Merci de confirmer que vous n\u2019\u00eates pas un robot.'}},'cached':'1'};
/* ]]> */
</script>
<script type='text/javascript' src='https://www.test.com/includes/js/scripts.js'></script>
<script type='text/javascript'>
/* <![CDATA[ */
var pollsL10n = {'ajax_url':'https:\/\/www.test.com\/ajax.php','text_wait':'Your last request is still being processed. Please wait a while ...','text_valid':'Please choose a valid poll answer.','text_multiple':'Maximum number of choices allowed:','show_loading':'1','show_fading':'1'};
/* ]]> */
</script>
<!--[if lt IE 8]>
<script type='text/javascript' src='https://www..test.com/json2.min.js'></script>
<![endif]--><script type='text/javascript'>
/* <![CDATA[ */
var ajaxurl = 'https:\/\/.test.com\/ajax.php';
/* ]]> */
</script>
<script type='text/javascript' src='https://www.test.com/slider.min.js?x40297'></script>
<script>
        (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
        (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
        m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
        })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

        ga('create', 'UA-37273722-1', 'auto');
        ga('send', 'pageview');
</script>
";

libxml_use_internal_errors(true);
$domDoc = new DOMDocument();
$domDoc->loadHTML($content);
libxml_clear_errors();

foreach($domDoc->getElementsByTagName('script') as $scripttag){
    $scripttag->parentNode->removeChild($scripttag);
}
$content = $domDoc->saveHTML();

$content now contains the following (only every other script tag was removed):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script type="text/javascript" src="https://www.test.com/includes/js/scripts.js"></script><!--[if lt IE 8]>
<script type='text/javascript' src='https://www..test.com/json2.min.js'></script>
<![endif]--><script type="text/javascript">
/* <![CDATA[ */
var ajaxurl = 'https:\/\/.test.com\/ajax.php';
/* ]]> */
</script><script>
        (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
        (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
        m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
        })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

        ga('create', 'UA-37273722-1', 'auto');
        ga('send', 'pageview');
</script></head></html>

Your DOMDocument solution works perfectly; you are just missing the last line that actually saves the HTML back into the string:

$content = $domDoc->saveHTML();

As you probably already know, it is best not to use regex to parse HTML.
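
To illustrate the missing saveHTML() call, here is a minimal sketch of the first snippet from the question with that line added. The function name stripScriptsAndSave is hypothetical and only wraps the question's own code; the foreach is kept as in the question and is fine here only because the input contains a single script tag.

```php
<?php
// Hypothetical helper: the question's code plus the missing saveHTML() call.
function stripScriptsAndSave(string $html): string
{
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $domDoc = new DOMDocument();
    $domDoc->loadHTML($html);
    libxml_clear_errors();

    // Safe here only because the sample contains exactly one <script> element.
    foreach ($domDoc->getElementsByTagName('script') as $scriptTag) {
        $scriptTag->parentNode->removeChild($scriptTag);
    }

    // Removing nodes changes the DOM tree, not the original string,
    // so the document has to be serialized again.
    return $domDoc->saveHTML();
}

$content = "
TEST1
<script type='text/javascript'>
/* <![CDATA[ */
var markers = [{'ID':3681,'post_author':'4'}]
/* ]]> */
</script>
TEST2
";

$cleaned = stripScriptsAndSave($content);
// $content is unchanged; only $cleaned reflects the DOM edits.
```

Note that saveHTML() serializes the whole document, so the result includes the DOCTYPE and html wrapper that loadHTML() added, as visible in the output shown in the question.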


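This should work with your new example. getElementsByTagName() returns a live node list, so removing nodes inside a foreach shifts the remaining items and every other script tag is skipped. Looping until the list is empty avoids that: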

$scriptTags = $domDoc->getElementsByTagName('script');

while($scriptTags->length > 0){
    $scriptTag = $scriptTags->item(0);
    $scriptTag->parentNode->removeChild($scriptTag);
}
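
An alternative to the while loop, sketched under the same assumptions, is to copy the live list into a plain array before removing anything. The helper name removeAllScripts and the sample markup are hypothetical. The comment-removal step is an optional extra that is not part of the answer above: it targets conditional comments such as the `<!--[if lt IE 8]>` block in the question, which are comment nodes rather than script elements and therefore survive the script loop.

```php
<?php
// Hypothetical helper; the name and the sample input below are illustrative only.
function removeAllScripts(string $html): string
{
    libxml_use_internal_errors(true);
    $domDoc = new DOMDocument();
    $domDoc->loadHTML($html);
    libxml_clear_errors();

    // Snapshot the live DOMNodeList so removals cannot shift the items being iterated.
    $scriptTags = iterator_to_array($domDoc->getElementsByTagName('script'));
    foreach ($scriptTags as $scriptTag) {
        $scriptTag->parentNode->removeChild($scriptTag);
    }

    // Optional extra (not in the original answer): drop comment nodes,
    // which is where conditional-comment scripts end up after parsing.
    $xpath = new DOMXPath($domDoc);
    foreach ($xpath->query('//comment()') as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    return $domDoc->saveHTML();
}

$html = "<script>var a = 1;</script>"
      . "<script src='https://www.test.com/slider.min.js'></script>"
      . "<!--[if lt IE 8]><script src='json2.min.js'></script><![endif]-->"
      . "<script>var b = 2;</script>"
      . "<p>TEST</p>";

$clean = removeAllScripts($html);
```

Both approaches visit every script element; the array snapshot simply keeps the familiar foreach form.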

