Remove PHP short tags from HTML source
I am parsing some HTML code fetched with curl. Some sites' HTML source looks like this:
<div id="content">
some words
</div>
<?
$box_social['dimensioni']="80";
$box_vota=array();
$box_vota["novideo"]='';
$box_vota["nofoto"]='';
$box_vota["id_articolo"]='1003691';
include($_SERVER['DOCUMENT_ROOT']."/incs/box_social.php");
?>
<div id="footer">
some words
</div>
How can I remove the PHP short tags from the HTML source? I need:
<div id="content">
some words
</div>
<div id="footer">
some words
</div>
I use preg_replace('/<\\?(.*?)\\?>/','',$html); but the PHP short tag block is still there.
Either of these regexes matches your case:
$html = htmlspecialchars(preg_replace('/<\?([\w\W]*)\?>/','',$html));
$html = htmlspecialchars(preg_replace('/<\?(.*)\?>/s','',$html));
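This also matches if there is more than one block of PHP:
$html = htmlspecialchars(preg_replace('/<\?([^\?>]*)\?>/','',$html));
Note that the class [^\?>] excludes every ? and > character, so this pattern will not match a PHP block whose body contains either one (for example -> or =>).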
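For pages with several PHP blocks, one possible approach is a lazy quantifier combined with the s modifier, so that each match ends at the nearest closing tag. The following is only a sketch: the helper name strip_php_blocks and the sample markup are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): removes every
// block that starts with a PHP opening tag from a fetched HTML string.
function strip_php_blocks(string $html): string
{
    // The lazy .*? stops at the nearest closing tag, and the s
    // modifier lets the dot cross line breaks.
    $clean = preg_replace('/<\?.*?\?>/s', '', $html);
    // preg_replace returns null on error; fall back to the input.
    return $clean ?? $html;
}

// Sample input (assumed): two PHP blocks, one containing "->".
$html = <<<'HTML'
<div id="content">
some words
</div>
<?
$box_vota=array();
$box_vota["id_articolo"]='1003691';
?>
<div id="footer">
some words
</div>
<? $obj->value = 1; ?>
HTML;

echo strip_php_blocks($html);
```

Because the pattern only looks for the opening and closing markers, it would also remove an XML declaration such as <?xml version="1.0"?>, so it may need tightening for pages that contain one.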
s (PCRE_DOTALL): If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
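The effect of the modifier can be checked with a small test string. This is a minimal sketch; the variable names and the sample string are made up for illustration.

```php
<?php
// Assumed sample: a short-tag block that spans two lines.
$text = "<? a\nb ?>";

// Without s, the dot stops at the newline, so there is no match.
$withoutS = preg_match('/<\?.*\?>/', $text);
// With s, the dot also matches the newline.
$withS = preg_match('/<\?.*\?>/s', $text);
// A negated class matches newlines regardless of the modifier.
$negated = preg_match('/<\?[^>]*\?>/', $text);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
var_dump($negated);  // int(1)
```

This is also why the pattern in the question leaves the block in place: it has no s modifier, so (.*?) cannot span the line breaks inside the PHP block.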