&nbsp; removal in PHP
I need to remove all dodgy HTML characters from a website I'm parsing using cURL and Simple HTML DOM.
<?php
$html = "this is&nbsp;a text";
var_dump($html);
var_dump(html_entity_decode($html,ENT_COMPAT,"UTF-8"));
Which outputs:
string(19) "this is&nbsp;a text"
string(15) "this is a text"
I don't want to use preg* because there are other entities in the text (e.g. &deg;). This is driving me insane now!
Thanks, James
You need to specify your output encoding with a header:
<?php
header('Content-Type: text/html; charset=utf-8');
$html = "this is&nbsp;a text";
var_dump($html);
var_dump(html_entity_decode($html,ENT_COMPAT,"UTF-8"));
?>
The browser does not assume UTF-8 by default, which is why it displays the wrong character.
If that's the only character that needs replacing, just use str_replace():
var_dump(str_replace('&nbsp;', ' ', "this is&nbsp;a text"));
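If other entities such as &deg; also have to be decoded, the two suggestions can probably be combined: decode everything first, then swap the decoded non-breaking space for a plain space. The following is only a minimal sketch under that assumption; the helper name decode_and_normalize_spaces is hypothetical and does not come from the question or the answers.

```php
<?php
// Hypothetical helper: decode HTML entities, then turn non-breaking
// spaces into ordinary spaces. Assumes the input is UTF-8.
function decode_and_normalize_spaces(string $html): string
{
    // Decode named entities such as &nbsp; and &deg; into UTF-8 characters.
    $decoded = html_entity_decode($html, ENT_COMPAT, 'UTF-8');

    // A decoded &nbsp; is U+00A0, which UTF-8 encodes as the bytes 0xC2 0xA0.
    return str_replace("\xC2\xA0", ' ', $decoded);
}

// Usage: the &nbsp; becomes a regular space, the &deg; is still decoded.
var_dump(decode_and_normalize_spaces("this is&nbsp;a text 20&deg;"));
```

Because str_replace() works on bytes, this avoids preg* entirely and leaves every other decoded character untouched.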