Removing &nbsp; in PHP

Question

I need to remove all dodgy HTML characters from a website I'm parsing using cURL and Simple HTML DOM.

<?php
$html = "this is&nbsp;a text";
var_dump($html);
var_dump(html_entity_decode($html,ENT_COMPAT,"UTF-8"));

Which outputs:

string(19) "this is&nbsp;a text"

string(15) "this is┬áa text"

I don't want to use preg* as there are other characters in the text (e.g. &deg;). This is driving me insane now!

Thanks, James

Answer 1

You need to specify your output encoding with a header:

<?php
    header('Content-Type: text/html; charset=utf-8');

    $html = "this is&nbsp;a text";
    var_dump($html);
    var_dump(html_entity_decode($html,ENT_COMPAT,"UTF-8"));
?>

The browser does not assume UTF-8 by default, which is why it displays the wrong character.
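To see that the decoded string itself is fine and only the display encoding is off, it can help to inspect the raw bytes. The following is a minimal sketch, not part of the original answer; it assumes the same input string as the question and uses only `strlen()`, `substr()` and `bin2hex()` to show that `&nbsp;` became the two-byte UTF-8 sequence for U+00A0.

```php
<?php
// Decode the entity exactly as in the question.
$decoded = html_entity_decode("this is&nbsp;a text", ENT_COMPAT, "UTF-8");

// strlen() counts bytes: 7 ("this is") + 2 (U+00A0 in UTF-8) + 6 ("a text").
var_dump(strlen($decoded));                 // int(15)

// Hex dump of the two bytes that replaced "&nbsp;".
var_dump(bin2hex(substr($decoded, 7, 2)));  // string(4) "c2a0"
```

The bytes `c2 a0` are a valid non-breaking space in UTF-8; a viewer that interprets them in a different code page shows them as two unrelated characters, which matches the `┬á` in the question.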

Answer 2

If that's the only character that needs replacing, just use str_replace():

var_dump(str_replace('&nbsp;', ' ', "this is&nbsp;a text"));

See it in action
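If other entities such as `&deg;` also have to be decoded, the two answers can be combined. The sketch below is one possible approach rather than something either answer proposes: it decodes every entity to UTF-8 first and then replaces the non-breaking space (bytes `0xC2 0xA0`) with an ordinary space. The input string is a made-up example.

```php
<?php
// Hypothetical input containing more than one entity.
$html = "20&deg;C is&nbsp;warm";

// Decode every named entity to UTF-8 first.
$decoded = html_entity_decode($html, ENT_QUOTES, "UTF-8");

// Then replace U+00A0 (bytes 0xC2 0xA0 in UTF-8) with an ordinary space.
$clean = str_replace("\xC2\xA0", " ", $decoded);

var_dump($clean); // string(13) "20°C is warm"
```

Because `0xC2 0xA0` can only occur in valid UTF-8 as the encoding of U+00A0, the byte-level `str_replace()` does not touch other multi-byte characters such as the degree sign.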

2 answers

Answer 1
3 2013-03-07 17:32:51

Answer 2
1 2013-03-07 17:31:43
