
PHP convert html &nbsp; to space, &gt; to > etc

I want to convert all HTML entities (&nbsp;, &gt;, &lt;, etc.) to plain text. I have tried

html_entity_decode() 

but it returns ? for &nbsp;.

Use htmlspecialchars_decode(), which is the opposite of htmlspecialchars().
Example from the PHP documentation page:

    $str = '<p>this -&gt; &quot;</p>';
    echo htmlspecialchars_decode($str); 
    //Output: <p>this -> "</p>
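Note that htmlspecialchars_decode() only covers the handful of "special" entities (&amp;, &quot;, &#039;, &lt;, &gt;), so it leaves &nbsp; alone. A minimal sketch of the difference, assuming UTF-8 output; the input string and variable names are made up for illustration:

```php
<?php
// Hypothetical input mixing "special" entities with other named entities.
$input = '&lt;b&gt;A&nbsp;B&lt;/b&gt; &amp; &copy;';

// Only &amp;, &quot;, &#039;, &lt; and &gt; are decoded here.
$special = htmlspecialchars_decode($input);

// Named entities such as &nbsp; and &copy; are decoded as well.
$all = html_entity_decode($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $special, PHP_EOL; // <b>A&nbsp;B</b> & &copy;
echo bin2hex($all), PHP_EOL; // hex dump, so the non-breaking space is visible
```

So if the goal is to get rid of &nbsp; too, html_entity_decode() is the function to reach for.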

html_entity_decode() is the opposite of htmlentities() in that it converts all HTML entities in the string to their applicable characters.

$orig = "I'll \"walk\" the <b>dog</b> now";

$a = htmlentities($orig);

$b = html_entity_decode($a);

echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now

echo $b; // I'll "walk" the <b>dog</b> now

Use html_entity_decode() instead of html_entity_encode().

If you check the html_entity_decode() manual:

You might wonder why trim(html_entity_decode('&nbsp;')); doesn't reduce the string to an empty string, that's because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 characterset.
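To see this for yourself, you can dump the bytes. This is only a quick sketch; the variable names are mine, and the encoding is passed explicitly because the default depends on the PHP version and configuration:

```php
<?php
// Decode the same entity under two explicit encodings.
$utf8   = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');
$latin1 = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');

echo bin2hex($utf8), PHP_EOL;   // c2a0
echo bin2hex($latin1), PHP_EOL; // a0

// trim() strips space, tab, newline, carriage return, NUL and vertical tab
// by default, so neither form of the non-breaking space is removed.
var_dump(trim($utf8) === '');   // bool(false)
var_dump(trim($latin1) === ''); // bool(false)
```

The same character is one byte in ISO-8859-1 and two bytes in UTF-8, which matters for any str_replace() you apply afterwards.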

You can nest the html_entity_decode() call inside str_replace() to turn character 160 (0xA0) into a regular space:

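<?php

// With a UTF-8 default (PHP 5.4+), &nbsp; decodes to the two bytes 0xC2 0xA0,
// so request ISO-8859-1 explicitly to get the single byte 0xA0.
echo str_replace("\xA0", ' ', html_entity_decode('ABC &nbsp; XYZ', ENT_QUOTES, 'ISO-8859-1'));

?>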

I know my answer is coming in really late, but I thought it might help someone else. I find that the best way to extract all special characters is to use utf8_decode() in PHP. Even for dealing with &nbsp; or any other special character representing blank space, use utf8_decode().

After using utf8_decode(), one can manipulate these characters directly in the code. For example, in the following code, the function clean() replaces &nbsp; with a blank. Then it replaces all extra white space with a single space using preg_replace(). Leading and trailing white space is removed using trim().

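function clean($str)
{
    // Converts UTF-8 to ISO-8859-1; utf8_decode() is deprecated as of PHP 8.2.
    $str = utf8_decode($str);
    // Replace the literal &nbsp; entity with a regular space.
    $str = str_replace("&nbsp;", " ", $str);
    // Collapse runs of whitespace into a single space.
    $str = preg_replace("/\s+/", " ", $str);
    // Strip leading and trailing whitespace.
    $str = trim($str);
    return $str;
}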

$html = "&nbsp;&nbsp;&nbsp;&nbsp;  &nbsp;Hello world! lorem ipsum.";
$output = clean($html);
echo $output;

Hello world! lorem ipsum.
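For what it's worth, a similar cleanup can probably be done without utf8_decode() by decoding the entities first and then collapsing whitespace with a Unicode-aware pattern. This is a different technique from the one above, not a drop-in copy of it; the function name cleanEntities() is hypothetical and the sketch assumes the input is UTF-8:

```php
<?php
// Hypothetical helper; assumes UTF-8 input.
function cleanEntities(string $str): string
{
    // Turn entities such as &nbsp; and &gt; into their UTF-8 characters.
    $str = html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace, including U+00A0, into one space.
    // preg_replace() returns null on error, so fall back to the input.
    $str = preg_replace('/[\s\x{00A0}]+/u', ' ', $str) ?? $str;
    // Strip leading and trailing spaces.
    return trim($str);
}

echo cleanEntities('&nbsp;&nbsp;  &nbsp;Hello&nbsp;world! a &gt; b');
// Hello world! a > b
```

Unlike a plain str_replace() on the "&nbsp;" text, this also decodes the other entities the question asks about, such as &gt; and &lt;.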
