PHP converting nbsp to “ ”

I have been trying to get this to work for the last 3 hours, but to no avail.

<?php
    foreach ($array as $item) {
      $item = preg_replace("~ (?=[^<>]*(<|$))~", "&nbsp;", $item);
      logWrite($item);
      echo $item;
    }
?>

The $array is made up of a list of items, e.g. "bread", "cheese", "red wine". The regexp is there to make sure the replacement only applies to text between the opening and closing HTML tags (courtesy of someone else here).

Anyway, the problem is that when I write to the log, it comes out as "bread", "cheese", "red&nbsp;wine", but the echo (I have tried print as well) on the HTML page is unchanged from "bread", "cheese", "red wine".

If I use a different entity as the replacement, e.g. &reg;, it works fine. Any ideas why this particular entity does not work? I think my charsets are all fine.

Thanks!

You do not need to use regexp here. Try with:

$item = str_replace('&nbsp;', ' ', $item);

If you want to check whether &nbsp; is between HTML tags, you should do that beforehand (with an if statement, etc.); it will be clearer.

However, do not use regexp with HTML - it's evil.
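For items that are known to contain no markup, the plain-string approach might be sketched as follows. The helper name `spacesToNbsp` is hypothetical, and the sketch assumes the goal stated in the question (turning ordinary spaces into `&nbsp;`), which is the opposite direction from the one-liner above.

```php
<?php
// Hypothetical helper: replace ordinary spaces with the &nbsp; entity.
// Assumes $item is plain text with no HTML tags in it.
function spacesToNbsp(string $item): string
{
    return str_replace(' ', '&nbsp;', $item);
}

// Sample items taken from the question.
foreach (['bread', 'cheese', 'red wine'] as $item) {
    echo spacesToNbsp($item), "\n";
}
```

If the items may contain tags, this would also rewrite spaces inside the tags themselves, so it only fits the plain-text case.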

&nbsp; is the HTML entity for a non-breaking space, so a browser renders it as a space (not as the literal characters) in an HTML document; therefore you will not notice the difference between &nbsp; and a normal space on the page. View the page source and you will see it.
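One way to confirm this without opening the page source is to inspect the string itself. The sketch below reuses the pattern from the question on a made-up sample item (the `<a>` markup is an assumption for illustration) and checks that the entity really is in the result.

```php
<?php
// Pattern from the question: match a space only when the next angle bracket
// after it is "<" (or the string ends), i.e. the space sits in text, not in a tag.
$pattern = '~ (?=[^<>]*(<|$))~';

$item   = '<a href="#">red wine</a>';   // made-up sample input
$result = preg_replace($pattern, '&nbsp;', $item);

// The space inside the tag is left alone; the one in the text is replaced.
var_dump($result === '<a href="#">red&nbsp;wine</a>');

// The entity is in the string even though a browser draws it as a space.
var_dump(strpos($result, '&nbsp;') !== false);
```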

Assuming you're interested in decoding all HTML entities, you can use html_entity_decode:

http://www.php.net/manual/en/function.html-entity-decode.php

It's much simpler than trying to use a regex.
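A minimal sketch of that call, with a sample string assumed for illustration. The flags and charset are passed explicitly so the result does not depend on the defaults of a particular PHP version.

```php
<?php
// Decode entities back into characters.
$decoded = html_entity_decode('red&nbsp;wine &reg;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// &nbsp; becomes U+00A0 (bytes C2 A0 in UTF-8) and &reg; becomes U+00AE
// (bytes C2 AE). U+00A0 is not the same character as an ordinary space.
var_dump($decoded === "red\xC2\xA0wine \xC2\xAE");
var_dump(strpos($decoded, ' ') !== false);   // the only plain space left is before the ® sign
```

Note that the decoded non-breaking space still looks like a space in a browser, so this changes the bytes in the string, not what is visible on the page.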

When you want to display "raw" HTML content on an HTML page, you should use htmlspecialchars():

echo htmlspecialchars( $item );
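As a rough illustration with a sample value assumed from the question, escaping turns the ampersand into `&amp;`, so the browser shows the entity as literal text instead of rendering it as a space:

```php
<?php
$item = 'red&nbsp;wine';   // sample value from the question

// The leading "&" is escaped, so the page displays the text red&nbsp;wine.
$escaped = htmlspecialchars($item, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n";   // red&amp;nbsp;wine
```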

As per http://magp.ie/2011/01/06/remove-non-utf8-characters-from-string-with-php/

I had some characters that the parser did not know how to interpret because they were outside the byte range of the UTF-8 format. Some PHP functions, like iconv, still let some non-UTF-8 characters through, which breaks the parser. The preg_replace just rips out any non-UTF-8 character based on its byte sequence and replaces it with a question mark.

//reject overly long 2 byte sequences, as well as characters above U+10000 and replace with ?
$some_string = preg_replace('/[\x00-\x08\x10\x0B\x0C\x0E-\x19\x7F]'.
 '|[\x00-\x7F][\x80-\xBF]+'.
 '|([\xC0\xC1]|[\xF0-\xFF])[\x80-\xBF]*'.
 '|[\xC2-\xDF]((?![\x80-\xBF])|[\x80-\xBF]{2,})'.
 '|[\xE0-\xEF](([\x80-\xBF](?![\x80-\xBF]))|(?![\x80-\xBF]{2})|[\x80-\xBF]{3,})/S',
 '?', $some_string );

//reject overly long 3 byte sequences and UTF-16 surrogates and replace with ?
$some_string = preg_replace('/\xE0[\x80-\x9F][\x80-\xBF]'.
 '|\xED[\xA0-\xBF][\x80-\xBF]/S','?', $some_string );
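Before running replacements like these, it can help to check whether a string is well-formed UTF-8 at all. The sketch below is a different, simpler technique than the byte-range patterns above: it relies on `preg_match()` returning `false` when the `/u` modifier is used on an invalid UTF-8 subject. The helper name `isValidUtf8` is hypothetical.

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false (not 0) for an invalid
// UTF-8 subject; the empty pattern matches any valid string, returning 1.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("red\xC2\xA0wine"));   // U+00A0 encoded as UTF-8
var_dump(isValidUtf8("red\xA0wine"));       // lone A0 byte (Latin-1 nbsp)
```

The second call is the case worth watching for here: a non-breaking space stored as the single Latin-1 byte `A0` is not valid UTF-8 on its own.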
