Outputting file contents as UTF-8 leads to character encoding issues

I set my header as follows:

header( 'Content-Type: text/html; charset="utf-8"' );

and then output a local file on my server to the browser using the following code segment:

$content = file_get_contents($sPath);
$content = mb_convert_encoding($content, 'UTF-8');
echo $content;

The files on my server are created by Lua, so the output of the following (before conversion) is FALSE:

var_dump( mb_detect_encoding($content) );
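A quick way to confirm the diagnosis is to test whether the bytes are valid UTF-8 at all. This is a minimal sketch assuming the mbstring extension is loaded; looksLikeUtf8 is a hypothetical helper name, and the sample bytes "Foo\x99" are an assumed stand-in for what a Lua script writing Windows-1252 text might produce.

```php
<?php
// Hypothetical helper: true only if $bytes is well-formed UTF-8.
function looksLikeUtf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

// Assumed sample: "Foo" followed by byte 0x99, which is the trademark sign in Windows-1252.
$sample = "Foo\x99";

var_dump(looksLikeUtf8($sample));       // bool(false): 0x99 cannot start a UTF-8 sequence
var_dump(looksLikeUtf8("Foo\u{2122}")); // bool(true): the trademark sign properly encoded as UTF-8

// Strict detection over a short candidate list also gives up, like the FALSE above.
var_dump(mb_detect_encoding($sample, ['ASCII', 'UTF-8'], true)); // bool(false)
```

If looksLikeUtf8 returns false for the real file, converting "to UTF-8" without naming a source encoding cannot work.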

The files contain characters such as ™, and these appear as plain square boxes in browsers. I've read the threads that were suggested as similar questions, and none of the variations I tried in my code helped.

There seem to be no problems when I simply use the following:

header( 'Content-Type: text/html; charset="iso-8859-1"' );
// setting path here
$content = file_get_contents($sPath);
echo $content;


If that works, the file content is not UTF-8 but a single-byte Western encoding. It is most likely Windows-1252 rather than ISO-8859-1: browsers treat the charset label iso-8859-1 as Windows-1252, and ™ (byte 0x99) exists only in Windows-1252. If you want to output the content as UTF-8, convert explicitly from that encoding:

$content = mb_convert_encoding($content, 'UTF-8', 'Windows-1252');
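The choice between ISO-8859-1 and Windows-1252 matters for ™. The sketch below (assuming mbstring, with the same assumed sample bytes "Foo\x99") converts byte 0x99 both ways: as ISO-8859-1 it becomes the invisible C1 control character U+0099, as Windows-1252 it becomes U+2122, the trademark sign.

```php
<?php
// Assumed sample bytes: "Foo" + 0x99 (the trademark sign in Windows-1252).
$raw = "Foo\x99";

// Reading the bytes as ISO-8859-1 maps 0x99 to the control character U+0099.
$asLatin1 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Reading them as Windows-1252 maps 0x99 to U+2122.
$asCp1252 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

var_dump(bin2hex($asLatin1)); // string(10) "466f6fc299"
var_dump(bin2hex($asCp1252)); // string(12) "466f6fe284a2"
```

For plain accented letters (0xA0 to 0xFF) both encodings agree, which is why ISO-8859-1 often seems to work until characters like ™ show up.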

You always need to know what you're converting from. Just telling PHP to "convert to UTF-8" does not work: when the source encoding is omitted, mb_convert_encoding() assumes PHP's internal encoding (normally UTF-8), which is wrong for these files, so the non-ASCII bytes come out mangled.
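One way to make "know what you're converting from" hard to forget is a helper that requires the source encoding as an argument. A minimal sketch, assuming mbstring; readFileAsUtf8 is a hypothetical name, and 'Windows-1252' is only an assumption about the Lua-generated files.

```php
<?php
// Hypothetical helper: read a file whose encoding the caller states explicitly
// and return its contents as UTF-8. There is no default, so nothing is guessed.
function readFileAsUtf8(string $path, string $sourceEncoding): string
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// In the web request, send the header first, then the converted content:
// header('Content-Type: text/html; charset=utf-8');
// echo readFileAsUtf8($sPath, 'Windows-1252');
```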

Check the file encoding: is it UTF-8 without a BOM? You can check this in Notepad++, for example.
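If the file being served turns out to be UTF-8 with a BOM, the three bytes EF BB BF at its start are echoed ahead of the rest of the content. A minimal sketch for dropping them before output; stripUtf8Bom is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 byte order mark (EF BB BF), if present.
function stripUtf8Bom(string $bytes): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($bytes, $bom, 3) === 0 ? substr($bytes, 3) : $bytes;
}

// Usage: $content = stripUtf8Bom(file_get_contents($sPath));
```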

Or maybe this is useful:

$content = file_get_contents($sPath);
$content = htmlentities($content);
echo $content;
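htmlentities() also needs to know the source encoding. If none is passed it uses the default_charset setting (normally UTF-8), so on Windows-1252 bytes it will not turn 0x99 into &trade;. The sketch below passes the charset explicitly, assuming the files are Windows-1252 (listed as an alias of cp1252 among the charsets htmlentities() accepts); the sample string is made up. Note that this also escapes <, > and &, so it only suits plain-text files, not files that contain HTML markup.

```php
<?php
// Assumption: the Lua-generated files are Windows-1252, where 0xE9 is é and 0x99 is the trademark sign.
$raw = "Caf\xE9 Foo\x99 <b>";

// Give htmlentities the real source charset so it can map each byte to an entity.
$html = htmlentities($raw, ENT_QUOTES | ENT_HTML401, 'Windows-1252');

echo $html; // Caf&eacute; Foo&trade; &lt;b&gt;
```

Because the result is pure ASCII, it displays correctly whatever charset the Content-Type header declares.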

Or try this in .htaccess:

AddDefaultCharset utf-8
AddCharset utf-8 *
<IfModule mod_charset.c>
    CharsetSourceEnc utf-8
    CharsetDefault utf-8
</IfModule>

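Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM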