
PHP uploaded file name: Japanese character encoding

When I upload a file with a Japanese name, some characters cause problems. On a Windows system, I want to save the file under its name as uploaded, so I have to use mb_convert_encoding($name, "SJIS", "AUTO");, which works fine in most cases.

However, some characters, such as those in 0423図表①, end up disappearing entirely. It seems that the file name is already "wrong" when it is uploaded: it looks like "0423å³è¡¨â .pptx" in UTF-8, and if I change the header charset with

header('Content-Type: text/html; charset=SJIS');

it looks like

 "0423テ・ツ崢ウティツ。ツィテ「ツ堕.pptx"

I am not sure what I can do in this case. I tried to replace the character, but I cannot even find it with strpos(), either before or after the encoding conversion.

To qualify my answer (to the downvoter):

Q: I have heard that UTF-8 does not support some Japanese characters. Is this correct?

A: There is a lot of misinformation floating around about the support of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X 0221, or JIS X 0213, for example, and many more. This is true no matter which encoding form of Unicode is used: UTF-8, UTF-16, or UTF-32.

Unicode supports over 80,000 CJK characters right now, and work is underway to encode further additions. The International Standard ISO/IEC 10646 and the Unicode Standard are completely synchronized in repertoire and content. And that means that Unicode has the same repertoire as GB 18030, since that also is synchronized with ISO 10646, although with a different ordering and byte format.

From: The Unicode Consortium.
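To see this for the characters in the question, the short sketch below lists each character's code point and UTF-8 bytes. It assumes PHP 7.2 or later (for mb_ord()) with the mbstring extension; describe_chars() is a hypothetical helper. ① is U+2460 and, like 図 and 表, is an ordinary three-byte UTF-8 sequence, so nothing is lost while the name stays in UTF-8.

```php
<?php
// Hypothetical helper: describe each character as "U+XXXX <utf-8 bytes in hex>".
function describe_chars(string $text): array
{
    $out = [];
    // Split into characters, not bytes (the /u modifier makes PCRE UTF-8 aware).
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $out[] = sprintf('U+%04X %s', mb_ord($ch, 'UTF-8'), bin2hex($ch));
    }
    return $out;
}

print_r(describe_chars('図表①'));
// entries: 'U+56F3 e59bb3', 'U+8868 e8a1a8', 'U+2460 e291a0'
```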

My answer:

Rather than strpos, use mb_stripos from the PHP Multibyte String functions to find and replace characters. This should help your script detect and translate non-Latin characters.
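A minimal sketch of that suggestion, assuming both the script's data and the PHP source file itself are UTF-8; the replacement text "(1)" is only an illustration. strpos() also finds the character when haystack and needle share an encoding, but it returns a byte offset, while mb_stripos() returns a character offset, which is what mb_substr() and the other mb_ functions expect.

```php
<?php
mb_internal_encoding('UTF-8');

$name   = '0423図表①.pptx';
$needle = '①';

var_dump(strpos($name, $needle));     // int(10): byte offset
var_dump(mb_stripos($name, $needle)); // int(6): character offset

// Illustrative replacement built from character offsets.
$pos = mb_stripos($name, $needle);
if ($pos !== false) {
    $name = mb_substr($name, 0, $pos) . '(1)' . mb_substr($name, $pos + mb_strlen($needle));
}
echo $name, PHP_EOL; // 0423図表(1).pptx
```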

If the uploaded file name ($_FILES['var']['name']) is already incorrect in the PHP script (as seen in output such as print_r($_FILES)), then you need to make sure the HTML form declares its encoding correctly with accept-charset='UTF-8' (or SJIS, etc.). I would hope you're already well ahead of me on this.
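Before converting anything, it can help to check whether the name arrived as well-formed UTF-8. This sketch assumes the form was sent as UTF-8 (for example a multipart/form-data form with accept-charset="UTF-8"); uploaded_name_is_utf8() is a hypothetical helper, and 'var' is the field name from the example above.

```php
<?php
// Hypothetical helper: true if the client-supplied file name is valid UTF-8,
// i.e. the browser encoded it the way the form declared.
function uploaded_name_is_utf8(string $name): bool
{
    return mb_check_encoding($name, 'UTF-8');
}

// In the upload handler: uploaded_name_is_utf8($_FILES['var']['name'])
var_dump(uploaded_name_is_utf8('0423図表①.pptx'));    // bool(true)
var_dump(uploaded_name_is_utf8("0423\xE5\x9B.pptx")); // bool(false): truncated 図
```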

It may also be advisable to add a few preconditions at the top of your code, again using the PHP mb_ functions. Add these at the top of your PHP page:

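<?php
mb_internal_encoding('UTF-8'); // or whatever character set works for you
mb_regex_encoding('UTF-8');    // encoding used by the mb_ereg*() functions
mb_http_output('SJIS');        // applied only to output passed through mb_output_handler
ob_start('mb_output_handler');
// mb_http_input() cannot set the input encoding: it takes a type such as 'P' (POST)
// and returns the detected input encoding, or false if none was detected.
$postEncoding = mb_http_input('P');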
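On the conversion step in the question itself: mbstring's SJIS covers the JIS X 0208 repertoire, which has no ① (it is one of the NEC special characters included in Windows code page 932), so that character has no mapping and is lost. Code page 932, available in mbstring as CP932, does include it as the bytes 0x87 0x40. The sketch below assumes the name arrives as UTF-8 and that whatever consumes it expects code page 932; to_windows_filename() is a hypothetical helper. Note also that PHP 7.1 and later accept UTF-8 paths in the filesystem functions on Windows, so on current versions the conversion may not be needed at all.

```php
<?php
// Hypothetical helper: convert a UTF-8 name to Windows code page 932,
// throwing instead of silently losing characters.
function to_windows_filename(string $utf8Name): string
{
    $cp932 = mb_convert_encoding($utf8Name, 'CP932', 'UTF-8');
    // A lossless conversion converts back to exactly the original string.
    if (mb_convert_encoding($cp932, 'UTF-8', 'CP932') !== $utf8Name) {
        throw new RuntimeException("Cannot represent '$utf8Name' in CP932");
    }
    return $cp932;
}

echo bin2hex(to_windows_filename('①')), PHP_EOL; // 8740

// Plain SJIS has no mapping for ①, so the round trip does not survive.
$name    = '0423図表①.pptx';
$viaSjis = mb_convert_encoding(mb_convert_encoding($name, 'SJIS', 'UTF-8'), 'UTF-8', 'SJIS');
var_dump($viaSjis === $name); // bool(false)
```

Checking the round trip, rather than trusting the conversion, means a name containing a character outside CP932 (an emoji, for instance) is reported instead of being saved under a silently altered name.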

Out of interest:

http://www.unicode.org/reports/tr37/

and

http://david.latapie.name/blog/shift-jis-utf-8/

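Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.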

 
© 2020-2024 STACKOOM.COM