
PHP7 UTF-8 filenames on Windows server, new phenomenon caused by ZipArchive

Update:

While preparing a bug report for the great people who make PHP 7 possible, I revised my research once more and tried to boil it down to a few simple lines of code. While doing this I found that PHP itself is not the cause of the problem. I will share my results here when I'm done. Just so you know and don't waste your time :)


Synopsis: PHP7 now seems able to write UTF-8 filenames but is unable to access them?

Preamble: I read about 10-15 articles here touching on the subject, but they did not help me solve the problem and they are all older than the PHP7 release. It seems to me that this is probably a new issue, and I wonder if it might be a bug. I spent a lot of time experimenting with encoding and decoding the strings and trying to figure out a way to make it work, to no avail.

Good day everybody and greetings from Germany (insert shy not-my-native-language remark here). I hope you can help me out with this new phenomenon I encountered. It seems to be "new" in the sense that it came with PHP 7.

I think most people working with PHP on a Windows system are very familiar with the problem of filenames and with PHP's transparent wrapper that manages access to files whose names contain characters outside ASCII (or outside windows-1252, or whatever the system code page is).

I'm not quite sure how to approach the subject, and as you can see I'm not very experienced in composing questions, so please don't rip my head off instantly. And yes, I will strive to keep it short. Here we go:

First symptom: after updating to PHP7 I sometimes encountered problems accessing files generated by my software. Sometimes it worked as usual, sometimes not. I found out the difference was that PHP7 now seems able to write UTF-8 filenames but is unable to access files with those names.

After generating said files on two separate "identical" systems (differing only in the PHP version), this is how the files are named on the hard drive:

PHP 5.5: Lokaltest_KG_æ¼¢å—_汉å—_Krümhold-DEZ1604-140081-complete.zip

PHP 7: Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip

Splendid, PHP 7 is capable of writing Unicode filenames to the HDD, and UTF-16 is used on Windows afaik. Now the downside is that when I try to access those files, for example with is_file(), PHP 5.5 works but PHP 7 does not.
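The garbled part of the PHP 5.5 name above is what one would expect if the UTF-8 bytes of the name were handed to Windows unchanged and then read as characters of the system code page. The following is only a sketch of that effect, assuming the mbstring extension is loaded and the code page is windows-1252; the helper name `asSeenThroughCp1252` is made up for this illustration.

```php
<?php
// Hypothetical helper: shows how a UTF-8 byte string looks when every byte
// is read as one Windows-1252 character (the assumed system code page).
function asSeenThroughCp1252(string $utf8Bytes): string
{
    // Interpret the input bytes as Windows-1252 and re-encode the result
    // as UTF-8 so it can be printed on a UTF-8 page or terminal.
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');
}

// Illustrative name; this file is assumed to be saved as UTF-8.
$name = "Lokaltest_KG_漢字_Krümhold.zip";

// Each multi-byte character turns into two or three unrelated characters.
echo asSeenThroughCp1252($name), "\n";
```

If this assumption holds, PHP 5.5 would write and read the same (garbled) name consistently, which would explain why is_file() works there.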

Consider this code snippet (note: I "hacked" into this function because it was the simplest way; it was not written for this purpose). This function gets called after a zip file has been generated. The file name is built from the name of the customer and other values, which come out of the database. The database and the internal encoding of PHP are both UTF-8. clearstatcache() is not strictly necessary, but I included it to make things clearer. Important: Everything that happens is done with PHP7; no other entity is responsible for creating the zip file. To be precise, it is done with the class ZipArchive. Actually it does not even matter that it is a zip archive; the point is that the filename and the content of the file are created by PHP7, successfully.

public static function downloadFileAsStream( $file )
{
    clearstatcache();
    print $file . "<br/>";
    var_dump(is_file($file));
    die();
}       

Output is:

D:/htdocs/otm/.data/_tmp/Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip
bool(false) 

So PHP7 is able to generate the files (they indeed DO exist on the hard drive and are legit and accessible and all) but is incapable of accessing them. is_file() is not the only function that fails; file_exists() does too, for example.

A little experiment with encoding conversion to give you a taste of the things I tried:

public static function downloadFileAsStream( $file )
{
    clearstatcache();
    print $file . "<br/>";
    print mb_detect_encoding($file, 'ASCII,UTF-16,windows-1252,UTF-8', false) . "<br/>";
    print mb_detect_encoding($file, 'ASCII,UTF-16,windows-1252,UTF-8', true) . "<br/>";

    if (($detectedEncoding = mb_detect_encoding($file, 'ASCII,UTF-16,windows-1252,UTF-8', true)) != 'windows-1252')
    {
        $file = mb_convert_encoding($file, 'UTF-16', $detectedEncoding);
    }

    print $file . "<br/>";
    var_dump(is_file($file));
    die();
}       

Output is:

D:/htdocs/otm/.data/_tmp/Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip
UTF-8
UTF-8
D:/htdocs/otm/.data/_tmp/Lokaltest_KG_o"[W_lI[W_Kr�mhold-DEZ1604-140081-complete.zip
NULL 

So converting from UTF-8 (database/internal encoding) to UTF-16 (Windows file system) does not seem to work either.
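A likely reason for the NULL above: in UTF-16 every ASCII character carries a NUL byte, and as far as I know PHP 7's filesystem functions refuse any path that contains a NUL byte (with a warning and a NULL return value) instead of looking the file up. A minimal sketch that only demonstrates the NUL bytes, assuming mbstring is loaded; the path is an illustrative value, not the real one.

```php
<?php
// Illustrative UTF-8 path (the umlaut is written as an escape sequence so
// the example does not depend on how this file is saved).
$path = "D:/tmp/Kr\u{00FC}mhold.zip";

// Convert to UTF-16, as in the experiment above.
$utf16 = mb_convert_encoding($path, 'UTF-16', 'UTF-8');

// Every ASCII character is now encoded as two bytes, one of which is NUL.
$containsNul = strpos($utf16, "\0") !== false;

var_dump(strlen($path), strlen($utf16)); // byte lengths before and after
var_dump($containsNul);                  // whether the converted path has NUL bytes
```

So the converted string is not really a usable path for functions like is_file(), regardless of which encoding the file system uses internally.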

I am at the end of my rope here, and sadly the issue is very important to us since we cannot update our systems with this problem looming in the background. I hope somebody can shed a little light on this. Sorry for the long post; I'm not sure how well I got my point across.


Addition:

$file = utf8_decode($file);
var_dump(is_file($file));
die();

This delivers false for the filename with the Japanese letters. When I change the input used to create the filename so that the filename is now Lokaltest_KG_Krümhold-DEZ1604-140081-complete.zip, the above code delivers true. So utf8_decode helps, but only with a small part of Unicode, such as German umlauts?
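That matches how single-byte encodings work: utf8_decode() targets ISO-8859-1, which has a slot for "ü" but none for CJK characters. The sketch below shows the same limitation with mb_convert_encoding() and windows-1252 as a stand-in, assuming mbstring is loaded with its default substitute character; `toCp1252` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to Windows-1252, similar in
// spirit to what utf8_decode() does for ISO-8859-1.
function toCp1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

// "ü" (U+00FC) exists in Windows-1252, so it survives as a single byte.
var_dump(bin2hex(toCp1252("\u{00FC}")));

// U+6F22 and U+5B57 have no Windows-1252 equivalent; mbstring replaces each
// with its substitute character, so the original name cannot be recovered.
var_dump(toCp1252("\u{6F22}\u{5B57}"));
```

Any filename containing characters outside the target code page therefore cannot be reached this way.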

Answering my own question here: The actual bad boy was the component ZipArchive, which created files with incorrectly encoded filenames. I have written a hopefully helpful bug report: https://bugs.php.net/bug.php?id=72200

Consider this short script:

print "php default_charset: ".ini_get('default_charset')."\n"; // just 4 info (UTF-8)

$filename = "bugtest_müller-lüdenscheid.zip"; // just an example
$filename = utf8_encode($filename); // simulating my database delivering utf8-string

$zip = new ZipArchive();
if( $zip->open($filename, ZipArchive::CREATE | ZipArchive::OVERWRITE) === true )
{
    $zip->addFile('bugtest.php', 'bugtest.php'); // copy of script file itself
    $zip->close();
}

var_dump( is_file($filename) );  // delivers ?

output:

output PHP 5.5.35:
    php default_charset: UTF-8
    bool(true)

output PHP 7.0.6:
    php default_charset: UTF-8
    bool(false)
