Download files with non-ASCII characters in the name

Original: 2013-02-07 21:36:11 · 4 · 3 · tags: php, html, apache, filenames

My website allows users to upload files with any name. Some names, of course, contain non-ASCII characters. When a user uploads a file, I save it in a folder under its original name. However, when I try to download it by accessing its location (for example, files/Tolstoy - How much land does a man need?.pdf), I get a 404. Is there some way to solve this so that the files keep their original names? Via Apache, maybe?

3 answers

Um, just use URL encoding, also known as percent-encoding? That's exactly what it's meant for: handling URLs on the web. All URLs printed into HTML should be URL-encoded.
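To see why, here is a small sketch in Python, used only as a neutral illustration (with safe="", urllib.parse.quote() encodes much like PHP's rawurlencode()). The question's own example name contains a literal "?", and a URL parser, here urlsplit() standing in for the browser and Apache, treats everything after it as the query string:

```python
from urllib.parse import quote, urlsplit

# File name from the question; note the literal "?" in it.
name = "Tolstoy - How much land does a man need?.pdf"

# Unencoded: everything after "?" is parsed as the query string,
# so the path no longer names the stored file.
raw = urlsplit("/files/" + name)
print(raw.path)   # /files/Tolstoy - How much land does a man need
print(raw.query)  # .pdf

# Percent-encoded: "?" becomes %3F and stays inside the path.
encoded = urlsplit("/files/" + quote(name, safe=""))
print(encoded.path)         # /files/Tolstoy%20-%20How%20much%20land%20does%20a%20man%20need%3F.pdf
print(encoded.query == "")  # True
```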

For PHP, rawurlencode() should be used, as it's standards-compliant (RFC 3986: spaces become %20), which urlencode() isn't: it uses the form-data format (application/x-www-form-urlencoded) and encodes spaces as +.
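A minimal sketch of building such a link, again in Python as a stand-in for PHP. The helper file_href() and its files/ default are hypothetical; quote(name, safe="") plays the part of rawurlencode() and html.escape() the part of htmlspecialchars(). Only the file name segment goes through quote(), because safe="" would also encode any "/" separators.

```python
import html
from urllib.parse import quote


def file_href(filename: str, folder: str = "files/") -> str:
    """Hypothetical helper: turn a stored file name into an href value.

    quote(..., safe="") percent-encodes reserved characters, encodes
    non-ASCII text as UTF-8 bytes and writes spaces as %20 (similar to
    PHP's rawurlencode()). html.escape() then makes the value safe inside
    a double-quoted HTML attribute (similar to htmlspecialchars()).
    """
    return html.escape(folder + quote(filename, safe=""), quote=True)


name = "r\u00e9sum\u00e9.pdf"  # "résumé.pdf" with precomposed é (U+00E9)
print(f'<a href="{file_href(name)}">download</a>')
# <a href="files/r%C3%A9sum%C3%A9.pdf">download</a>
```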

Edit: regarding this issue:

PHP encodes "é" as "e%26%23769%3B", instead of "e%CC%81"

e%CC%81 would be the UTF-8 encoding of é (written as e followed by U+0301, the combining acute accent). e%26%23769%3B is the encoding of e&#769;, which is an HTML entity for the same character. This means that you're either calling htmlentities() explicitly before URL-encoding, or your server setup does that automatically. It isn't strictly needed if proper character sets are in place (only an htmlspecialchars() call is actually needed), but it shouldn't break anything either.
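The two strings can be reproduced in a few lines of Python. This is an illustration under assumptions: the é is taken to be the decomposed form (e followed by U+0301), and Python's xmlcharrefreplace error handler merely stands in for whatever turns characters into numeric entities before encoding (htmlentities() or the server setup, as the answer suspects); it is not what PHP does internally.

```python
import html
from urllib.parse import quote

# "é" written as "e" + U+0301 COMBINING ACUTE ACCENT (decomposed form).
name = "e\u0301"

# Encoding the UTF-8 bytes directly gives the expected result.
print(quote(name))  # e%CC%81

# Stand-in for an entity-encoding step applied *before* URL encoding:
# every non-ASCII character becomes a numeric entity like &#769;.
entitized = name.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(entitized)         # e&#769;
print(quote(entitized))  # e%26%23769%3B

# Decoding the entities first restores the real character.
print(quote(html.unescape(entitized)))  # e%CC%81
```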

Some online tools, if you want to test these out:

http://htmlentities.net/ to test converting HTML entities back and forth
http://www.hypergurl.com/urlencode.html to test URL encoding back and forth, with both UTF-8 and ASCII

Workaround: convert file names to ASCII at upload time. You'll be glad you did.
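A sketch of that workaround in Python, under stated assumptions: ascii_filename() and to_ascii() are hypothetical helpers, accented Latin letters are reduced to their base letters via NFKD normalization, and characters with no ASCII counterpart (Cyrillic, for example) are simply dropped, falling back to "file" when nothing is left. It does not deal with two uploads that map to the same name.

```python
import os
import re
import unicodedata


def to_ascii(text: str) -> str:
    # NFKD splits "é" into "e" + combining accent; encoding to ASCII with
    # "ignore" then drops the accent and any character with no ASCII form.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace runs of anything except letters, digits, "-" and "_" with "_".
    return re.sub(r"[^A-Za-z0-9_-]+", "_", ascii_only).strip("_")


def ascii_filename(name: str) -> str:
    """Hypothetical helper: ASCII-only name to store an upload under."""
    stem, ext = os.path.splitext(name)
    safe_stem = to_ascii(stem) or "file"
    safe_ext = to_ascii(ext.lstrip("."))
    return f"{safe_stem}.{safe_ext}" if safe_ext else safe_stem


print(ascii_filename("Cr\u00e8me br\u00fbl\u00e9e.txt"))  # Creme_brulee.txt
print(ascii_filename("\u0422\u043e\u043b\u0441\u0442\u043e\u0439.pdf"))  # file.pdf
```

The trade-off is that the stored name is no longer the original one, which the question wanted to keep.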

Well, for some reason that I still don't understand, using rawurlencode() instead of urlencode() made it work.
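One likely explanation, shown with Python stand-ins: PHP's urlencode() uses the form-data format and writes spaces as "+", while rawurlencode() follows RFC 3986 and writes them as %20. In a URL path a "+" is an ordinary character, so a server decoding the path (modelled here with urllib.parse.unquote(), which leaves "+" alone) looks for a name containing literal plus signs, and the example name is full of spaces:

```python
from urllib.parse import quote, quote_plus, unquote

name = "Tolstoy - How much land does a man need?.pdf"

# Form-style encoding (like PHP's urlencode()): spaces become "+".
form_style = quote_plus(name)
# RFC 3986 style (like PHP's rawurlencode()): spaces become "%20".
rfc_style = quote(name, safe="")

# Path decoding turns %XX back into characters but keeps "+" as is.
print(unquote(form_style))         # Tolstoy+-+How+much+land+does+a+man+need?.pdf
print(unquote(rfc_style) == name)  # True
```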

However, the character é (among others, I'm sure) is still being encoded strangely (e%26%23769%3B instead of simply %C3%A9). Even stranger, the links containing it work.
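A possible reason for the %C3%A9 versus e%CC%81 difference, sketched in Python: é can be stored either as one precomposed code point (U+00E9, UTF-8 C3 A9) or as e plus the combining acute accent (U+0301, UTF-8 CC 81), and the %26%23769%3B (that is, &#769;) in the encoded link suggests this name uses the second form. NFC normalization converts one into the other, but it only helps if the file on disk is stored under the same form, since many filesystems (ext4 on Linux, for example) compare names byte for byte.

```python
import unicodedata
from urllib.parse import quote

precomposed = "\u00e9"  # é as a single code point (NFC form)
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT (NFD form)

print(precomposed == decomposed)  # False: different code points
print(quote(precomposed))         # %C3%A9
print(quote(decomposed))          # e%CC%81

# Normalizing to NFC turns the decomposed form into the precomposed one.
print(quote(unicodedata.normalize("NFC", decomposed)))  # %C3%A9
```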