
Download files with non-ASCII characters in the name

My website allows users to upload files with any name. Some names, of course, will have non-ASCII characters. When a user uploads a file, I save it in a folder under its original name. However, when I try to download it by accessing its location (for example, files/Tolstoy - How much land does a man need?.pdf), I get a 404. Is there some way to solve this so that the files keep their original names? Via Apache, maybe?

Um, just use URL encoding, also known as percent-encoding? That's what it is for: handling URLs on the web. All URLs printed to HTML should be URL-encoded.

For PHP, rawurlencode should be used, as it is standards-compliant (it follows RFC 3986 and encodes a space as %20), which urlencode isn't (it encodes a space as +, the form-data convention).
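To see the difference outside PHP, here is a minimal sketch in Python. It does not call the PHP functions; it assumes that urllib.parse.quote with safe="" is a fair stand-in for rawurlencode and quote_plus for urlencode, since they treat spaces the same way (%20 versus +). The file name is the one from the question.

```python
from urllib.parse import quote, quote_plus, unquote

# Example file name from the question.
filename = "Tolstoy - How much land does a man need?.pdf"

# Percent-encode everything that is not unreserved; spaces become %20.
# Stand-in for rawurlencode() (assumption: same handling of spaces and "?").
path_safe = quote(filename, safe="")

# Form-style encoding; spaces become "+". Stand-in for urlencode().
form_style = quote_plus(filename)

print("files/" + path_safe)
# files/Tolstoy%20-%20How%20much%20land%20does%20a%20man%20need%3F.pdf
print("files/" + form_style)
# files/Tolstoy+-+How+much+land+does+a+man+need%3F.pdf

# In a URL path "+" is a literal plus sign, so only the first form
# decodes back to the original file name.
print(unquote(path_safe) == filename)   # True
print(unquote(form_style) == filename)  # False
```

Note that the ? also has to be encoded (as %3F); left as-is, it would start the query string and cut the path short. The + form is probably why the urlencode links gave a 404, though that depends on how the server decodes paths.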

Edit, regarding this issue:

PHP encodes "é" as "e%26%23769%3B", instead of "e%CC%81"

e%CC%81 would be UTF-8 for é written as e followed by a combining acute accent (U+0301). e%26%23769%3B would be for e&#769;, which is the HTML entity (a numeric character reference) for the same thing. This means that either you are making an explicit htmlentities() call there before URL-encoding, or your server setup does that automatically. htmlentities() is not strictly needed if proper character sets are in place (only an htmlspecialchars call is actually needed), but it shouldn't break anything either.
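The encodings mentioned here can be reproduced with a short Python sketch. It is only an illustration: the "xmlcharrefreplace" step is an assumed stand-in for whatever is turning the accent into an entity on the PHP side, and quote with safe="" again plays the role of rawurlencode.

```python
import html
import unicodedata
from urllib.parse import quote, unquote

# "é" in decomposed form: "e" + U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
utf8_encoded = quote(decomposed, safe="")
print(utf8_encoded)  # e%CC%81

# Turning the accent into a numeric character reference first
# (stand-in for an entity-conversion step before URL-encoding).
as_entity = decomposed.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_entity)  # e&#769;
entity_encoded = quote(as_entity, safe="")
print(entity_encoded)  # e%26%23769%3B

# Undoing both layers gives back the same two code points.
round_trip = html.unescape(unquote(entity_encoded))
print(round_trip == decomposed)  # True

# The precomposed form (U+00E9) is a different byte sequence again.
precomposed = unicodedata.normalize("NFC", decomposed)
precomposed_encoded = quote(precomposed, safe="")
print(precomposed_encoded)  # %C3%A9
```

So %C3%A9, e%CC%81 and e%26%23769%3B are three different strings as far as a file system lookup is concerned, even though they all display as é once decoded and rendered.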


Workaround: convert filenames to ASCII at upload. You will be happy with it.
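One possible shape for that conversion, as a rough sketch in Python rather than PHP. The function name to_ascii_filename and the choice of allowed characters are made up for illustration; they are not from this thread.

```python
import re
import unicodedata


def to_ascii_filename(name: str) -> str:
    """Hypothetical helper: reduce an uploaded file name to plain ASCII."""
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop everything that has no ASCII representation (marks, CJK, ...).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace runs of anything outside a conservative set with "_".
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_only).strip("_.")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "file"


print(to_ascii_filename("résumé.pdf"))
# resume.pdf
print(to_ascii_filename("Tolstoy - How much land does a man need?.pdf"))
# Tolstoy_-_How_much_land_does_a_man_need_.pdf
```

Two different uploads can end up with the same ASCII name this way, so a real upload handler would also need some collision handling, and could keep the original name in a database for display.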

Well, for some reason that I still don't understand, using rawurlencode() instead of urlencode() made it work.

However, the character é (among others, I'm sure) is still being encoded strangely (e%26%23769%3B instead of simply %C3%A9). Even stranger is that the links containing it work.
