
How to encode UTF8 filename for HTTP headers? (Python, Django)

I have a problem with HTTP headers: they're encoded in ASCII, and I want to provide a view for downloading files whose names can be non-ASCII.

response['Content-Disposition'] = 'attachment; filename="%s"' % (vo.filename.encode("ASCII","replace"), )

I don't want to use static file serving, because it has the same issue with non-ASCII file names; in that case the problem would lie with the file system and its file name encoding. (I don't know the target OS.)

I've already tried urllib.quote(), but it raises a KeyError exception.

Possibly I'm doing something wrong, but maybe it's impossible.

This is a FAQ.

There is no interoperable way to do this. Some browsers implement proprietary extensions (IE, Chrome); others implement RFC 2231 (Firefox, Opera).

See the test cases at http://greenbytes.de/tech/tc2231/.

Update: as of November 2012, all current desktop browsers support the encoding defined in RFC 6266 and RFC 5987 (Safari >= 6, IE >= 9, Chrome, Firefox, Opera, Konqueror).

Don't send a filename in Content-Disposition. There is no way to make non-ASCII header parameters work cross-browser(*).

Instead, send just "Content-Disposition: attachment", and leave the filename as a URL-encoded UTF-8 string in the trailing (PATH_INFO) part of your URL, for the browser to pick up and use by default. UTF-8 URLs are handled much more reliably by browsers than anything to do with Content-Disposition.

(*: actually, there's not even a current standard that says how it should be done, as the relationships between RFCs 2616, 2231 and 2047 are pretty dysfunctional, something that Julian is trying to get cleared up at a spec level. Consistent browser support is in the distant future.)
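As a rough illustration of the URL-based approach, the download link could be built like this. This is only a sketch: the helper name download_url and the /download/42/ route are hypothetical, and it assumes the view looks the file up by the ID in the path, ignores the trailing name, and responds with a bare "Content-Disposition: attachment".

```python
from urllib.parse import quote


def download_url(base_path, file_name):
    """Append file_name to base_path as a percent-encoded UTF-8 path segment.

    base_path and the route layout are assumptions made for this example.
    """
    # safe="" also escapes "/" so the name always stays a single path segment.
    return base_path.rstrip("/") + "/" + quote(file_name, safe="")


print(download_url("/download/42/", "My Résumé.pdf"))
# -> /download/42/My%20R%C3%A9sum%C3%A9.pdf
```

The browser then derives the suggested file name from the last path segment of the URL.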

Note that in 2011, RFC 6266 (especially Appendix D) weighed in on this issue and has specific recommendations to follow.

Namely, you can issue a filename parameter with only ASCII characters, followed by filename* with an RFC 5987-formatted filename for those agents that understand it.

Typically this will look like filename="my-resume.pdf"; filename*=UTF-8''My%20R%C3%A9sum%C3%A9.pdf, where the Unicode filename ("My Résumé.pdf") is encoded into UTF-8 and then percent-encoded (note: do NOT use + for spaces).

Please do actually read RFC 6266 and RFC 5987 (or use a robust and tested library that abstracts this for you), as my summary here is lacking in important detail.
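A minimal sketch of the filename / filename* pairing described in this answer, using only the standard library. The function name content_disposition and the RFC5987_SAFE constant are made up for this example; the safe set is the attr-char list from RFC 5987, and the caller is assumed to supply the ASCII fallback name. It is not a substitute for reading the RFCs.

```python
from urllib.parse import quote

# Characters RFC 5987 allows unescaped in an ext-value (attr-char),
# in addition to the letters and digits that quote() never escapes.
RFC5987_SAFE = "!#$&+-.^_`|~"


def content_disposition(ascii_fallback, unicode_name, disposition="attachment"):
    """Build a Content-Disposition value carrying both filename and filename*."""
    # Escape backslashes and double quotes so the quoted-string stays valid.
    fallback = ascii_fallback.replace("\\", "\\\\").replace('"', '\\"')
    # quote() encodes to UTF-8 by default and turns spaces into %20, never "+".
    encoded = quote(unicode_name, safe=RFC5987_SAFE)
    return "{}; filename=\"{}\"; filename*=UTF-8''{}".format(
        disposition, fallback, encoded
    )


print(content_disposition("my-resume.pdf", "My Résumé.pdf"))
# -> attachment; filename="my-resume.pdf"; filename*=UTF-8''My%20R%C3%A9sum%C3%A9.pdf
```

Agents that understand filename* should prefer it; older ones fall back to the plain ASCII filename.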

Starting with Django 2.1 (see issue #16470), you can use FileResponse, which will correctly set the Content-Disposition header for attachments. Starting with Django 3.0 (issue #30196), it will also set it correctly for inline files.

For example, to return a file named my_img.jpg with MIME type image/jpeg as an HTTP response:

from django.http import FileResponse

response = FileResponse(open("my_img.jpg", 'rb'), as_attachment=True, content_type="image/jpeg")
return response

Or, if you can't use FileResponse, you can use the relevant part of FileResponse's source to set the Content-Disposition header yourself. Here's what that source currently looks like:

from urllib.parse import quote

disposition = 'attachment' if as_attachment else 'inline'
try:
    filename.encode('ascii')
    file_expr = 'filename="{}"'.format(filename)
except UnicodeEncodeError:
    file_expr = "filename*=utf-8''{}".format(quote(filename))
response.headers['Content-Disposition'] = '{}; {}'.format(disposition, file_expr)

I can say that I've had success using the newer (RFC 5987) format of specifying a header encoded with the e-mail form (RFC 2231). I came up with the following solution, which is based on code from the django-sendfile project.

import unicodedata
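# Note: urlquote was deprecated in Django 3.0 and removed in 4.0; on newer versions use urllib.parse.quote instead.
from django.utils.http import urlquote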

def rfc5987_content_disposition(file_name):
    ascii_name = unicodedata.normalize('NFKD', file_name).encode('ascii','ignore').decode()
    header = 'attachment; filename="{}"'.format(ascii_name)
    if ascii_name != file_name:
        quoted_name = urlquote(file_name)
        header += '; filename*=UTF-8\'\'{}'.format(quoted_name)

    return header

# e.g.
# response['Content-Disposition'] = rfc5987_content_disposition(file_name)

I have only tested my code on Python 3.4 with Django 1.8, so the similar solution in django-sendfile may suit you better.

There's a long-standing ticket in Django's tracker which acknowledges this, but no patches have yet been proposed, as far as I can tell. So unfortunately this is as close to using a robust, tested library as I could find; please let me know if there's a better solution.
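One thing to watch with the NFKD-based ASCII fallback in this answer: a name with no ASCII-decomposable characters (for example, one written entirely in Chinese) collapses to just its extension or to an empty string. The sketch below shows one possible guard; the helper name to_ascii_fallback and the "download" default are assumptions for illustration, not part of django-sendfile.

```python
import unicodedata


def to_ascii_fallback(file_name, default="download"):
    """Best-effort ASCII name for the plain filename= parameter."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding with "ignore" then drops everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", file_name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    # Remove characters that would break a quoted-string header value.
    ascii_name = ascii_name.replace('"', "").replace("\\", "")
    stem, dot, ext = ascii_name.rpartition(".")
    if dot and not stem.strip():
        # Only the extension survived, so substitute the default stem.
        return default + "." + ext
    return ascii_name.strip() or default


print(to_ascii_fallback("My Résumé.pdf"))  # -> My Resume.pdf
print(to_ascii_fallback("文件.pdf"))        # -> download.pdf
```

The real name still travels in filename*; this only keeps the plain filename parameter usable for agents that ignore filename*.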

The escape_uri_path function from Django is the solution that worked for me.

Read the Django Docs here to see which RFC standards are currently specified.

from django.http import HttpResponse
from django.utils.encoding import escape_uri_path

file = "response.zip"
response = HttpResponse(content_type='application/zip')
response['Content-Disposition'] = f"attachment; filename*=utf-8''{escape_uri_path(file)}"

A hack:

if (Request.UserAgent.Contains("IE"))
{
  // IE will accept URL encoding, but spaces don't need to be, and since they're so common..
  filename = filename.Replace("%", "%25").Replace(";", "%3B").Replace("#", "%23").Replace("&", "%26");
}
