Files uploaded to S3 with S3BotoStorage end up with invalidly escaped content-type metadata

FACEPALM UPDATE: It turns out I had forgotten/overlooked the fact that I was using an older fork of S3BotoStorage from https://github.com/gtaylor/django-athumb as my default storage (even though I had django-storages installed). The current version of django-storages doesn't suffer from this problem. The problem was that the content-type headers were unicode when they hit boto, and boto escapes unicode using urllib.quote_plus before sending it on to AWS. This isn't really boto's fault, since headers have to be converted to non-unicode strings somehow per HTTP. For a more in-depth analysis see https://github.com/boto/boto/issues/1669 .

Original Question

I am using django-storages' S3BotoStorage in conjunction with a FileField to upload files to Amazon S3. Here's my field:

downloadable_file = FileField(max_length=255, upload_to="widgets/filedownloads", verbose_name="file") 

In settings:

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage' 

Everything works as far as the uploading/downloading goes.

However, the files are getting stored in my bucket with an incorrect content-type. When I look at the metadata for the files in my AWS S3 console, the Content-Type of the file shows up as "application%2Fpdf" instead of "application/pdf", which is what it should be.


In case you say it shouldn't matter, it does matter. Google Chrome's built-in PDF reader will hang on PDFs with an invalid content-type, and a client brought this to my attention.

Here's an example of a file uploaded through django-storages/boto. If you're using Chrome's built-in PDF reader I assume it hangs, like it does for me and the customer who reported this. If you're using a non-Chrome browser, or the Adobe plugin, or downloading the file to disk, you'll probably be fine.

If I manually change the content-type metadata via the AWS console to 'application/pdf' (one of the standard choices it provides), then it's fine.
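For objects that already carry a mangled value, the corrected value can also be computed rather than typed in by hand. The following is only a minimal sketch: `repair_content_type` is a hypothetical helper name, it assumes the damage is quote_plus-style escaping as seen above, and it uses Python 3's `urllib.parse` rather than the Python 2 `urllib` that boto 2.x ran on. Writing the repaired value back to S3 is not shown.

```python
from urllib.parse import unquote_plus


def repair_content_type(value):
    """Undo quote_plus-style escaping if the value looks escaped.

    Hypothetical helper for illustration only.
    """
    # A well-formed content type always contains a literal "/".
    # If there is none but an escaped slash ("%2F") is present, the
    # whole value was most likely percent-escaped, so unescape it.
    if "/" not in value and "%2f" in value.lower():
        return unquote_plus(value)
    # Otherwise leave it alone: unescaping a clean value such as
    # "image/svg+xml" would wrongly turn the "+" into a space.
    return value


print(repair_content_type("application%2Fpdf"))
print(repair_content_type("image/svg+xml"))
```

The guard matters because `unquote_plus` is not safe to apply to a value that was never escaped in the first place.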

I assume this is a bug somewhere in the way boto internally constructs the AWS policy document to upload the file, since I'm not doing anything outside of the standard usage here. However, I've stepped through the boto code and can't find where it actually does the escaping.

Can someone either suggest a workaround, or guide me to the offending code in boto so I can patch it and submit a pull request?

boto==2.9.5
django-storages==1.1.8

Not a direct answer to your question, but maybe a useful workaround. I was having issues using django-storages with S3. I ended up trying cuddly-buddly and have been quite happy with it. The author based it on the S3 module from django-storages and has added quite a number of fixes. I browsed through the cuddly-buddly commits and there were some modifications affecting the content-type header, but I can't test with PDF uploads without setting up a new Django project. However, I can verify that none of my files uploaded through Django have mangled slashes in the content-type field in the S3 metadata.

If for some reason you can't change over to cuddly-buddly for testing, let me know and I'll try to set up a simple Django project to upload some PDFs.

The problem was that I was using a forked/obsolete version of django-storages which did not properly convert content-type headers from unicode to strings before sending them to boto. boto converts unicode strings to ASCII strings (as required for HTTP headers) by using urllib's quote_plus escape mechanism. The problem was fixed by switching to the current version of django-storages.
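The type-dependent behaviour described above can be illustrated with a small model. This is a sketch under stated assumptions, not boto's actual code: `prepare_headers` is a hypothetical function, and because the example is written for Python 3, `str` stands in for Python 2's `unicode` and `bytes` stands in for the Python 2 byte string.

```python
from urllib.parse import quote_plus


def prepare_headers(headers):
    """Simplified model of the escaping described above (not boto's code)."""
    prepared = {}
    for name, value in headers.items():
        if isinstance(value, str):
            # Stand-in for a Python 2 `unicode` value: it gets escaped,
            # and quote_plus treats "/" as unsafe by default.
            prepared[name] = quote_plus(value)
        else:
            # Stand-in for a Python 2 byte string: passed through as-is.
            prepared[name] = value.decode("ascii")
    return prepared


# A text-typed header value comes out mangled...
mangled = prepare_headers({"Content-Type": "application/pdf"})
# ...while a value converted beforehand is left untouched.
clean = prepare_headers({"Content-Type": b"application/pdf"})

print(mangled["Content-Type"])
print(clean["Content-Type"])
```

In this model the only difference between the two calls is the type of the header value, which mirrors why converting the content-type before handing it to boto avoided the escaping.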

For a more detailed analysis of the issue see: https://github.com/boto/boto/issues/1669#issuecomment-27132112
