
MemoryError while decoding with base64 for large files

I am trying to decode a string which is stored in a file. The file size is around 300 MB. It's throwing a MemoryError while decoding:

base64.b64decode(bytes(file_content))

Is there any solution for this?

If your input data isn't already bytes, the bytes(file_content) call just forces an unnecessary copy to make a true bytes object. Annoyingly, base64.b64decode will perform this unnecessary conversion for you for all inputs aside from bytearray, even though the underlying API (binascii.a2b_base64) properly supports the buffer protocol (so it works just fine on, say, mmapped files).

So if you want to avoid an unnecessary copy, change to:

binascii.a2b_base64(file_content)

which decodes without copying the input at all.
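A minimal sketch of that approach, using a memory-mapped file as the input (the file name here is made up for the example; the point is that binascii.a2b_base64 accepts any buffer-protocol object, so the mapped file is decoded without an intermediate bytes copy):

```python
import base64
import binascii
import mmap

# Create a small base64-encoded file just so the sketch is runnable;
# in the real scenario this would be the existing 300 MB file.
with open("encoded.b64", "wb") as f:
    f.write(base64.b64encode(b"hello world"))

with open("encoded.b64", "rb") as f:
    # Memory-map the file read-only. a2b_base64 reads straight from
    # the mapping, so the encoded data is never copied into a bytes object.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        decoded = binascii.a2b_base64(mm)

print(decoded)  # b'hello world'
```

Only the decoded output is materialized in memory, roughly 3/4 the size of the encoded input, instead of input plus copies plus output.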

The other general tip is to make sure you're running a 64-bit build of Python (the default suggested installer for Windows is still 32-bit, sadly). When you're talking hundreds of MB of data, with copies of it floating around, it's pretty easy to hit the 2 GB limit on user-mode virtual address space, and upgrading to 64-bit Python will fix that for you (your code might be slow if you don't have enough RAM for it, but it shouldn't die from a MemoryError so easily).
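You can check which build you're running from within Python itself; one common way is to look at sys.maxsize, which fits in 32 bits only on a 32-bit build:

```python
import sys

# On a 64-bit build sys.maxsize is 2**63 - 1, well above 2**32;
# on a 32-bit build it is 2**31 - 1.
is_64bit = sys.maxsize > 2**32
print("64-bit Python" if is_64bit else "32-bit Python")
```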

