
Getting a MemoryError because list/array is too large

Problem

I have to download object_x . For simplicity's sake, object_x comprises a series of 1000 integers. The download is irregular: I receive groups or chunks of integers in seemingly random order, and I need to keep track of them until I have all 1000 to make up the final object_x .

The incoming chunks can also overlap, so for instance:

Chunk 1: integers 0-500
Chunk 2: integers 600-1000
Chunk 3: integers 400-700

Current method

Create object_x as a list containing all of its constituent integers, 0-1000 . When a chunk is downloaded, remove all of the integers that make up the chunk from object_x . Keep doing this until object_x is empty, at which point the download is known to be complete.

# In Python 3, range() is not a list and has no .remove(),
# so materialise it as a list first.
object_x = list(range(1000))

# download chunk 1
chunk = range(500)

for number in chunk:
    if number in object_x:
        object_x.remove(number)

# repeat for every downloaded chunk
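As an aside (a sketch of my own, not from the original post): the same bookkeeping can be done with a set, whose difference operation drops a whole chunk at once instead of calling list.remove (which is O(n)) once per element. This is much faster, though it does not by itself reduce peak memory:

```python
# Track the integers still missing as a set; removing a chunk is
# a single set-difference instead of many list.remove calls.
missing = set(range(1000))

def apply_chunk(start, stop):
    """Remove a downloaded half-open chunk [start, stop) from missing."""
    global missing
    missing -= set(range(start, stop))

apply_chunk(0, 500)     # chunk 1
apply_chunk(600, 1000)  # chunk 2
apply_chunk(400, 700)   # chunk 3 -- overlaps are harmless

print(len(missing))  # 0 -> object_x is complete
```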

Conclusion

This method is very memory intensive. The script throws a MemoryError if object_x or chunk is too large.

I'm searching for a better way to keep track of the chunks needed to build object_x . Any ideas? I'm using Python, but I guess the language doesn't matter.
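One memory-light idea along these lines (an illustrative sketch, not from the post): instead of storing every missing integer, track the downloaded coverage as a sorted list of merged intervals. Memory then scales with the number of distinct runs, not with the size of object_x:

```python
def merge_interval(intervals, new):
    """Insert a half-open interval (start, stop) into a sorted list of
    disjoint intervals, absorbing any that overlap or touch it."""
    start, stop = new
    merged = []
    for s, e in intervals:
        if e < start or s > stop:            # fully disjoint: keep as-is
            merged.append((s, e))
        else:                                 # overlapping/adjacent: absorb
            start, stop = min(start, s), max(stop, e)
    merged.append((start, stop))
    merged.sort()
    return merged

covered = []
for chunk in [(0, 500), (600, 1000), (400, 700)]:
    covered = merge_interval(covered, chunk)

print(covered)  # [(0, 1000)] -> complete once it spans the whole object
```

The download is finished as soon as the list collapses to a single interval covering the whole object, and at no point is memory proportional to the object size.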

This is the kind of scenario where streaming is very important. Doing everything in memory is a bad idea because you might not have enough memory (as in your case). You should probably save the chunks to disk, keep track of how many you have downloaded, and when you reach 1000, process them on disk (or load them into memory one by one to process them).
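A minimal sketch of that disk-based approach (the file naming and the chunk contents here are made up for illustration):

```python
import os
import tempfile

# Each downloaded chunk goes straight to its own file; only a small
# amount of bookkeeping stays in memory.
workdir = tempfile.mkdtemp()

def save_chunk(index, data):
    """Write one chunk to disk instead of holding it in memory."""
    path = os.path.join(workdir, f"chunk_{index:05d}.bin")
    with open(path, "wb") as f:
        f.write(data)
    return path

paths = [save_chunk(i, bytes([i % 256]) * 1000) for i in range(3)]

# Once everything has arrived, process the chunks one at a time,
# so memory use stays bounded by the largest single chunk.
total = 0
for path in sorted(paths):
    with open(path, "rb") as f:
        total += len(f.read())

print(total)  # 3000 bytes processed without ever holding all chunks at once
```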

"C# Security: Computing File Hashes" is a recent article I wrote. It's on a different subject, but towards the end it illustrates the importance of streaming.
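The same streaming pattern the article describes can be shown in Python with hashlib: read a file in fixed-size blocks so memory use stays constant regardless of file size (the file created here is just a throwaway demo):

```python
import hashlib
import os
import tempfile

def file_sha256(path, block_size=65536):
    """Hash a file in fixed-size blocks; memory use stays constant."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

# Demo: hash a 1 MB throwaway file without loading it whole.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * 1_000_000)
digest = file_sha256(path)
os.remove(path)
print(digest)
```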
