
Store a big dictionary while restricting memory

I have an extremely big dictionary that I need to analyze.

How did the dictionary come into existence?

The dictionary is a pivot table of a log file. I take a snapshot of the inventory every day, and right now I have the snapshots for the past month.

Each snapshot looks like this:

2013-01-01 Apple 1000
2013-01-01 Banana 2000
2013-01-01 Orange 3000
....

I then group all the records by product name and plan to do time series analysis later. The output looks like this:

{
 Apple:[(2013-01-01,1000),(2013-01-02, 998),(2013-01-03,950)...],
 Banana:[(2013-01-01,2000),(2013-01-02, 1852),(2013-01-03, 1232)...]
 Orange....
}

As you can imagine, with years and years of inventory snapshots and a very wide range of products, this dictionary turns out to be huge. The whole grouping process happens in memory, and the size of the dictionary exceeds the memory limit.

I am wondering how to restrict the memory usage to a specific amount (say 5 GB, since I don't want to make the server unusable for normal work) and do the rest of the work on disk.

Here is a very similar question to mine, but when I follow the top-voted answer, the memory is still quickly eaten up once I change the loop count to a real "big data" size.

So any example that truly doesn't exhaust memory would be appreciated; speed is not that important to me.

(Note: there are several ways to optimize the data structure so that the dictionary size could be reduced, but the inventory snapshots are not periodic and some products have a different number of snapshots, so the matrix idea may not work.)

At this point, I would suggest you stop using a dictionary and import sqlite3, or you are going to end up reinventing the wheel by implementing optimizations that databases already have.
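As a rough illustration of that suggestion, here is a minimal sketch that streams snapshot lines into an SQLite table in batches and reads one product's series back at a time, so the full grouping never has to sit in memory. The table name `snapshot`, the column names, the helper functions, and the batch size are all made up for this example, and it assumes each line is `date product quantity` with no spaces in the product name.

```python
import sqlite3


def load_snapshots(conn, lines, batch_size=10000):
    """Stream 'date product quantity' lines into SQLite in fixed-size batches.

    Table and column names are hypothetical, chosen for this sketch only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshot ("
        "day TEXT NOT NULL, product TEXT NOT NULL, quantity INTEGER NOT NULL)"
    )
    insert_sql = "INSERT INTO snapshot (day, product, quantity) VALUES (?, ?, ?)"
    batch = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        day, product, quantity = parts
        batch.append((day, product, int(quantity)))
        if len(batch) >= batch_size:
            # Only one batch of rows is held in Python memory at a time.
            conn.executemany(insert_sql, batch)
            batch.clear()
    if batch:
        conn.executemany(insert_sql, batch)
    # The index lets per-product lookups avoid scanning the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_product_day ON snapshot (product, day)"
    )
    conn.commit()


def iter_products(conn):
    """Yield each distinct product name, one at a time."""
    query = "SELECT DISTINCT product FROM snapshot ORDER BY product"
    for (product,) in conn.execute(query):
        yield product


def series_for(conn, product):
    """Return [(day, quantity), ...] for one product, ordered by date.

    ISO dates (YYYY-MM-DD) sort correctly as plain text.
    """
    cur = conn.execute(
        "SELECT day, quantity FROM snapshot WHERE product = ? ORDER BY day",
        (product,),
    )
    return cur.fetchall()
```

To keep the data on disk, open the connection with a file path, for example `sqlite3.connect("inventory.db")` (a hypothetical file name), pass an open log file as `lines`, and then loop over `iter_products(conn)` calling `series_for(conn, product)` for each one. Only a single product's series is loaded at any moment, and irregular or uneven snapshot counts per product are not a problem because each row is stored independently.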

To get started quickly, Elixir is a very decent and practical ORM.
