
Caching huge data in process memory

I work in the finance industry. We want to eliminate database hits during data processing, since they are very costly, so we are planning on-demand cache logic [runtime insert & runtime lookup].

Has anyone implemented caching logic for more than 10 million records? Each record is roughly 160-200 bytes.

I ran into the following drawbacks with different approaches:

  1. stl std::map cannot be used to implement a key-based cache registry: insert and lookup become very slow after about 200,000 records.
  2. Shared memory or memory-mapped files feel like pure overhead for caching here, because the data is not shared across processes.
  3. An sqlite3 in-memory & flat-file application database may be worth it, but its lookups also become slow after 2-3 million records.
  4. Process memory may have its own kernel-imposed limit; my assumption is 2 GB on a 32-bit machine and 4 GB on a 64-bit machine.
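For scale, the numbers above imply the following raw footprint. This is a back-of-envelope check, not from the original post:

```cpp
#include <cstddef>

// Back-of-envelope footprint for 10 million records of ~200 bytes each
// (the upper bound given in the question).
constexpr std::size_t kRecords = 10'000'000;
constexpr std::size_t kBytesPerRecord = 200;
constexpr std::size_t kRawBytes = kRecords * kBytesPerRecord;  // 2,000,000,000 bytes, ~1.9 GiB
```

Node-based containers add tens of bytes of pointer and bookkeeping overhead per entry, which can roughly double this, but it still fits comfortably in a 64-bit process.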

If you have come across this problem and solved it by any means, please share your suggestions.

Thanks

If your cache is a simple key-value store, you should not be using std::map, which has O(log n) lookup, but std::unordered_map, which has O(1) average lookup. Use std::map only if you require sorted ordering.

It sounds like performance is what you're after, so you might want to look at Boost Intrusive. You can easily combine an unordered_map and a list to build a high-efficiency LRU cache.
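The unordered_map-plus-list combination can be sketched with the standard library alone (Boost.Intrusive would do the same with less allocation overhead); this is a minimal illustration, not a production cache:

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal LRU cache: a std::list keeps (key, value) pairs in recency
// order, and a std::unordered_map gives O(1) average lookup into it.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    void put(std::uint64_t key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);  // refresh recency
            return;
        }
        if (index_.size() == capacity_) {  // evict least recently used entry
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::optional<std::string> get(std::uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);  // mark most recent
        return it->second->second;
    }

private:
    using Entry = std::pair<std::uint64_t, std::string>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> index_;
};
```

Calling `index_.reserve(n)` up front avoids rehashing during bulk inserts, which matters at the 10-million-record scale in the question.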

Read everything into memory and build a red-black tree for key access.

http://www.mit.edu/~emin/source_code/cpp_trees/index.html

In one recent project, we had a database with some tens of millions of records and used this strategy.

From your post, your data weighs about 2 GB; with overhead it will come to roughly double that. That is no problem for any 64-bit architecture.
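In mainstream standard libraries (libstdc++, libc++, MSVC) std::map is itself a red-black tree, so the read-everything-then-index strategy can be sketched without the linked code. The Record layout here is an assumption matching the question's 160-200 byte records:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Assumed record layout, sized to match the ~160-200 bytes per record
// mentioned in the question.
struct Record {
    std::uint64_t key;
    char payload[192];
};

// Read everything into one contiguous buffer, then build a red-black
// tree (std::map) of key -> record pointer for ordered key access.
std::map<std::uint64_t, const Record*> build_index(const std::vector<Record>& records) {
    std::map<std::uint64_t, const Record*> index;
    for (const Record& r : records) index.emplace(r.key, &r);
    return index;
}
```

Keeping the records in one contiguous vector and indexing by pointer avoids paying the per-node allocation cost twice.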

I recently changed the memory allocation of our product (a 3D medical volume viewer) to use good old memory-mapped files.

The advantages were:

  • I can allocate all physical RAM if I like (my 32-bit app sometimes needs more than 4 GB on a 64-bit machine).
  • If you map only portions, your address space is largely free for your application to use, which improves reliability.
  • If you run out of memory, things just slow down; no crashes.

In my case it was just data (mostly read-only). If you have a more complex data structure, this will be more work than using "normal" objects.

You can actually share these across processes (if they're backed by a real file). That may behave differently; I don't have experience with it.
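A minimal POSIX sketch of mapping a file read-only (on Windows the equivalents are CreateFileMapping/MapViewOfFile; error handling is pared down to the essentials):

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map `length` bytes of a file read-only into the address space.
// Returns nullptr on failure.
const char* map_file(const char* path, std::size_t length) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    // MAP_SHARED backs the mapping with the file itself, so other
    // processes mapping the same file see the same pages.
    void* addr = mmap(nullptr, length, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    return addr == MAP_FAILED ? nullptr : static_cast<const char*>(addr);
}
```

The kernel pages data in on demand and evicts clean pages under memory pressure, which is why running low on RAM degrades gracefully instead of crashing.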
