How to optimize paging for a large in-memory database

I have an application where the entire database is implemented in memory, using an STL map (std::map) for each table in the database.

Each item in a map is a complex object with references to items in the other maps.

The application works with a large amount of data, so it uses more than 500 MByte of RAM. Clients can contact the application and get a filtered version of the entire database. This is done by running through the entire database and finding the items relevant to the client.

When the application has been running for an hour or so, Windows 2003 SP2 starts to page out parts of the application's RAM (even though there is 16 GByte of RAM on the machine).

After the application has been partly paged out, a client logon takes a long time (10 minutes) because it now generates a page fault for each pointer lookup in the maps. Running the client logon a second time right afterwards is fast (a few seconds) because all the memory is back in RAM.

I can see it is possible to tell Windows to lock memory in RAM, but this is generally only recommended for device drivers, and only for "small" amounts of memory.

I guess a poor man's solution could be to loop through the entire in-memory database periodically, and thus tell Windows we are still interested in keeping the data model in RAM.
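A minimal sketch of that "touch everything" loop, assuming a hypothetical Record type and an int-keyed table (the real objects are more complex and are not shown in the question). It only reads one field per map node, which should be enough to reference the page each node lives on; whether that actually keeps the pages resident depends on how Windows is trimming the working set.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical record type; stands in for the application's complex objects.
struct Record {
    int owner_id;
    std::string payload;
};

using Table = std::map<int, Record>;

// Read a little data from every node so that each page holding a node is
// referenced. The checksum is returned so the compiler cannot drop the reads.
std::size_t touch_table(const Table& table) {
    std::size_t checksum = 0;
    for (const auto& entry : table) {
        checksum += static_cast<std::size_t>(entry.second.owner_id);
        if (!entry.second.payload.empty()) {
            // Also touch the string's buffer, which may live on another page.
            checksum += static_cast<unsigned char>(entry.second.payload.front());
        }
    }
    return checksum;
}
```

A background thread could call touch_table for each table every few minutes. Note that the walk itself will be slow the first time if the pages have already been written out.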

I guess another poor man's solution could be to disable the pagefile completely on Windows.

I guess the expensive solution would be a SQL database, and then rewriting the entire application to use a database layer. Then hopefully the database system will have implemented means for fast access.

Are there other, more elegant solutions?

This sounds like either a memory leak or a serious fragmentation problem. It seems to me that the first step would be to figure out what's causing 500 MB of data to use up 16 GB of RAM and still want more.

Edit: Windows has a working set trimmer that actively attempts to page out idle data. The basic idea is that it goes through and marks pages as being available, but leaves the data in them (and the virtual memory manager knows what data is in them). If, however, you attempt to access that memory before it's allocated to other purposes, it'll be marked as being in use again, which will normally prevent it from being paged out.

If you really think this is the source of your problem, you can indirectly control the working set trimmer by calling SetProcessWorkingSetSize. At least in my experience, this is only rarely of much use, but you may be in one of those unusual situations where it's really helpful.
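As a rough sketch of what such a call could look like: the wrapper name request_working_set and the sizes in the usage note are assumptions for illustration, while SetProcessWorkingSetSize and GetCurrentProcess are the real Win32 calls. The Windows-specific part is guarded so the snippet still compiles elsewhere, where it simply reports failure.

```cpp
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#endif

// Ask Windows to keep at least min_bytes of this process resident and to let
// the working set grow up to max_bytes. Returns true if the request was
// accepted. On non-Windows platforms this sketch does nothing and returns false.
bool request_working_set(std::size_t min_bytes, std::size_t max_bytes) {
    if (min_bytes > max_bytes) {
        return false;  // the API would reject this ordering anyway
    }
#ifdef _WIN32
    return SetProcessWorkingSetSize(GetCurrentProcess(),
                                    static_cast<SIZE_T>(min_bytes),
                                    static_cast<SIZE_T>(max_bytes)) != 0;
#else
    return false;
#endif
}
```

For the application in the question, a call along the lines of request_working_set(600u * 1024 * 1024, 1024u * 1024 * 1024) at startup would be the thing to try. The minimum is generally treated as a request rather than a guarantee, so it is worth measuring whether the logon delay actually goes away.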

As @Jerry Coffin said, it really sounds like your actual problem is a memory leak. Fix that.

But for the record, none of your "poor man's solutions" would work. At all.

Windows pages out some of your data because there's not room for it in RAM. Looping through the entire memory database would load in every byte of the data model, yes... which would cause other parts of it to be paged out. You'd generate a lot of page faults, and the only difference in the end would be which parts of the data structure are paged out.

Disabling the page file? Yes, if you think a hard crash is better than low performance. Windows doesn't page data out because it's fun. It does that to handle situations where it would otherwise run out of memory. If you disable the pagefile, the app will just crash at the point where it would otherwise have paged out data.

If your dataset really is so big that it doesn't fit in memory, then I don't see why an SQL database would be especially "expensive". Unlike your current solution, databases are optimized for this purpose. They're meant to handle datasets too large to fit in memory, and to do this efficiently.

It sounds like you have a memory leak. Fixing that would be the elegant, efficient and correct solution.

If you can't do that, then either

  • throw more RAM at the problem (the app ends up using 16 GB? Throw 32 or 64 GB at it then), or
  • switch to a format that's optimized for efficient disk access (probably a SQL database)

---- Edit ----

Given snakefoot's explanation, the problem is that memory which has not been used for a longer period of time gets swapped out, so the data is not in memory when it is needed. This is the same as this question:

Can I tell Windows not to swap out a particular process's memory?

and the VirtualLock function should do the job:

http://msdn.microsoft.com/en-us/library/aa366895(VS.85).aspx
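A minimal sketch of pinning one region with VirtualLock. The wrapper name lock_region is an assumption; VirtualLock itself takes a start address and a size in bytes. The call is guarded so the snippet compiles on other platforms, where it just returns false.

```cpp
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#endif

// Try to pin [address, address + bytes) in physical memory.
// Returns true only if Windows accepted the lock. On non-Windows platforms
// this sketch does nothing and returns false.
bool lock_region(void* address, std::size_t bytes) {
    if (address == nullptr || bytes == 0) {
        return false;  // nothing sensible to lock
    }
#ifdef _WIN32
    return VirtualLock(address, static_cast<SIZE_T>(bytes)) != 0;
#else
    return false;
#endif
}
```

Two caveats apply to the application in the question. The amount a process may lock is tied to its minimum working set size, so locking hundreds of megabytes would probably require raising that first with SetProcessWorkingSetSize. And std::map nodes are scattered across the heap, so locking them this way presumably means allocating them from a few large, known blocks.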

---- Previous answer ----

First of all, you need to distinguish between a memory leak and a genuine memory need.

If you have a memory leak, then converting the entire application to SQL would be a bigger effort than debugging the application.

SQL cannot be faster than a well-designed, domain-specific in-memory database, and if you have bugs, chances are you will have different ones in an SQL version as well.

If this is a memory need problem, then you will need to switch to SQL anyway, and this sounds like a good moment to do it.

We had a similar problem, and the solution we chose was to allocate everything in a shared memory block. AFAIK, Windows doesn't page this out. However, using std::map there is not for the faint of heart either, and was beyond what we required.

We are using Boost's shared memory support to implement this for us and it works well. Follow the examples closely and you will be up and running quickly. Boost also has Boost.MultiIndex, which will do a lot of what you want.

For a no-cost SQL solution, have you looked at SQLite? It has an option to run as an in-memory database.

Good luck, it sounds like an interesting application.

"I have an application where the entire database is implemented in memory using a stl-map for each table in the database."

That's the beginning of the end: STL's std::map is extremely memory inefficient. The same applies to std::list. Every element is allocated separately, causing rather serious memory waste. I often use std::vector + sort() + find() instead of std::map in applications where that is possible (more searches than modifications) and where I know in advance that memory usage might become an issue.
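A minimal sketch of the sorted-vector approach, assuming a hypothetical table of (int key, string value) rows. It uses std::lower_bound for the lookup rather than std::find, since a linear std::find would not take advantage of the sort order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type: (key, value). All rows share one contiguous buffer
// instead of one heap node per element as in std::map.
using Row = std::pair<int, std::string>;

// Sort once after loading (or after a batch of modifications).
void build_index(std::vector<Row>& rows) {
    std::sort(rows.begin(), rows.end(),
              [](const Row& a, const Row& b) { return a.first < b.first; });
}

// Binary search by key; returns nullptr when the key is absent.
// Precondition: rows is sorted by key (see build_index).
const std::string* find_row(const std::vector<Row>& rows, int key) {
    assert(std::is_sorted(rows.begin(), rows.end(),
                          [](const Row& a, const Row& b) { return a.first < b.first; }));
    auto it = std::lower_bound(rows.begin(), rows.end(), key,
                               [](const Row& row, int k) { return row.first < k; });
    if (it != rows.end() && it->first == key) {
        return &it->second;
    }
    return nullptr;
}
```

Lookups stay O(log n) as with std::map, but inserting into the middle of the vector is O(n), which is why this fits tables that are searched far more often than they are modified.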

"When the application have been running for an hour or so, then Windows 2003 SP2 starts to page out parts of the RAM for the application (Eventhough there is 16 GByte RAM on the machine)."

Hard to tell without knowing how your application is written. Windows has a feature that unloads from RAM whatever memory of idle applications can be unloaded. But that normally affects memory-mapped files and the like.

Otherwise, I would strongly suggest reading up on the Windows memory management documentation. It is not very easy to understand, but Windows makes many different types of memory available to applications. I never had any luck with it myself, but in your application a custom std::allocator would probably work.
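To make the custom allocator idea concrete, here is a minimal sketch of an arena allocator plugged into std::map. Arena, ArenaAllocator and Table are hypothetical names, and the arena here is backed by an ordinary std::vector. The assumption is that a real version would obtain its block from whichever Windows memory type turns out to be suitable, so that all map nodes sit in one known region.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <new>
#include <utility>
#include <vector>

// Hypothetical fixed-size arena: one large block handed out front to back.
// Assumes the vector's buffer is aligned for any ordinary type.
class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), used_(0) {}

    void* allocate(std::size_t bytes, std::size_t alignment) {
        assert(alignment != 0);
        // Round the current offset up to the requested alignment.
        std::size_t start = (used_ + alignment - 1) / alignment * alignment;
        if (start + bytes > buffer_.size()) throw std::bad_alloc();
        used_ = start + bytes;
        return buffer_.data() + start;
    }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_;
};

// Minimal C++11 allocator that takes all of its memory from an Arena.
template <typename T>
struct ArenaAllocator {
    using value_type = T;

    explicit ArenaAllocator(Arena& arena) : arena_(&arena) {}

    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena_(other.arena_) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena_->allocate(n * sizeof(T), alignof(T)));
    }

    // Individual frees are ignored; memory is released with the arena.
    void deallocate(T*, std::size_t) {}

    Arena* arena_;
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena_ == b.arena_;
}

template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}

// A table whose nodes all live inside one arena.
using Table = std::map<int, int, std::less<int>,
                       ArenaAllocator<std::pair<const int, int>>>;
```

A table is then created with something like Table::allocator_type alloc(arena); Table table(std::less<int>(), alloc);, with the arena outliving the table. Because deallocate is a no-op, erased elements are not reused, so this only suits tables that mostly grow.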

I can believe this is the fault of flawed pagefile behaviour. I've run my laptops mostly with the pagefile turned off since NT 4.0. In my experience, at least up to XP Pro, Windows intrusively swaps pages out just to provide the dubious benefit of a really, really slow extension to the maximum working set space.

Ask what benefit swapping to hard disk achieves with 16 gigabytes of real RAM available. If your working set is so big that it needs more than 10 GB of extra virtual memory, then once swapping is actually required, processes will take anything from a bit longer to thousands of times longer to complete. On Windows, the untameable file system cache seems to make matters worse.

Now, when I (very) occasionally run out of working set on my XP laptops, there is no traffic jam; the guilty app just crashes. A utility that suspends memory-guzzling processes before that point and raises an alert would be nice, but there is no such thing: just a violation, a crash, and sometimes explorer.exe goes down too.

Pagefiles - who needs 'em?

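Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.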

 