
Go: Best way to handle an application that needs excessive memory? Mmap, memory or caching?

I have a Go application which requires around 600GB of memory. The machine on which it will run has 128GB of RAM. I'm trying to decide how best to handle this.

The options are:

  1. Just load everything into memory (pretend I have 600GB of RAM) and let the OS page out the infrequently accessed parts to swap. I like this idea because I don't have to do anything special in the code; the OS will just handle everything. However, I'm not sure this is a good idea.

  2. Store the data on disk and use mmap (a memory-mapped file), which I guess is similar to the above but will require a lot more coding. It also appears to mean that the data will have to be stored as []byte and then parsed every time I need to use it, rather than already being in whatever type I need for the actual calculations.

  3. Build a caching system in which the data is kept on HDD and loaded when it's needed, with the most frequently accessed data held in memory and the least frequently accessed data purged whenever the memory limit is exceeded.

What are the advantages and disadvantages of these? I'd prefer to go with (1) if possible because of its simplicity. Is there anything wrong with that?

It all depends on the nature of the data access. Will the accesses to those 600GB be uniformly distributed? If not, a solution where you cache part of your content in memory and keep the rest on the HDD will likely be sufficient, since you have enough RAM to cache more than 20% of your data. Keeping everything in virtual memory may come with surprising drawbacks, such as the need for a huge swap partition.

To keep the data on disk you could use a DB engine, as Dave suggests, since they usually do a good job of caching the most frequently accessed content. You could also use memcached, an in-memory key-value cache that your application talks to through a client library.
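If you would rather prototype the "part in memory, rest on disk" idea inside the process first, a small cache is enough to start measuring. The following is only a sketch under assumptions: it evicts the least recently used entry (a common stand-in for the "least frequently accessed" policy described in the question), it counts entries rather than bytes, and `lruCache`, `newLRUCache`, and the `load` callback (standing in for a disk read) are hypothetical names, not part of any library.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is the payload stored in each list element.
type entry struct {
	key   string
	value []byte
}

// lruCache keeps at most capacity entries in memory and calls load
// (a stand-in for reading from disk) whenever a key is missing.
type lruCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
	load     func(key string) []byte // hypothetical disk loader
}

func newLRUCache(capacity int, load func(string) []byte) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
		load:     load,
	}
}

// Get returns the cached value, loading it and evicting the least
// recently used entry if the cache grows past its capacity.
func (c *lruCache) Get(key string) []byte {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).value
	}
	v := c.load(key)
	c.items[key] = c.order.PushFront(&entry{key: key, value: v})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return v
}

func main() {
	loads := 0
	c := newLRUCache(2, func(k string) []byte {
		loads++ // count how often we had to go to "disk"
		return []byte("data:" + k)
	})
	for _, k := range []string{"a", "b", "a", "c", "b"} {
		c.Get(k)
	}
	fmt.Println("disk loads:", loads)
}
```

A real version would bound the cache by memory size instead of entry count and would need a mutex if it is shared between goroutines.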

The bottom line is that optimizing performance without knowing the exact usage patterns is hard. Luckily, with Go you don't have to guess. You can test and measure.

You can define an interface similar to

type Index interface{
    Lookup(query string) Result
}

And then try all of your solutions, starting with the easiest to implement.

type inMemoryIndex struct {...}

func (*inMemoryIndex) Lookup(query string) Result {...}

type memcachedIndex struct {...}

type dbIndex struct {...}

Then you can use Go's built-in benchmarking tools to benchmark your application and see if it lives up to your standards. You can even benchmark on the target machine, using real data and mocked user queries.
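As a rough illustration of that workflow, the sketch below benchmarks one implementation through the interface. Several things here are assumptions made for the example: the fields of `Result` (the answer never defines it), the map-backed `mapIndex`, and the helper `benchmarkLookup`. It uses `testing.Benchmark` so that it runs as an ordinary program.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// Result is a placeholder type; its fields are assumed for this sketch.
type Result struct {
	Value string
	Found bool
}

// Index is the interface from the answer above.
type Index interface {
	Lookup(query string) Result
}

// mapIndex is a hypothetical in-memory implementation backed by a map.
type mapIndex struct {
	data map[string]string
}

func (m *mapIndex) Lookup(query string) Result {
	v, ok := m.data[query]
	return Result{Value: v, Found: ok}
}

// benchmarkLookup times Lookup for any Index implementation,
// cycling through a fixed set of queries.
func benchmarkLookup(idx Index, queries []string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			idx.Lookup(queries[i%len(queries)])
		}
	})
}

func main() {
	idx := &mapIndex{data: make(map[string]string)}
	queries := make([]string, 1000)
	for i := range queries {
		queries[i] = "key" + strconv.Itoa(i)
		idx.data[queries[i]] = "value" + strconv.Itoa(i)
	}
	// Prints the iteration count and ns/op for this implementation.
	fmt.Println(benchmarkLookup(idx, queries))
}
```

Because `benchmarkLookup` only depends on `Index`, the same function can be pointed at each candidate implementation in turn. In a normal project the loop would more commonly live in a `func BenchmarkLookup(b *testing.B)` inside a `_test.go` file and be run with `go test -bench=.`.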

You're correct to assume that mmap would require more coding, so I would save that option until I had tried all the others.
