
Which data structure works best for shared memory and fast lookup?

Original post 2015-05-25 11:00:50 4 2 c/ dll/ data-structures/ shared-libraries/ shared-memory

I am still at the conceptual stage of a project and have yet to start implementing it. One subtask is this:

Two processes will request data from a DLL that both of them load. The DLL would store this data in a buffer in memory. If I just instantiate a structure within the DLL and store data in it, each process will get its own separate copy of the structure and the data won't be shared. So I need a shared memory implementation. Another requirement is fast lookup within the data. I am not sure how an AVL tree can be stored in a shared memory space. Is there an implementation available on the internet of an AVL tree or hash map that can be stored in shared memory? Also, is this the right approach to the problem, or should I be using something else altogether?

TIA!

2 answers

Whether this is the right approach depends on various factors, such as how expensive the data is to produce, whether the processes need to communicate with each other about the data, and so on. The rest of this answer assumes that you really do need a lookup structure in shared memory.

You can use any data structure, provided that you can allocate storage for both your data and the data structure's internals in your shared memory space. This typically means that you won't be able to use malloc for it, since each process's heap usually remains private. You will need your own custom allocator.
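
As a rough illustration of what such an allocator might look like, here is a minimal bump (arena) allocator that works on a region that is already mapped. All names (shm_arena, shm_arena_alloc, SHM_ALIGN) are made up for this sketch. It assumes the region starts at a suitably aligned address, never frees individual blocks, and does no locking, so concurrent callers would need a cross-process mutex around shm_arena_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHM_ALIGN 16u  /* assumed alignment for every block (hypothetical) */

/* Hypothetical header stored at the very start of the shared region. */
typedef struct {
    size_t capacity;  /* total bytes in the region, header included */
    size_t used;      /* bytes handed out so far, header included */
} shm_arena;

/* Set up an arena over a region that is already mapped at `base`. */
shm_arena *shm_arena_init(void *base, size_t capacity)
{
    assert(capacity >= sizeof(shm_arena));
    shm_arena *a = base;
    a->capacity = capacity;
    a->used = sizeof(shm_arena);
    return a;
}

/* Returns the offset of a new block from the start of the region,
 * or 0 if the region is full. Offset 0 is the header, so it can
 * double as the "allocation failed" value. */
size_t shm_arena_alloc(shm_arena *a, size_t size)
{
    /* Round the next free position up to a multiple of SHM_ALIGN. */
    size_t start = (a->used + SHM_ALIGN - 1) / SHM_ALIGN * SHM_ALIGN;
    if (start > a->capacity || size > a->capacity - start)
        return 0;
    a->used = start + size;
    return start;
}

/* Convert an offset into a pointer that is valid in this process. */
void *shm_arena_ptr(shm_arena *a, size_t off)
{
    return off ? (void *)((char *)a + off) : NULL;
}
```

Returning offsets rather than pointers keeps the allocator usable even when the region is mapped at different addresses in different processes.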

Let's say you chose AVL trees. Here's a library that implements them: https://github.com/fbuihuu/libtree. It looks like this library stores the "internal" AVL node data intrusively in your "objects." Intrusive means that you reserve fields for the library's use when you declare your object struct. So, as long as you use your custom allocator to allocate space in shared memory for your objects, and for the root tree struct as well, the whole tree should be accessible to multiple processes. Because the nodes link to each other with raw pointers, you just have to make sure that the shared memory itself is mapped to the same address range in each process.
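
To make "intrusive" concrete, here is a small sketch. It is not libtree's actual API or struct layout; tree_node, entry, entry_insert and entry_find are hypothetical names, and the insert is a plain unbalanced binary search tree insert where a real AVL implementation would also rebalance. The point is only that the tree code never allocates: the link fields live inside the caller's object, so wherever the object is placed (for example in shared memory), the tree's internals go with it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical link fields owned by the tree code. */
typedef struct tree_node {
    struct tree_node *left, *right;
} tree_node;

/* Recover the enclosing object from a pointer to its embedded node. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The caller's object reserves space for the tree's bookkeeping. */
typedef struct {
    int key;
    int value;
    tree_node node;  /* reserved for the tree */
} entry;

/* Unbalanced insert; a real AVL tree would rebalance after linking. */
void entry_insert(tree_node **root, entry *e)
{
    e->node.left = e->node.right = NULL;
    while (*root) {
        entry *cur = CONTAINER_OF(*root, entry, node);
        root = e->key < cur->key ? &(*root)->left : &(*root)->right;
    }
    *root = &e->node;
}

/* Walk down from the root comparing keys; NULL if the key is absent. */
entry *entry_find(tree_node *root, int key)
{
    while (root) {
        entry *cur = CONTAINER_OF(root, entry, node);
        if (key == cur->key)
            return cur;
        root = key < cur->key ? root->left : root->right;
    }
    return NULL;
}
```

Note that left and right are raw pointers here, which is why the same-address mapping requirement applies to this style of tree.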

If you used a non-intrusive AVL implementation, meaning that each node is represented by an internal struct which then points to a separate struct containing your data, the library or your implementation would have to let you specify the allocator for the internal struct somehow, so that you could make sure its space is allocated in shared memory too.

As for how to write the custom allocator, that really depends on your usage and the system. You need to consider whether you will ever need to "resize" the shared memory region and whether the system allows that, whether you will allocate only fixed-width blocks inside the region or need to support blocks of arbitrary length, whether it's acceptable to spread your data structures over multiple shared memory regions, how your processes can synchronize and communicate, and so on. If you go this route, you should ask a new question on the topic. Be sure to mention what system you are using (Windows?) and what your constraints are.

EDIT

Just to further discourage you from doing this unless it's necessary: if, for example, your data is expensive to produce but you don't care whether the processes build their own independent lookup structures once the data is available to them, then you can have the DLL write the data to a simple ring buffer in shared memory and let the rest of the code take it from there. Building two AVL trees isn't really a problem unless they are going to be very large.
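
One possible shape for such a ring buffer, sketched under several assumptions: a single producer, fixed-size records, C11 atomics that are lock-free on the target so they work across processes, and readers that each keep a private cursor so every process sees every record. The names (shm_ring, ring_publish, ring_read, RING_SLOTS) are invented for this sketch. A reader that falls a full lap behind gets -1 and has to resynchronize; the conservative check means at most RING_SLOTS - 1 records can be pending per reader.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 8u  /* hypothetical capacity */

typedef struct { int key; int value; } record;

/* Lives in shared memory; contains no pointers, only a counter and slots. */
typedef struct {
    _Atomic uint64_t written;  /* total records ever published */
    record slots[RING_SLOTS];
} shm_ring;

void ring_init(shm_ring *r)
{
    atomic_init(&r->written, 0);
}

/* Producer side: fill the slot first, then publish the new count. */
void ring_publish(shm_ring *r, record rec)
{
    uint64_t w = atomic_load_explicit(&r->written, memory_order_relaxed);
    r->slots[w % RING_SLOTS] = rec;
    atomic_store_explicit(&r->written, w + 1, memory_order_release);
}

/* Consumer side: *cursor is private to each reading process.
 * Returns 1 if a record was copied to *out, 0 if nothing new,
 * -1 if the reader fell too far behind and the slot may be overwritten. */
int ring_read(shm_ring *r, uint64_t *cursor, record *out)
{
    uint64_t w = atomic_load_explicit(&r->written, memory_order_acquire);
    if (*cursor == w)
        return 0;
    record tmp = r->slots[*cursor % RING_SLOTS];
    /* Re-check after the copy: the producer may have lapped us meanwhile. */
    w = atomic_load_explicit(&r->written, memory_order_acquire);
    if (w - *cursor >= RING_SLOTS)
        return -1;
    *out = tmp;
    (*cursor)++;
    return 1;
}
```

Each process would then feed the records it reads into its own private AVL tree or hash map, allocated with ordinary malloc.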

Also, if you only care about concurrency, and it's not important that there be two separate processes, you may be able to make them both threads of one process.

On Windows, the functions Microsoft recommends for mapping shared memory can return a different pointer value in each process. This means that inside the shared memory you have to store offsets (from the start of the shared memory) instead of pointers. For example, a linked list node holds a next offset instead of a next pointer. You may want to create macros that convert offsets to pointers and pointers to offsets.
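
A minimal sketch of that idea follows. The macro and type names (OFF_TO_PTR, PTR_TO_OFF, shm_node, shm_list) are hypothetical, and the sketch assumes the list header sits at offset 0 of the region so that offset 0 can serve as the "null" value. Each process passes its own base address; the stored offsets stay valid regardless of where the region is mapped.

```c
#include <assert.h>
#include <stddef.h>

/* Offset 0 is reserved as "null"; no node is ever stored at the region start. */
#define SHM_NULL ((size_t)0)

/* Convert between region-relative offsets and process-local pointers. */
#define OFF_TO_PTR(base, off) \
    ((off) == SHM_NULL ? NULL : (void *)((char *)(base) + (off)))
#define PTR_TO_OFF(base, ptr) \
    ((ptr) == NULL ? SHM_NULL : (size_t)((char *)(ptr) - (char *)(base)))

typedef struct {
    int    key;
    int    value;
    size_t next;  /* offset of the next node, not a pointer */
} shm_node;

typedef struct {
    size_t head;  /* offset of the first node, SHM_NULL if empty */
} shm_list;

/* Link a node (already placed inside the region) at the front of the list. */
void shm_list_push(void *base, shm_list *list, shm_node *node)
{
    node->next = list->head;
    list->head = PTR_TO_OFF(base, node);
}

/* Linear search; every hop converts an offset using this process's base. */
shm_node *shm_list_find(void *base, const shm_list *list, int key)
{
    for (shm_node *n = OFF_TO_PTR(base, list->head); n != NULL;
         n = OFF_TO_PTR(base, n->next)) {
        if (n->key == key)
            return n;
    }
    return NULL;
}
```

The same substitution applies to the left, right and parent links of a tree, at the cost of one addition per link traversal.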