
How to improve multi-dimensional bit array comparison performance in C or C++

I have the following three-dimensional bit array (for a Bloom filter):

unsigned char  P_bit_table_[P_ROWS][ROWS][COLUMNS];


The P_ROWS dimension represents independent two-dimensional bit arrays (i.e., P_ROWS[0], P_ROWS[1], and P_ROWS[2] are independent bit arrays). They can be as large as 100 MB and are populated independently. The data I am looking for could be in any of these P_ROWS, and right now I search them one at a time: P_ROWS[0], then P_ROWS[1], and so on until I get a positive result or reach the end (P_ROWS[n-1]). This means that if n is 100, I have to do this search (bit comparison) 100 times, and the search is done very often. Somebody suggested that I could improve the search performance with bit grouping (using column-major order on the row-major array), but I don't know how to do that.

I really need to improve the performance of the search because the program does a lot of it.

I will be happy to give more details of my bit table implementation if required.

Sorry for the poor language.

Thanks for your help.

EDIT: The bit grouping could be done in the following format. Assume the array is:

unsigned char P_bit_table_[P_ROWS][ROWS][COLUMNS]={{{a1,a2,a3},{b1,b2,b3},{c1,c2,c3}},
                                                  {{a1,a2,a3},{b1,b2,b3},{c1,c2,c3}},
                                                  {{a1,a2,a3},{b1,b2,b3},{c1,c2,c3}}};

As you can see, all the rows along the third dimension hold similar data. After the grouping, I want all the a1's in one group (as a single entity, so that I can compare them against another bit to check whether they are on or off), all the b1's in another group, and so on.
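One possible reading of this grouping is a bit-sliced layout: for each bit position, the bits from all the independent filters are packed next to each other, so a single AND tests up to 64 filters at once. The sketch below works under that assumption and is not the asker's implementation. `SlicedBitTable`, `set`, and `first_match` are hypothetical names, and the bit positions are assumed to come from hash functions computed elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-sliced table. For every bit position there is a group of
// 64-bit words; bit p of that group belongs to filter p.
class SlicedBitTable {
public:
    SlicedBitTable(std::size_t filters, std::size_t bits_per_filter)
        : words_per_pos_((filters + 63) / 64),
          filters_(filters),
          bits_per_filter_(bits_per_filter),
          slices_(words_per_pos_ * bits_per_filter, 0) {}

    // Set bit `pos` of filter `filter`.
    void set(std::size_t filter, std::size_t pos) {
        assert(filter < filters_ && pos < bits_per_filter_);
        slices_[pos * words_per_pos_ + filter / 64] |=
            std::uint64_t{1} << (filter % 64);
    }

    // Return the lowest filter index in which every listed position is set,
    // or -1 if no filter matches.
    long first_match(const std::vector<std::size_t>& positions) const {
        for (std::size_t w = 0; w < words_per_pos_; ++w) {
            std::uint64_t candidates = ~std::uint64_t{0};
            for (std::size_t pos : positions) {
                assert(pos < bits_per_filter_);
                candidates &= slices_[pos * words_per_pos_ + w];
                if (candidates == 0) break;  // no filter in this word can match
            }
            if (candidates != 0) {
                std::size_t bit = 0;
                while (((candidates >> bit) & 1u) == 0) ++bit;
                return static_cast<long>(w * 64 + bit);
            }
        }
        return -1;
    }

private:
    std::size_t words_per_pos_;
    std::size_t filters_;
    std::size_t bits_per_filter_;
    std::vector<std::uint64_t> slices_;
};
```

With 100 filters, a lookup touches two 64-bit words per hashed position instead of visiting 100 separate arrays. The trade-off is that populating a single filter now writes to memory scattered across the whole table.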

Re-use Other People's Algorithms

There are a ton of bit-calculation optimizations out there, including many non-obvious ones such as Hamming weights and specialized algorithms for finding the next true or false bit, that are largely independent of how you structure your data.
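As an illustration of the kind of routine meant here, the following sketch shows a portable Hamming weight and a lowest-set-bit search. The function names `popcount64` and `lowest_set_bit` are made up for this example. In C++20, `std::popcount` and `std::countr_zero` from `<bit>` do the same jobs and are usually the better choice when available.

```cpp
#include <cassert>
#include <cstdint>

// Hamming weight: each iteration clears the lowest set bit, so the loop
// runs once per set bit rather than once per bit position.
inline int popcount64(std::uint64_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}

// Index of the lowest set bit. The argument must be non-zero.
inline int lowest_set_bit(std::uint64_t x) {
    assert(x != 0);
    int index = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++index;
    }
    return index;
}
```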

Reusing algorithms that other people have written can really speed up computation and lookups, not to mention development time. Some algorithms are so specialized, and use such computational magic, that they will have you scratching your head. In that case, you can take the author's word for it (after you confirm their correctness with unit tests).

Take Advantage of CPU Caching and Multithreading

I personally reduce my multidimensional bit arrays to one dimension, laid out in the order I expect to traverse them.

This way, there is a greater chance of hitting the CPU cache.
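A minimal sketch of such a flattening, assuming the same three dimensions as in the question, might look like this. `FlatBitTable` and its members are hypothetical names, and the row-major index formula is only one possible ordering; the right one depends on how the data is actually traversed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-dimensional replacement for
// unsigned char table[P_ROWS][ROWS][COLUMNS].
struct FlatBitTable {
    std::size_t p_rows, rows, columns;
    std::vector<unsigned char> bytes;

    FlatBitTable(std::size_t p, std::size_t r, std::size_t c)
        : p_rows(p), rows(r), columns(c), bytes(p * r * c, 0) {}

    // Row-major byte offset: the column index varies fastest, so walking
    // the columns of one row touches consecutive bytes.
    std::size_t index(std::size_t p, std::size_t r, std::size_t c) const {
        assert(p < p_rows && r < rows && c < columns);
        return (p * rows + r) * columns + c;
    }

    // Set one bit (0..7) inside the byte at (p, r, c).
    void set_bit(std::size_t p, std::size_t r, std::size_t c, unsigned bit) {
        assert(bit < 8);
        bytes[index(p, r, c)] |= static_cast<unsigned char>(1u << bit);
    }

    // Test one bit (0..7) inside the byte at (p, r, c).
    bool test_bit(std::size_t p, std::size_t r, std::size_t c, unsigned bit) const {
        assert(bit < 8);
        return ((bytes[index(p, r, c)] >> bit) & 1u) != 0;
    }
};
```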

In your case, I would also think deeply about the mutability of the data and whether you want to put locks on blocks of bits. With 100 MB of data, you have the potential to run your algorithms in parallel using many threads, if you can structure your data and algorithms to avoid contention.

You may even be able to use a lockless model if you divide up ownership of the blocks of data by thread, so that no two threads read or write the same block. It all depends on your requirements.
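The ownership idea can be sketched as follows, assuming the filters are only read during a search and each thread is given its own contiguous range of filters plus its own result slot. `Filter`, `contains_all`, and `parallel_find` are hypothetical names, and whether the thread start-up cost pays off depends on the real data sizes, so treat this as an illustration rather than a recommendation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Filter = std::vector<unsigned char>;  // one independent bit array

// True if every listed bit position is set in the filter.
inline bool contains_all(const Filter& f, const std::vector<std::size_t>& positions) {
    for (std::size_t pos : positions) {
        assert(pos / 8 < f.size());
        if (((f[pos / 8] >> (pos % 8)) & 1u) == 0) return false;
    }
    return true;
}

// Each thread scans only its own range of filters and writes only to its
// own element of `found`, so no locks are needed.
// Returns the lowest matching filter index, or -1 if none matches.
inline long parallel_find(const std::vector<Filter>& filters,
                          const std::vector<std::size_t>& positions,
                          std::size_t num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long> found(num_threads, -1);
    std::vector<std::thread> workers;
    const std::size_t chunk = (filters.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, filters.size());
        const std::size_t end = std::min(begin + chunk, filters.size());
        workers.emplace_back([&filters, &positions, &found, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                if (contains_all(filters[i], positions)) {
                    found[t] = static_cast<long>(i);
                    break;
                }
            }
        });
    }
    for (std::thread& w : workers) w.join();

    // Ranges are in ascending order, so the first hit is the lowest index.
    for (long f : found) {
        if (f >= 0) return f;
    }
    return -1;
}
```

On some toolchains, programs that use `std::thread` need to be built with thread support enabled (for example `-pthread` with GCC or Clang).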

Now is a good time to think about these issues. But since no one knows your data and usage better than you do, you must consider design options in the context of your data and usage patterns.
