
Can someone explain to me what this GetCardinality method is doing?

I've been looking into faceted search with Lucene.NET. I found a brilliant example here which explains a fair amount, except that it completely skips over the function that checks the cardinality of items in a bit array.

Can anyone give me a rundown of what it is doing? The main things I don't understand are why the bitsSetArray is created the way it is, what it is used for, and how all the if statements work in the for loop.

This may be a big ask, but I have to understand how this works before I can even think of using it in my own code.

Thanks

    // Requires: using System.Collections; using System.Reflection;
    public static int GetCardinality(BitArray bitArray)
    {
        // Lookup table: _bitsSetArray256[n] is the number of 1 bits in the byte value n (0..255).
        var _bitsSetArray256 = new byte[] {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8};
        // BitArray keeps its bits in a private array of 32-bit words; read that field via reflection.
        var array = (uint[])bitArray.GetType().GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(bitArray);
        int count = 0;

        // For each 32-bit word, look up the bit count of each of its four bytes and add them up.
        for (int index = 0; index < array.Length; index++)
            count += _bitsSetArray256[array[index] & 0xFF] + _bitsSetArray256[(array[index] >> 8) & 0xFF] + _bitsSetArray256[(array[index] >> 16) & 0xFF] + _bitsSetArray256[(array[index] >> 24) & 0xFF];

        return count;
    }

The _bitsSetArray256 array is initialised with values such that _bitsSetArray256[n] contains the number of bits set in the binary representation of n, for n in 0..255.

For example, _bitsSetArray256[13] equals 3, because 13 in binary is 1101, which contains three 1s.

The reason for doing this is that it's far faster to pre-compute these values and store them than to work them out each time (or on demand). It's not like the number of 1s in the binary representation of 13 is ever going to change, after all :)
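The table does not have to be typed out by hand. As a minimal sketch (BitCountTable and Build are hypothetical names, not part of the original code), the same 256 values can be generated from the observation that the bit count of n equals the bit count of n shifted right by one, plus the lowest bit of n:

```csharp
public static class BitCountTable
{
    // Hypothetical helper: builds a 256-entry table where
    // table[n] is the number of 1 bits in the byte value n.
    public static byte[] Build()
    {
        var table = new byte[256]; // table[0] starts at 0, which is already correct
        for (int n = 1; n < 256; n++)
        {
            // Bits in n = bits in n without its lowest bit, plus that lowest bit.
            table[n] = (byte)(table[n >> 1] + (n & 1));
        }
        return table;
    }
}
```

With this sketch, BitCountTable.Build()[13] is 3, matching the hard-coded table.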

Within the for loop, we are looping through an array of uints. A C# uint is a 32-bit quantity, i.e. made up of 4 bytes. Our lookup table tells us how many bits are set in a byte, so we must process each of the four bytes. The bit manipulation in the count += line extracts each of the four bytes, then gets its bit count from the lookup array. Adding together the bit counts for all four bytes gives the bit count for the uint as a whole.
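To make the byte extraction concrete, here is a small sketch (ByteSplitExample and SplitBytes are hypothetical names used only for illustration) that applies the same shifts and masks as the count += line and returns the four bytes separately:

```csharp
public static class ByteSplitExample
{
    // Returns the four bytes of value, lowest byte first,
    // using the same shift-and-mask steps as the count += line.
    public static uint[] SplitBytes(uint value)
    {
        return new[]
        {
            value & 0xFF,         // bits 0..7
            (value >> 8) & 0xFF,  // bits 8..15
            (value >> 16) & 0xFF, // bits 16..23
            (value >> 24) & 0xFF  // bits 24..31
        };
    }
}
```

For the value 0x12345678 the bytes come out as 0x78, 0x56, 0x34 and 0x12, which contain 4, 4, 3 and 2 set bits respectively, so the lookup table would give a total of 13 for that uint.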

So given a BitArray, this function digs into the uint[] m_array member, then returns the total number of bits set in the binary representation of the uints therein.
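If reaching into a private field via reflection is a concern, the same count can probably be obtained through the public BitArray.CopyTo method instead. The following is only a sketch of that alternative, not the approach used in the question; the class and method names are hypothetical:

```csharp
using System.Collections;

public static class BitArrayCardinality
{
    // table[n] = number of 1 bits in the byte value n.
    private static readonly byte[] BitsSet = BuildTable();

    private static byte[] BuildTable()
    {
        var table = new byte[256];
        for (int n = 1; n < 256; n++)
            table[n] = (byte)(table[n >> 1] + (n & 1));
        return table;
    }

    // Counts set bits by copying the BitArray into 32-bit words
    // through the public CopyTo method (no reflection needed).
    public static int Count(BitArray bits)
    {
        var words = new int[(bits.Length + 31) / 32];
        bits.CopyTo(words, 0);

        int count = 0;
        foreach (int word in words)
        {
            uint w = unchecked((uint)word);
            count += BitsSet[w & 0xFF] + BitsSet[(w >> 8) & 0xFF]
                   + BitsSet[(w >> 16) & 0xFF] + BitsSet[(w >> 24) & 0xFF];
        }
        return count;
    }
}
```

Like the original method, this counts whole 32-bit words, so it assumes that any unused bits in the last word are zero. The extra copy costs some time and memory, so it may not suit a hot path.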

I just wanted to post a helpful article about BitArrays for those of us who are developing our own versions of faceting with Lucene.NET. See: http://dotnetperls.com/precomputed-bitcount

This is a good explanation of the fastest way to get the cardinality of the on bits in an integer (which is the bulk of what the above code sample does).

By implementing the method from the article in my faceted search, along with some other simple changes, I was able to cut the time it took to get the count by ~65%. The differences were:

  1. Declaring _bitcounts as a static field (so it's not created per call)
  2. Changing the for to foreach (ANTS Profiler showed a 25% gain here)
  3. Implementing the 65536-entry table instead of the 256-entry one, to shift 16 bits at a time rather than 8.

    private static int[] _bitcounts = InitializeBitcounts();

    private static int GetCardinality(BitArray bitArray)
    {
        uint[] array = (uint[])bitArray.GetType().GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(bitArray);

        int count = 0;
        foreach (uint value in array)
        {
            count += _bitcounts[value & 65535] + _bitcounts[(value >> 16) & 65535];
        }
        return count;
    }

    private static int[] InitializeBitcounts()
    {
        int[] bitcounts = new int[65536];
        int position1 = -1;
        int position2 = -1;
        //
        // Loop through all the elements and assign them.
        //
        for (int i = 1; i < 65536; i++, position1++)
        {
            //
            // Adjust the positions we read from.
            //
            if (position1 == position2)
            {
                position1 = 0;
                position2 = i;
            }
            bitcounts[i] = bitcounts[position1] + 1;
        }
        return bitcounts;
    }
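The position bookkeeping in InitializeBitcounts is not obvious at first glance: each entry is one more than the entry for the same index with its highest set bit removed. One way to gain confidence in it is to compare every entry against a plain bit-by-bit count. The sketch below copies the table-building logic from the answer; NaiveBitCount and TableIsCorrect are hypothetical helpers added only for the check:

```csharp
public static class BitcountTableCheck
{
    // Same table-building logic as InitializeBitcounts in the answer above.
    public static int[] InitializeBitcounts()
    {
        int[] bitcounts = new int[65536];
        int position1 = -1;
        int position2 = -1;
        for (int i = 1; i < 65536; i++, position1++)
        {
            // At each power of two, restart reading from the beginning of the table.
            if (position1 == position2)
            {
                position1 = 0;
                position2 = i;
            }
            bitcounts[i] = bitcounts[position1] + 1;
        }
        return bitcounts;
    }

    // Reference implementation: counts set bits one at a time.
    // Intended for non-negative values only.
    public static int NaiveBitCount(int value)
    {
        int count = 0;
        while (value != 0)
        {
            count += value & 1;
            value >>= 1;
        }
        return count;
    }

    // Returns true if every table entry matches the bit-by-bit count.
    public static bool TableIsCorrect()
    {
        int[] table = InitializeBitcounts();
        for (int i = 0; i < table.Length; i++)
        {
            if (table[i] != NaiveBitCount(i)) return false;
        }
        return true;
    }
}
```

A check like this only needs to run once, for example in a unit test, since the table never changes.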
