Can someone explain to me what this GetCardinality method is doing?
I've been looking into faceted search with Lucene.NET. I found a brilliant example here which explains a fair amount, except that it completely overlooks the function which checks the cardinality of items in a bit array.
Can anyone give me a rundown of what it is doing? The main things I don't understand are why the bitsSetArray is created as it is, what it is used for, and how all the if statements work in the for loop.
This may be a big ask, but I have to understand how this works before I can even think of using it in my own code.
Thanks
public static int GetCardinality(BitArray bitArray)
{
var _bitsSetArray256 = new byte[] {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8};
    var array = (uint[])bitArray.GetType().GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(bitArray);
    int count = 0;
    for (int index = 0; index < array.Length; index++)
    {
        count += _bitsSetArray256[array[index] & 0xFF]
               + _bitsSetArray256[(array[index] >> 8) & 0xFF]
               + _bitsSetArray256[(array[index] >> 16) & 0xFF]
               + _bitsSetArray256[(array[index] >> 24) & 0xFF];
    }
    return count;
}
The _bitsSetArray256 array is initialised with values such that _bitsSetArray256[n] contains the number of bits set in the binary representation of n, for n in 0..255.
For example, _bitsSetArray256[13] equals 3, because 13 in binary is 1101, which contains three 1s.
The reason for doing this is that it's far faster to pre-compute these values and store them than to work them out each time (or on demand). It's not like the number of 1s in the binary representation of 13 is ever going to change, after all :)
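To make the table less magical, here is a minimal sketch of how the same 256 values could be generated instead of typed out. The class and method names (BitCountTable, Build) are made up for illustration and are not part of the code in the question. It relies on the fact that n has the same set bits as n >> 1, plus its own lowest bit.

```csharp
using System;

public static class BitCountTable
{
    // Builds a 256-entry table where table[n] is the number of 1 bits in n.
    public static byte[] Build()
    {
        var table = new byte[256];
        for (int n = 1; n < 256; n++)
        {
            // n has the same set bits as n >> 1, plus its own lowest bit.
            table[n] = (byte)(table[n >> 1] + (n & 1));
        }
        return table;
    }

    public static void Main()
    {
        byte[] table = Build();
        Console.WriteLine(table[13]);   // 13 is 1101 in binary, so this prints 3
        Console.WriteLine(table[255]);  // 255 is 11111111 in binary, so this prints 8
    }
}
```

Typing the literal array out, as the question's code does, gives the same values with no start-up work at all; generating it is just easier to check by eye.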
Within the for loop, we are looping through an array of uints. A C# uint is a 32-bit quantity, i.e. made up of 4 bytes. Our lookup table tells us how many bits are set in a byte, so we must process each of the four bytes. The bit manipulation in the count += line extracts each of the four bytes, then gets its bit count from the lookup array. Adding together the bit counts for all four bytes gives the bit count for the uint as a whole.
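As a worked example of that shift-and-mask step, the sketch below splits one arbitrary sample value into its four bytes and counts the bits in each. The names (ByteSplitDemo, SplitBytes, CountBits) are hypothetical, and CountBits is a plain loop standing in for the table lookup so the snippet stays self-contained.

```csharp
using System;

public static class ByteSplitDemo
{
    // Splits a 32-bit value into its four bytes, lowest byte first.
    public static byte[] SplitBytes(uint value)
    {
        return new[]
        {
            (byte)(value & 0xFF),          // bits 0..7
            (byte)((value >> 8) & 0xFF),   // bits 8..15
            (byte)((value >> 16) & 0xFF),  // bits 16..23
            (byte)((value >> 24) & 0xFF),  // bits 24..31
        };
    }

    // Counts set bits one at a time; stands in for the lookup table.
    public static int CountBits(uint value)
    {
        int count = 0;
        for (; value != 0; value >>= 1)
        {
            count += (int)(value & 1);
        }
        return count;
    }

    public static void Main()
    {
        uint sample = 0x0D0000FF; // arbitrary sample: bytes FF, 00, 00, 0D
        int total = 0;
        foreach (byte b in SplitBytes(sample))
        {
            total += CountBits(b);
        }
        Console.WriteLine(total); // 8 + 0 + 0 + 3, so this prints 11
    }
}
```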
So given a BitArray, this function digs into the uint[] m_array member, then returns the total number of bits set in the binary representation of the uints therein.
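One caveat: m_array is a private field, so reading it through reflection depends on an implementation detail that may not be the same on every runtime version. A rough sketch of an alternative, assuming the public BitArray.CopyTo overload that accepts an int[], is below. The class name is made up, and the inner loop clears the lowest set bit on each pass instead of using the lookup table, purely to keep the example short. It also masks off the unused bits in the last word, since as far as I can tell those padding bits may be set on some runtimes (for example after SetAll(true)) and would otherwise be counted.

```csharp
using System;
using System.Collections;

public static class BitArrayCardinality
{
    public static int GetCardinality(BitArray bits)
    {
        int wordCount = (bits.Length + 31) / 32;
        if (wordCount == 0) return 0;

        // Copy the bits out through the public API rather than reflection.
        var words = new int[wordCount];
        bits.CopyTo(words, 0);

        // Bits past Length in the last word are padding; build a mask for them.
        int extraBits = bits.Length % 32;
        uint lastMask = extraBits == 0 ? uint.MaxValue : (1u << extraBits) - 1u;

        int count = 0;
        for (int i = 0; i < wordCount; i++)
        {
            uint word = unchecked((uint)words[i]);
            if (i == wordCount - 1) word &= lastMask;

            // Each pass clears the lowest set bit, so this runs once per 1 bit.
            for (; word != 0; word &= word - 1) count++;
        }
        return count;
    }

    public static void Main()
    {
        var bits = new BitArray(40);
        bits.SetAll(true);
        Console.WriteLine(GetCardinality(bits)); // all 40 bits are set, so this prints 40
    }
}
```

The extra copy costs some time and memory compared with reading the internal array directly, so whether this trade is worth it depends on how hot the call is.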
I just wanted to post a helpful article about bitArrays for those of us who are developing our own versions of faceting with Lucene.net. See: http://dotnetperls.com/precomputed-bitcount
This is a good explanation of the fastest way to get the cardinality of the on bits in an integer (which is the bulk of what the above code sample does).
By implementing the method from the article in my faceted search, along with some other simple changes, I was able to cut the time it took to get the count by ~65%. The differences were:
Implementing the 65536-entry table instead of the 256-entry one, to shift 16 bits at a time rather than 8.
private static int[] _bitcounts = InitializeBitcounts();

private static int GetCardinality(BitArray bitArray)
{
    uint[] array = (uint[])bitArray.GetType().GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(bitArray);
    int count = 0;
    foreach (uint value in array)
    {
        count += _bitcounts[value & 65535] + _bitcounts[(value >> 16) & 65535];
    }
    return count;
}

private static int[] InitializeBitcounts()
{
    int[] bitcounts = new int[65536];
    int position1 = -1;
    int position2 = -1;
    //
    // Loop through all the elements and assign them.
    //
    for (int i = 1; i < 65536; i++, position1++)
    {
        //
        // Adjust the positions we read from.
        //
        if (position1 == position2)
        {
            position1 = 0;
            position2 = i;
        }
        bitcounts[i] = bitcounts[position1] + 1;
    }
    return bitcounts;
}
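If you want to convince yourself that the 16-bit table is filled in correctly, a quick sanity check along these lines may help. The class and the NaiveCount and TableCount helpers are hypothetical names added for the check; InitializeBitcounts repeats the routine above so the snippet runs on its own.

```csharp
using System;

public static class BitcountCheck
{
    // Same scheme as the routine above: entries 2^k .. 2^(k+1)-1 are
    // entries 0 .. 2^k-1 plus one, because they only add the bit 2^k.
    public static int[] InitializeBitcounts()
    {
        var bitcounts = new int[65536];
        int position1 = -1;
        int position2 = -1;
        for (int i = 1; i < 65536; i++, position1++)
        {
            if (position1 == position2)
            {
                position1 = 0;
                position2 = i;
            }
            bitcounts[i] = bitcounts[position1] + 1;
        }
        return bitcounts;
    }

    // Reference implementation: test one bit at a time.
    public static int NaiveCount(uint value)
    {
        int count = 0;
        for (; value != 0; value >>= 1)
        {
            count += (int)(value & 1);
        }
        return count;
    }

    // Two 16-bit lookups cover one 32-bit word.
    public static int TableCount(int[] table, uint value)
    {
        return table[value & 65535] + table[(value >> 16) & 65535];
    }

    public static void Main()
    {
        int[] table = InitializeBitcounts();
        for (uint n = 0; n < 65536; n++)
        {
            if (table[n] != NaiveCount(n))
            {
                throw new InvalidOperationException("mismatch at " + n);
            }
        }
        Console.WriteLine(TableCount(table, 0xFFFF000D)); // 16 + 3, so this prints 19
    }
}
```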