Fastest way of counting occurrences of 1's in multiple std::bitset<N>?
I want to count the occurrences of 1 in multiple bitsets at the same position. The count for each position is stored in a vector.
E.g.
b0 = 1011
b1 = 1110
b2 = 0110
----
c = 2231 (1+1+0,0+1+1,1+1+1,1+0+0)
I can do that easily with the code below, but it seems to lack performance, though I'm not sure. So my question is simple: is there a faster way to count the 1s?
#include <bitset>
#include <vector>
#include <iostream>
#include <string>
int main(int argc, char ** argv)
{
    std::vector<std::bitset<4>> bitsets;
    bitsets.push_back(std::bitset<4>("1011"));
    bitsets.push_back(std::bitset<4>("1110"));
    bitsets.push_back(std::bitset<4>("0110"));
    std::vector<unsigned> counts;
    for (int i=0,j=4; i<j; ++i)
    {
        counts.push_back(0);
        for (int p=0,q=bitsets.size(); p<q; ++p)
        {
            if (bitsets[p][(4-1)-i]) // reverse order
            {
                counts[i] += 1;
            }
        }
    }
    for (auto const & count: counts)
    {
        std::cout << count << " ";
    }
}
for (int i=0,j=4; i<j; ++i)
{
    for (int p=0,q=b.size(); p<q; ++p)
    {
        if(b[p][i])
        {
            c[p] += 1;
        }
    }
}
A table-driven approach. It obviously has its limits*, but depending on the application it could prove quite suitable:
#include <array>
#include <bitset>
#include <string>
#include <iostream>
#include <cstdint>
static const uint32_t expand[] = {
0x00000000,
0x00000001,
0x00000100,
0x00000101,
0x00010000,
0x00010001,
0x00010100,
0x00010101,
0x01000000,
0x01000001,
0x01000100,
0x01000101,
0x01010000,
0x01010001,
0x01010100,
0x01010101
};
int main(int argc, char* argv[])
{
    std::array<std::bitset<4>, 3> bits = {
        std::bitset<4>("1011"),
        std::bitset<4>("1110"),
        std::bitset<4>("0110")
    };
    uint32_t totals = 0;
    for (auto& x : bits)
    {
        totals += expand[x.to_ulong()];
    }
    std::cout << ((totals >> 24) & 0xff)
              << ((totals >> 16) & 0xff)
              << ((totals >> 8) & 0xff)
              << ((totals >> 0) & 0xff)
              << std::endl;
    return 0;
}
Edit: * Actually, it's less limited than one might think...
I would personally transpose the way you order your bits.
1011 110
1110 becomes 011
0110 111
100
Two main reasons: you can use STL algorithms, and you get data locality for better performance when you work on bigger sizes.
#include <bitset>
#include <vector>
#include <iostream>
#include <string>
#include <iterator>
int main()
{
    std::vector<std::bitset<3>> bitsets_transpose;
    bitsets_transpose.reserve(4);
    bitsets_transpose.emplace_back(std::bitset<3>("110"));
    bitsets_transpose.emplace_back(std::bitset<3>("011"));
    bitsets_transpose.emplace_back(std::bitset<3>("111"));
    bitsets_transpose.emplace_back(std::bitset<3>("100"));
    std::vector<size_t> counts;
    counts.reserve(4);
    for (auto &el : bitsets_transpose) {
        counts.emplace_back(el.count()); // use bitset::count()
    }
    // print counts result
    std::copy(counts.begin(), counts.end(), std::ostream_iterator<size_t>(std::cout, " "));
}
Output is
2 2 3 1
Refactoring to separate the counting logic from vector management allows us to inspect the efficiency of the counting algorithm:
#include <bitset>
#include <vector>
#include <iostream>
#include <string>
#include <iterator>
__attribute__((noinline))
void count(std::vector<unsigned>& counts,
           const std::vector<std::bitset<4>>& bitsets)
{
    for (int i=0,j=4; i<j; ++i)
    {
        for (int p=0,q=bitsets.size(); p<q; ++p)
        {
            if (bitsets[p][(4-1)-i]) // reverse order
            {
                counts[i] += 1;
            }
        }
    }
}
int main(int argc, char ** argv)
{
    std::vector<std::bitset<4>> bitsets;
    bitsets.push_back(std::bitset<4>("1011"));
    bitsets.push_back(std::bitset<4>("1110"));
    bitsets.push_back(std::bitset<4>("0110"));
    std::vector<unsigned> counts(4, 0); // one counter per bit position
    count(counts, bitsets);
    for (auto const & count: counts)
    {
        std::cout << count << " ";
    }
}
gcc 5.3 with -O2 yields this:
count(std::vector<unsigned int, std::allocator<unsigned int> >, std::vector<std::bitset<4ul>, std::allocator<std::bitset<4ul> > > const&):
movq (%rsi), %r8
xorl %r9d, %r9d
movl $3, %r10d
movl $1, %r11d
movq 8(%rsi), %rcx
subq %r8, %rcx
shrq $3, %rcx
.L4:
shlx %r10, %r11, %rsi
xorl %eax, %eax
testl %ecx, %ecx
jle .L6
.L10:
testq %rsi, (%r8,%rax,8)
je .L5
movq %r9, %rdx
addq (%rdi), %rdx
addl $1, (%rdx)
.L5:
addq $1, %rax
cmpl %eax, %ecx
jg .L10
.L6:
addq $4, %r9
subl $1, %r10d
cmpq $16, %r9
jne .L4
ret
Which does not seem at all inefficient to me.
There are redundant memory reallocations and some other redundant code in your program. For example, before using push_back you could first reserve enough memory in the vector.
The program could look like this.
#include <iostream>
#include <bitset>
#include <vector>
const size_t N = 4;
int main()
{
    std::vector<std::bitset<N>> bitsets =
    {
        std::bitset<N>( "1011" ),
        std::bitset<N>( "1110" ),
        std::bitset<N>( "0110" )
    };
    std::vector<unsigned int> counts( N );
    for ( const auto &b : bitsets )
    {
        for ( size_t i = 0; i < N; i++ ) counts[i] += b[N - i -1];
    }
    for ( unsigned int val : counts ) std::cout << val;
    std::cout << std::endl;
    return 0;
}
Its output is
2231