Fastest way to count occurrences of 1 in multiple std::bitset<N>?

Question

I want to count the occurrences of 1 at the same position across multiple bitsets. The count for each position is stored in a vector.

For example:

b0 = 1011
b1 = 1110
b2 = 0110
     ----
 c = 2231 (1+1+0,0+1+1,1+1+1,1+0+0)

I can do that easily with the code below, but I suspect it performs poorly, although I'm not sure. So my question is simply: is there a faster way to count the 1s?

#include <bitset>
#include <vector>
#include <iostream>
#include <string>

int main(int argc, char ** argv)
{
  std::vector<std::bitset<4>> bitsets;
  bitsets.push_back(std::bitset<4>("1011"));
  bitsets.push_back(std::bitset<4>("1110"));
  bitsets.push_back(std::bitset<4>("0110"));

  std::vector<unsigned> counts;

  for (int i=0,j=4; i<j; ++i)
  {
    counts.push_back(0);
    for (int p=0,q=bitsets.size(); p<q; ++p)
    {
      if (bitsets[p][(4-1)-i]) // reverse order
      {
        counts[i] += 1;
      }
    }
  }

  for (auto const & count: counts)
  {
      std::cout << count << " ";
  }
}

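// Simplified form of the counting loop: b is the vector of bitsets,
// c holds one counter per bit position (bit 0 is the rightmost bit here)
for (int i=0,j=4; i<j; ++i)
{
  for (int p=0,q=b.size(); p<q; ++p)
  {
    if(b[p][i])
    {
      c[i] += 1; // index by bit position i, not by bitset index p
    }
  }
}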

Answer 1

A table-driven approach. It obviously has its limits*, but depending on the application it could prove quite suitable:

#include <array>
#include <bitset>
#include <string>
#include <iostream>
#include <cstdint>

static const uint32_t expand[] = {
        0x00000000,
        0x00000001,
        0x00000100,
        0x00000101,
        0x00010000,
        0x00010001,
        0x00010100,
        0x00010101,
        0x01000000,
        0x01000001,
        0x01000100,
        0x01000101,
        0x01010000,
        0x01010001,
        0x01010100,
        0x01010101
};

int main(int argc, char* argv[])
{
        std::array<std::bitset<4>, 3> bits = {
            std::bitset<4>("1011"),
            std::bitset<4>("1110"),
            std::bitset<4>("0110")
        };

        uint32_t totals = 0;

        for (auto& x : bits)
        {
                totals += expand[x.to_ulong()];
        }

        std::cout << ((totals >> 24) & 0xff) << ((totals >> 16) & 0xff) << ((totals >> 8) & 0xff) << ((totals >> 0) & 0xff) << std::endl;
        return 0;
}

Edit: * Actually, it's less limited than one might think...

Answer 2

I would personally transpose the way you order your bits.

1011              110
1110    becomes   011
0110              111
                  100

Two main reasons: you can use STL algorithms, and you get data locality for better performance when you work on larger sizes.

#include <bitset>
#include <vector>
#include <iostream>
#include <string>
#include <iterator>

int main()
{
    std::vector<std::bitset<3>> bitsets_transpose;  
    bitsets_transpose.reserve(4);
    bitsets_transpose.emplace_back(std::bitset<3>("110"));
    bitsets_transpose.emplace_back(std::bitset<3>("011"));
    bitsets_transpose.emplace_back(std::bitset<3>("111"));
    bitsets_transpose.emplace_back(std::bitset<3>("100"));

    std::vector<size_t> counts;
    counts.reserve(4);
    for (auto &el : bitsets_transpose) {
        counts.emplace_back(el.count()); // use bitset::count()
    }

    // print counts result
    std::copy(counts.begin(), counts.end(), std::ostream_iterator<size_t>(std::cout, " "));
}


Output is

2 2 3 1

Answer 3

Refactoring to separate counting logic from vector management allows us to inspect the efficiency of the counting algorithm:

#include <bitset>
#include <vector>
#include <iostream>
#include <string>
#include <iterator>

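// counts is taken by reference so the caller sees the result
// (the assembly listing below was generated from the by-value signature)
__attribute__((noinline))
void count(std::vector<unsigned>& counts,
           const std::vector<std::bitset<4>>& bitsets)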
{
  for (int i=0,j=4; i<j; ++i)
  {
    for (int p=0,q=bitsets.size(); p<q; ++p)
    {
      if (bitsets[p][(4-1)-i]) // reverse order
      {
        counts[i] += 1;
      }
    }
  }
}

int main(int argc, char ** argv)
{
  std::vector<std::bitset<4>> bitsets;
  bitsets.push_back(std::bitset<4>("1011"));
  bitsets.push_back(std::bitset<4>("1110"));
  bitsets.push_back(std::bitset<4>("0110"));

  std::vector<unsigned> counts(4, 0); // one counter per bit position, not per bitset

  count(counts, bitsets);

  for (auto const & count: counts)
  {
      std::cout << count << " ";
  }
}

gcc 5.3 with -O2 yields this:

count(std::vector<unsigned int, std::allocator<unsigned int> >, std::vector<std::bitset<4ul>, std::allocator<std::bitset<4ul> > > const&):
        movq    (%rsi), %r8
        xorl    %r9d, %r9d
        movl    $3, %r10d
        movl    $1, %r11d
        movq    8(%rsi), %rcx
        subq    %r8, %rcx
        shrq    $3, %rcx
.L4:
        shlx    %r10, %r11, %rsi
        xorl    %eax, %eax
        testl   %ecx, %ecx
        jle     .L6
.L10:
        testq   %rsi, (%r8,%rax,8)
        je      .L5
        movq    %r9, %rdx
        addq    (%rdi), %rdx
        addl    $1, (%rdx)
.L5:
        addq    $1, %rax
        cmpl    %eax, %ecx
        jg      .L10
.L6:
        addq    $4, %r9
        subl    $1, %r10d
        cmpq    $16, %r9
        jne     .L4
        ret

Which does not seem at all inefficient to me.

Answer 4

There are redundant memory reallocations and some other unnecessary code in your program. For example, before calling push_back you could first reserve enough memory in the vector.

The program could look like this:

#include <iostream>
#include <bitset>
#include <vector>

const size_t N = 4;

int main() 
{
    std::vector<std::bitset<N>> bitsets = 
    { 
        std::bitset<N>( "1011" ), 
        std::bitset<N>( "1110" ),
        std::bitset<N>( "0110" )
    };

    std::vector<unsigned int> counts( N );

    for ( const auto &b : bitsets )
    {
        for ( size_t i = 0; i < N; i++ ) counts[i] += b[N - i -1]; 
    }

    for ( unsigned int val : counts ) std::cout << val;
    std::cout << std::endl;

    return 0;
}

Its output is

2231
