
Bits twiddling hack: most efficient way to remove one bit every n bits?

Here is my question:

[Image: bits twiddling]

I need to do that very efficiently (I will need to do this operation several billion times on supercomputers) in C or C++11. N and n are known at compile-time (template parameters). What is the most efficient algorithm to do that?

Here is an example:

#include <iostream>
#include <climits>
#include <type_traits>
#include <bitset>

template <unsigned int Modulo,
          typename Type,
          unsigned int Size = sizeof(Type)*CHAR_BIT,
          class = typename std::enable_if<std::is_integral<Type>::value
                                       && std::is_unsigned<Type>::value>::type>
inline Type f(Type x)
{
    // The most inefficient algorithm ever
    std::bitset<Size> bx(x);
    std::bitset<Size> by(0);
    unsigned int j = 0;
    for (unsigned int i = 0; i < Size; ++i) {
        if (i%Modulo) {
            by[j++] = bx[i];
        }
    }
    return by.to_ullong();
}

int main()
{
    std::bitset<64> x = 823934823;
    std::cout<<x<<std::endl;
    std::cout<<(std::bitset<64>(f<2>(x.to_ullong())))<<std::endl;
    return 0;
}

Semantics first...

Semantically (and conceptually, because you can't actually use iterators here), you are doing a std::copy_if where your input and output ranges are a std::bitset<N> and your predicate is a lambda of the form (using C++14 generic lambda notation, with elem standing for the position of a bit rather than its value)

[](auto elem) { return elem % n != 0; }

This algorithm has O(N) complexity in the number of assignments and number of invocations of your predicate. Because std::bitset<N> doesn't have iterators, you have to check bit by bit. This means that your loop with a handwritten predicate is doing the exact same computation as a std::copy_if over a hypothetical iterable std::bitset<N>.

This means that as far as asymptotic efficiency is concerned, your algorithm should not be considered inefficient.

...optimization last

So given the conclusion that your algorithm isn't doing anything as bad as quadratic complexity, can its constant factor be optimized? The main source of efficiency of a std::bitset comes from the fact that your hardware can handle many (8, 16, 32 or 64) bits in parallel. If you had access to the implementation, you could write your own copy_if that takes advantage of that parallelism, e.g. by special hardware instructions, lookup tables, or some bit-twiddling algorithm.
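As a rough illustration of the bit-twiddling option, here is a minimal sketch that works on a plain std::uint64_t instead of a std::bitset. The function names remove_every_nth and remove_even_bits are hypothetical and not part of any library. The first function moves Modulo - 1 kept bits per iteration rather than one bit at a time; the second is a special case for Modulo == 2 that uses the well-known shift-and-mask compaction in a fixed number of steps. Both assume the question's convention that the bits at positions 0, Modulo, 2*Modulo, ... are the ones removed.

```cpp
#include <cstdint>

// Hypothetical helper: removes every bit whose index i satisfies
// i % Modulo == 0 and packs the remaining bits towards bit 0.
// Each iteration copies a whole group of Modulo - 1 bits.
template <unsigned Modulo>
std::uint64_t remove_every_nth(std::uint64_t x)
{
    static_assert(Modulo >= 1 && Modulo <= 64, "Modulo must be in [1, 64]");
    const unsigned Size = 64;                 // width of std::uint64_t
    const unsigned Keep = Modulo - 1;         // bits kept per group
    const std::uint64_t mask = (std::uint64_t(1) << Keep) - 1;
    std::uint64_t result = 0;
    // 'in' is the first kept bit of each group, 'out' is where it lands.
    for (unsigned in = 1, out = 0; in < Size; in += Modulo, out += Keep) {
        result |= ((x >> in) & mask) << out;
    }
    return result;
}

// Hypothetical special case for Modulo == 2: keep the odd-indexed bits.
// Six shift-and-mask steps compact 32 bits without any loop.
inline std::uint64_t remove_even_bits(std::uint64_t x)
{
    x = (x >> 1) & 0x5555555555555555ULL;        // odd bits -> even slots
    x = (x | (x >> 1)) & 0x3333333333333333ULL;  // pack into pairs
    x = (x | (x >> 2)) & 0x0F0F0F0F0F0F0F0FULL;  // pack into nibbles
    x = (x | (x >> 4)) & 0x00FF00FF00FF00FFULL;  // pack into bytes
    x = (x | (x >> 8)) & 0x0000FFFF0000FFFFULL;  // pack into 16-bit halves
    x = (x | (x >> 16)) & 0x00000000FFFFFFFFULL; // pack into the low 32 bits
    return x;
}
```

For example, remove_every_nth<2>(0xB6) and remove_even_bits(0xB6) both yield 13 (binary 1101). Whether either variant is actually faster than a hardware instruction on a given machine would need to be measured.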

E.g. this is how the member function count(), as well as the gcc and SGI extensions _Find_first() and _Find_next(), are implemented. The old SGI implementation uses lookup tables of 256 entries to handle bit count and quasi-iteration over the bits of each 8-bit char. The latest gcc version uses __builtin_popcountll() and __builtin_ctzll() to do population count and bit lookup for each 64-bit word.
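The same 256-entry table idea could be applied to the bit-removal problem. The following is only a sketch for the Modulo == 2 case, where every byte contributes exactly four kept bits; the names make_table and remove_even_bits_lut are hypothetical, and this is not how any Standard Library implements anything.

```cpp
#include <array>
#include <cstdint>

// Builds a table that maps each byte to the 4-bit value formed by its
// odd-indexed bits (bits 1, 3, 5, 7), packed towards bit 0.
inline std::array<std::uint8_t, 256> make_table()
{
    std::array<std::uint8_t, 256> t = {};
    for (unsigned b = 0; b < 256; ++b) {
        unsigned packed = 0;
        for (unsigned i = 1, j = 0; i < 8; i += 2, ++j) {
            packed |= ((b >> i) & 1u) << j;
        }
        t[b] = static_cast<std::uint8_t>(packed);
    }
    return t;
}

// Hypothetical table-driven version for Modulo == 2: one lookup per byte,
// so 8 lookups replace 64 single-bit copies.
inline std::uint64_t remove_even_bits_lut(std::uint64_t x)
{
    static const std::array<std::uint8_t, 256> table = make_table();
    std::uint64_t result = 0;
    for (unsigned byte = 0; byte < 8; ++byte) {
        const std::uint64_t packed = table[(x >> (8 * byte)) & 0xFF];
        result |= packed << (4 * byte);   // each byte yields one nibble
    }
    return result;
}
```

A table like this only lines up cleanly when the group size divides 8; other values of Modulo would need a table per starting phase or a different approach.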

Unfortunately, std::bitset does not expose its underlying array of unsigned integers. So if you want to improve your posted algorithm, you need to write your own BitSet class template (possibly by adapting the source of your own Standard Library) and give it a member function copy_if (or similar) that takes advantage of your hardware. It can give efficiency gains of a factor of 8 to 64 compared to your current algorithm.
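To make that suggestion a little more concrete, here is a minimal sketch of such a class. Everything in it (the BitSet name, its members, and remove_every_nth) is hypothetical and far from a complete container. It stores 64-bit words directly and visits only the set bits of each word, using __builtin_ctzll() where GCC or Clang is detected and a plain loop otherwise, so the work is proportional to the number of set bits rather than to N.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical minimal bitset that owns its 64-bit words.
template <std::size_t N>
class BitSet {
public:
    static constexpr std::size_t Words = (N + 63) / 64;

    BitSet() : words_() {}   // all bits start cleared

    void set(std::size_t i)
    {
        assert(i < N);
        words_[i / 64] |= std::uint64_t(1) << (i % 64);
    }

    bool test(std::size_t i) const
    {
        assert(i < N);
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // Returns a copy without the bits at indices 0, Modulo, 2*Modulo, ...
    template <std::size_t Modulo>
    BitSet remove_every_nth() const
    {
        static_assert(Modulo >= 2, "Modulo must be at least 2");
        BitSet out;
        for (std::size_t k = 0; k < Words; ++k) {
            std::uint64_t w = words_[k];
            while (w != 0) {
                const std::size_t i = 64 * k + count_trailing_zeros(w);
                if (i % Modulo != 0) {
                    // i / Modulo + 1 removed indices lie below i.
                    out.set(i - i / Modulo - 1);
                }
                w &= w - 1;   // clear the lowest set bit
            }
        }
        return out;
    }

private:
    // Precondition: w != 0.
    static unsigned count_trailing_zeros(std::uint64_t w)
    {
#if defined(__GNUC__) || defined(__clang__)
        return static_cast<unsigned>(__builtin_ctzll(w));
#else
        unsigned n = 0;
        while ((w & 1u) == 0) { w >>= 1; ++n; }
        return n;
#endif
    }

    std::array<std::uint64_t, Words> words_;
};
```

For sparse inputs this kind of iteration can skip long runs of zeros; for dense inputs a word-level shift-and-mask scheme is likely the better fit, so the choice depends on the data.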
