
How to search for “n bits” in a byte array?

I have a byte array. Now I need to know the number of occurrences of a bit pattern of length N.

For example, my byte array is "00100100 10010010" and the pattern is "001". Here N = 3, and the count is 5.

Dealing with bits has always been my weak side.

You could always XOR the first N bits with the pattern; if the result is 0, you have a match. Then shift the searched bit "stream" one bit to the left and repeat. That assumes you want to count matches even when the sub-patterns overlap. Otherwise, shift by the pattern length after a match.
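
The XOR-and-shift idea above can be sketched roughly as follows. This is only an illustration under some assumptions that are not in the original answer: N is at most 32, bits are read most significant bit first, and the pattern is passed right-aligned in an integer. The name countBitPattern and its parameters are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts occurrences of an n-bit pattern (1 <= n <= 32) in `data`,
// reading bits MSB-first. `pattern` holds the pattern right-aligned.
// If `overlapping` is false, the scan restarts after each match.
std::size_t countBitPattern(const std::vector<std::uint8_t>& data,
                            std::uint32_t pattern, unsigned n,
                            bool overlapping = true)
{
    assert(n >= 1 && n <= 32);
    const std::uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    std::uint32_t window = 0;   // the most recent bits of the stream
    unsigned filled = 0;        // valid bits in the window, capped at n
    std::size_t count = 0;

    for (std::uint8_t byte : data) {
        for (int bit = 7; bit >= 0; --bit) {
            // Shift the stream by one bit and bring in the next bit.
            window = ((window << 1) | ((byte >> bit) & 1u)) & mask;
            if (filled < n) ++filled;
            // The XOR is zero exactly when all n bits agree.
            if (filled == n && ((window ^ pattern) & mask) == 0) {
                ++count;
                if (!overlapping) filled = 0;  // skip past the matched bits
            }
        }
    }
    return count;
}
```

For the bytes 0x24, 0x92 from the question and the pattern 001 (value 1, n = 3), this counts 5 occurrences.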

If N may be arbitrarily large, you can store the bit pattern in a vector:

vector<unsigned char> pattern;

The size of the vector should be:

(N + 7) / 8

Store the pattern right-aligned. By this I mean that, for example, if N == 19, your vector should look like:

|<-    v[0]   ->|<-    v[1]   ->|<-    v[2]   ->|
 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1
|         |<-             pattern             ->|
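
One possible way to build such a right-aligned vector, assuming the pattern is available as a string of '0' and '1' characters (an assumption made only for this sketch; packRightAligned is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Packs a string of '0'/'1' characters into bytes, right-aligned:
// the last character lands in the least significant bit of the last byte,
// and any unused bits at the left of v[0] are zero.
std::vector<unsigned char> packRightAligned(const std::string& bits)
{
    const std::size_t n = bits.size();
    std::vector<unsigned char> v((n + 7) / 8, 0);
    const std::size_t pad = v.size() * 8 - n;  // unused leading bits in v[0]
    for (std::size_t i = 0; i < n; ++i) {
        assert(bits[i] == '0' || bits[i] == '1');
        if (bits[i] == '1') {
            const std::size_t pos = pad + i;   // bit index from the left of v[0]
            v[pos / 8] |= static_cast<unsigned char>(0x80u >> (pos % 8));
        }
    }
    return v;
}
```

With the 19-bit pattern from the diagram, "0110001001001001001", this gives the bytes 0x03, 0x12, 0x49.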

If your pattern is originally left-aligned, you can use the function I'll present below to shift the bits to the right.

Define a vector of bytes, of the same length as the pattern, to store a part of your bit stream for comparison with the pattern. I'll call it window:

vector<unsigned char> window;

If N is not an integer multiple of 8, you will need to mask some leftmost bits in your window when comparing it with the pattern. You can define the mask this way:

unsigned char mask = (1 << (N % 8)) - 1;

Now, assuming the window contains the bits it should, you could in theory compare the pattern with the window using vector's operator== like this:

window[0] &= mask;
bool isMatch = (window == pattern);
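
A slightly more defensive variant of this comparison might look like the sketch below. It leaves the window unmodified and also covers the case where N is a multiple of 8, for which the mask expression above would evaluate to 0. The function name windowMatches is an assumption for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compares a right-aligned window with a right-aligned pattern of n bits,
// ignoring the unused leading bits of window[0]. The window is not modified.
bool windowMatches(const std::vector<unsigned char>& window,
                   const std::vector<unsigned char>& pattern,
                   std::size_t n)
{
    assert(n > 0 && window.size() == (n + 7) / 8
           && pattern.size() == window.size());
    // When n is a multiple of 8 every bit of window[0] counts, so the mask
    // must be 0xFF; (1 << (n % 8)) - 1 would give 0 in that case.
    const unsigned char mask = (n % 8 == 0)
        ? static_cast<unsigned char>(0xFF)
        : static_cast<unsigned char>((1u << (n % 8)) - 1u);
    if ((window[0] & mask) != (pattern[0] & mask)) return false;
    for (std::size_t i = 1; i < window.size(); ++i)
        if (window[i] != pattern[i]) return false;
    return true;
}
```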

But there are good reasons to be a little more sophisticated. If N is large and the byte array you search is significantly larger, it's worth preprocessing the pattern and building a vector of size N+1:

vector<int> shifts;

This vector stores how many bits to shift the bit stream by for the next comparison, based on the position at which there is a mismatch in the current window.

Consider the pattern 0001001100. You should compare the bits with the window from right to left. If there is a mismatch at the first bit, you know the window bit is 1, and the first occurrence of 1 in your pattern is at position 2, counting from 0 from right to left. So in that case it makes no sense to compare again until at least 2 new bits have been shifted from the bit stream into the window. Similarly, if the mismatch occurs at the third bit (position 2 counting from 0), the window should be moved by 7, because the window then ends in three zeros, and the only run of 3 consecutive zeros in your pattern is at its left end. If the mismatch is at position 4, you can move the window by 8, and so on. The shifts vector holds, at index i, the number of bits by which to move the window if the mismatch occurs at position i. If there is a match, the window should be moved by the number of bits stored in shifts[N]. In the example above, a match means a shift by 8.
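
The answer does not show how to fill the shifts vector. One straightforward (and deliberately unoptimized) way is to try every candidate shift and keep the smallest one that is consistent with what is known about the window. The sketch below assumes the pattern is given as a string of '0' and '1' characters; buildShifts is a hypothetical name, and a production version would likely use a faster construction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds the shift table for a pattern given as a string of '0'/'1'
// characters written left to right. Positions are counted from the right,
// starting at 0. shifts[i] (i < N) is the smallest shift that can still
// give a match after a mismatch at position i; shifts[N] is the shift
// after a full match. Brute force, O(N^3) in the worst case.
std::vector<int> buildShifts(const std::string& pattern)
{
    const int n = static_cast<int>(pattern.size());
    assert(n > 0);
    // p(k): pattern bit at position k, counted from the right.
    auto p = [&](int k) { return pattern[n - 1 - k] == '1'; };

    std::vector<int> shifts(n + 1, n);  // shifting by n is always safe
    for (int i = 0; i <= n; ++i) {
        for (int s = 1; s < n; ++s) {
            bool ok = true;
            // Window bits 0..i-1 are known to equal the pattern; bit i
            // (if i < n) is known to be the opposite of the pattern bit.
            for (int k = 0; k <= i && k < n && k + s < n && ok; ++k) {
                const bool known = (k < i) ? p(k) : !p(k);
                ok = (known == p(k + s));
            }
            if (ok) { shifts[i] = s; break; }
        }
    }
    return shifts;
}
```

For the pattern 0001001100 this yields shifts[0] == 2, shifts[2] == 7, shifts[4] == 8 and shifts[10] == 8, matching the values worked out above.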

In practice, of course, you compare whole bytes from the pattern with the bytes from the window (going from right to left), and if there is a mismatch you examine the bits in the byte to find the mismatch position.

if(window[i] != pattern[i])
{
    int j = 0;
    unsigned char mismatches = window[i] ^ pattern[i];
    while((mismatches & 1) == 0)
    {
        mismatches >>= 1;
        ++j;
    }
    mismatch_position = 8 * (window.size() - i - 1) + j;
}
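
For context, the fragment above could be wrapped into a complete function along these lines. This is a sketch: the name findMismatchPosition and the -1 return value for "no mismatch" are assumptions added here, and the unused leading bits of the window are assumed to be masked to zero already.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the position of the first mismatch between window and pattern,
// scanning from the rightmost bit (position 0) to the left, or -1 if all
// bits agree. Both vectors must have the same size.
long findMismatchPosition(const std::vector<unsigned char>& window,
                          const std::vector<unsigned char>& pattern)
{
    assert(window.size() == pattern.size());
    for (std::size_t i = window.size(); i-- > 0; ) {
        if (window[i] != pattern[i]) {
            int j = 0;
            // Set bits mark the positions where the two bytes differ.
            unsigned char mismatches =
                static_cast<unsigned char>(window[i] ^ pattern[i]);
            while ((mismatches & 1) == 0) {
                mismatches >>= 1;
                ++j;
            }
            return static_cast<long>(8 * (window.size() - i - 1) + j);
        }
    }
    return -1;
}
```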

Here is a function that might come in handy when you need to shift some bits from your bit stream into the window. I wrote it in C#, but conversion to C++ should be trivial. C# requires some casts that are probably not necessary in C++. Use unsigned char instead of byte, vector<unsigned char>& instead of byte[], size() instead of Length, and maybe a few more minor tweaks. The function is probably a little more general than your scenario needs: it doesn't exploit the fact that consecutive calls retrieve consecutive chunks of your byte array, which might make it a bit simpler, but I don't think that hurts. In its current form, it can retrieve an arbitrary bit substring from the byte array.

public static void shiftBitsIntoWindow_MSbFirst(byte[] window, byte[] source,
                                                int startBitPosition, int numberOfBits)
{
    int nob = numberOfBits / 8;
    // number of full bytes from the source

    int ntsh = numberOfBits % 8;
    // number of bits, by which to shift the left part of the window,
    // in the case, when numberOfBits is not an integer multiple of 8

    int nfstbb = (8 - startBitPosition % 8);
    // number Of bits from the start to the first byte boundary
    // The value is from the range [1, 8], which comes handy,
    // when checking if the substring of ntsh first bits
    // crosses the byte boundary in the source, by evaluating
    // the expression ntsh <= nfstbb.

    int nfbbte = (startBitPosition + numberOfBits) % 8;
    // number of bits from the last byte boundary to the end

    int sbtci;
    // index of the first byte in the source, from which to start
    // copying nob bytes from the source
    // The way in which the (sbtci) index is calculated depends on,
    // whether nob < window.Length

    if(nob < window.Length)// part of the window will be replaced
    // with bits from the source, but some part will remain in the
    // window, only moved to the beginning and possibly shifted
    {
        sbtci = (startBitPosition + ntsh) / 8;

        //Loop below moves bits from the end of the window to the front
        //making room for new bits that will come from the source

        // In the corner case, when the number by which to shift (ntsh)
        // is zero the expression (window[i + nob + 1] >> (8 - ntsh)) is
        // zero and the loop just moves whole bytes
        for(int i = 0; i < window.Length - nob - 1; ++i)
        {
            window[i] = (byte)((window[i + nob] << ntsh)
                | (window[i + nob + 1] >> (8 - ntsh)));
        }

        // At this point, the left part of the window contains all the
        // bytes that could be constructed solely from the bytes
        // contained in the right part of the window. Next byte in the
        // window may contain bits from up to 3 different bytes. One byte
        // from the right edge of the window and one or two bytes from
        // the source. If the substring of ntsh first bits crosses the
        // byte boundary in the source it's two.

        int si = startBitPosition / 8; // index of the byte in the source
        // where the bit stream starts

        byte byteSecondPart; // Temporary variable to store the bits,
        // that come from the source, to combine them later with the bits
        // from the right edge of the window

        int mask = (1 << ntsh) - 1;
        // the mask of the form 0 0 1 1 1 1 1 1
        //                         |<-  ntsh ->|

        if(ntsh <= nfstbb)// the substring of ntsh first bits
        // doesn't cross the byte boundary in the source
        {
            byteSecondPart = (byte)((source[si] >> (nfstbb - ntsh)) & mask);
        }
        else// the substring of ntsh first bits crosses the byte boundary
        // in the source
        {
            byteSecondPart = (byte)(((source[si] << (ntsh - nfstbb))
                                   | (source[si + 1] >> (8 - ntsh + nfstbb))) & mask);
        }

        // The bits that go into one byte, but come from two sources
        // (the right edge of the window and the source), are combined below
        window[window.Length - nob - 1] = (byte)((window[window.Length - 1] << ntsh)
                                                | byteSecondPart);

        // At this point nob whole bytes in the window need to be filled
        // with remaining bits from the source. It's done by a common loop
        // for both cases (nob < window.Length) and (nob >= window.Length)

    }
    else// !(nob < window.Length) - all bits of the window will be replaced
    // with the bits from the source. In this case, only the appropriate
    // variables are set and the copying is done by the loop common for both
    // cases
    {
        sbtci = (startBitPosition + numberOfBits) / 8 - window.Length;
        nob = window.Length;
    }


    if(nfbbte > 0)// The bit substring copied into one byte in the
    // window crosses byte boundary in the source, so it has to be
    // combined from the bits coming from two consecutive bytes
    // in the source
    {
        for(int i = 0; i < nob; ++i)
        {
            window[window.Length - nob + i] = (byte)((source[sbtci + i] << nfbbte)
                | (source[sbtci + 1 + i] >> (8 - nfbbte)));
        }
    }
    else// The bit substring copied into one byte in the window
    // doesn't cross byte boundary in the source, so whole bytes
    // are simply copied
    {
        for(int i = 0; i < nob; ++i)
        {
            window[window.Length - nob + i] = source[sbtci + i];
        }
    }
}
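
When porting the function above to C++, a simple bit-by-bit version can serve as a reference to test against. The sketch below is not a translation of the C# code; it is a slower routine intended to have the same effect (shift the window left by numberOfBits and append that many bits from the source, MSB first). I have not checked it against the C# version, and the names pushBit and shiftBitsIntoWindowSimple are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shifts the whole window left by one bit and appends `bit` on the right.
void pushBit(std::vector<unsigned char>& window, bool bit)
{
    unsigned carry = bit ? 1u : 0u;
    for (std::size_t i = window.size(); i-- > 0; ) {
        const unsigned v = (static_cast<unsigned>(window[i]) << 1) | carry;
        window[i] = static_cast<unsigned char>(v & 0xFFu);
        carry = v >> 8;  // the bit that falls out moves into the next byte
    }
}

// Appends numberOfBits bits of `source`, starting at bit index
// startBitPosition (bit 0 is the MSB of source[0]), to the right end of
// the window, one bit at a time.
void shiftBitsIntoWindowSimple(std::vector<unsigned char>& window,
                               const std::vector<unsigned char>& source,
                               std::size_t startBitPosition,
                               std::size_t numberOfBits)
{
    assert(startBitPosition + numberOfBits <= source.size() * 8);
    for (std::size_t k = 0; k < numberOfBits; ++k) {
        const std::size_t pos = startBitPosition + k;
        const bool bit = ((source[pos / 8] >> (7 - pos % 8)) & 1u) != 0;
        pushBit(window, bit);
    }
}
```

Processing one bit per iteration is much slower than the byte-wise approach, so this is meant for checking results rather than for searching large arrays.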

Assuming your array fits into an unsigned int:

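#include <stdio.h>

int main (void) {
    unsigned int curnum;
    unsigned int num = 0x2492;       /* 00100100 10010010 */
    unsigned int pattern = 0x1;      /* 001 */
    unsigned int total_bits = 16;    /* number of meaningful bits in num */
    unsigned int mask = 0;
    unsigned int n = 3;
    unsigned int count = 0;
    unsigned int i;
    int shift;

    /* Build a mask with the n lowest bits set. */
    for (i = 0; i < n; i++) {
        mask |= 1u << i;
    }

    /* shift must be signed: with an unsigned counter, "shift >= 0" is
       always true and the loop never terminates. Scanning only total_bits
       keeps the unused upper bits of num out of the count. */
    for (shift = (int)(total_bits - n); shift >= 0; shift--) {
        curnum = (num >> shift) & mask;
        if (! (curnum ^ pattern)) {
            count++;
        }
    }

    printf("%u\n", count);           /* prints 5 for this input */
    return 0;
}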

Convert your byte array and your pattern each to a std::vector<bool>, then call std::search(source.begin(), source.end(), pattern.begin(), pattern.end());. Despite vector<bool>'s idiosyncrasies, this will work. Note that std::search returns an iterator to the first occurrence only, so to count occurrences you call it repeatedly, resuming one bit past each match.
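
A minimal sketch of this approach, assuming bits are taken MSB first from each byte. The helper names toBits and countOccurrences are hypothetical; only std::search and std::vector<bool> come from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Expands bytes into one bool per bit, MSB first.
std::vector<bool> toBits(const std::vector<unsigned char>& bytes)
{
    std::vector<bool> bits;
    bits.reserve(bytes.size() * 8);
    for (unsigned char b : bytes)
        for (int i = 7; i >= 0; --i)
            bits.push_back(((b >> i) & 1) != 0);
    return bits;
}

// Counts (possibly overlapping) occurrences of a non-empty pattern.
std::size_t countOccurrences(const std::vector<bool>& source,
                             const std::vector<bool>& pattern)
{
    assert(!pattern.empty());
    std::size_t count = 0;
    auto it = source.begin();
    while ((it = std::search(it, source.end(),
                             pattern.begin(), pattern.end()))
           != source.end()) {
        ++count;
        ++it;  // step one bit so overlapping matches are found too
    }
    return count;
}
```

For the question's example, countOccurrences(toBits({0x24, 0x92}), {false, false, true}) gives 5. To count only non-overlapping matches, advance the iterator by pattern.size() instead of 1 after each match.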
