
Fastest way to check if a byte array contains only 1 value

OK, now obviously I can check whether a byte array contains only one value, but I don't know if this is the fastest way to do it. The problem is that sometimes I will get a byte array containing only FF (255) values; if that happens, I need to ignore it in the code that comes next. So what I've done is as follows:

var onlyFF = true;
foreach(var value in programCode)
{
    if (value != 0xFF)
    {
        onlyFF = false;
        break;
    }
}
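An equivalent, more compact way to write this loop is LINQ's `All`, which also short-circuits on the first mismatch (same semantics, not necessarily faster):

```csharp
using System;
using System.Linq;

class OnlyFFCheck
{
    static void Main()
    {
        byte[] programCode = { 0xFF, 0xFF, 0x12, 0xFF };

        // All() stops enumerating as soon as the predicate fails,
        // just like the foreach loop with break.
        bool onlyFF = programCode.All(b => b == 0xFF);

        Console.WriteLine(onlyFF); // False
    }
}
```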

But is this the fastest way? I'm going to have to check a tremendous number of arrays (though all the arrays are quite small, around 350 bytes each).

So is this the fastest way, or are there better ways of doing this?

There certainly are faster ways to perform the specific check you are doing. The real question, as some of the comments have already indicated, is whether it is truly necessary. Whether it is worthwhile to optimize the check really depends on a couple of questions you will have to ask yourself first.

  1. What are your performance criteria?

    How many arrays per second should you be able to process? If the answer is 1,000 or fewer, then I would definitely not bother trying to optimize the code. If the answer is millions of arrays per second, then you might want to consider doing some performance testing of your code.

  2. What type of data do you expect?

    If 99% of the buffers you process are valid (not all 0xFF bytes), then your loop will most likely exit after the first few checks in most cases. Does it make sense to optimize your algorithm for the worst-case scenario if it only applies to 1% of your workload?

  3. What risks do you introduce to your code by changing the comparison method, and do the benefits outweigh the risks?

A common optimization technique that Adwaenyth mentioned can be applied to your situation. You can treat your byte array as an array of ulong and then use the XOR bitwise operator to compare 8 bytes at a time. To use this method efficiently without copying buffers, you have to make use of unsafe code. The following example shows a quick and dirty implementation of how this can be done (note I have not tested this code, so please don't use it without proper testing first):

    public static bool IsBufferValidUnSafeXOR(byte[] buffer)
    {
        bool isValid = false;

        int byteLength = buffer.Length;
        int ulongLength = byteLength >> 3;   // number of whole 8-byte words, same as byteLength / 8
        ulong toggleMask = 0xFFFFFFFFFFFFFFFF;

        unsafe
        {
            fixed (byte* pByteBuffer = buffer)
            {
                ulong* pBuffer = (ulong*)pByteBuffer;
                int index = 0;

                // Compare 8 bytes at a time: XOR with all-ones yields zero
                // only if every byte in the word is 0xFF.
                while (index < ulongLength)
                {
                    if ((pBuffer[index] ^ toggleMask) != 0)
                    {
                        isValid = true;
                        break;
                    }

                    index++;
                }
            }
        }

        // Check the remaining (byteLength % 8) tail bytes byte-wise
        if (!isValid)
        {
            int index = ulongLength << 3;

            while (index < byteLength)
            {
                if (buffer[index] != 0xFF)
                {
                    isValid = true;
                    break;
                }

                index++;
            }
        }

        return isValid;
    }
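If you would rather avoid unsafe code, the same 8-bytes-at-a-time idea can be sketched with Span<byte> and MemoryMarshal.Cast (available since .NET Core 2.1 / the System.Memory package). This is my own sketch, not part of the original answer, and the method name IsBufferValidSpan is made up for illustration:

```csharp
using System;
using System.Runtime.InteropServices;

public static class BufferCheck
{
    // Same semantics as above: returns true if the buffer contains
    // at least one byte that is not 0xFF.
    public static bool IsBufferValidSpan(byte[] buffer)
    {
        ReadOnlySpan<byte> bytes = buffer;

        // Reinterpret the leading whole 8-byte chunks as ulongs.
        ReadOnlySpan<ulong> words = MemoryMarshal.Cast<byte, ulong>(bytes);
        foreach (ulong word in words)
            if (word != ulong.MaxValue)   // some bit is 0, so some byte != 0xFF
                return true;

        // Check the 0-7 leftover tail bytes byte-wise.
        for (int i = words.Length * 8; i < bytes.Length; i++)
            if (bytes[i] != 0xFF)
                return true;

        return false;
    }
}
```

MemoryMarshal.Cast only covers whole 8-byte words, which is why the tail loop is still needed; the upside is that no fixed statement or /unsafe compiler switch is required.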

I ran a couple of performance comparisons of your current non-optimized method and the optimized method. I executed each method in a loop checking the validity of 1.5 million buffers. In the first test only 5% of the buffers checked were invalid; in the second, 33% were invalid; in the third, 50%; and in the fourth, 100%. The table below shows how the two methods compared:

---------------------------------------------------------------------------------------------
| Nr | Total Nr.        | Valid Buffer  | Invalid Buffer    | Std Method    | XOR Unsafe    |
|    | Buffers Checked  | Count         | Count             | Execution Time| Execution Time|
---------------------------------------------------------------------------------------------
| 1  | 1,500,000        | 1,425,000     | 75,000            | 183 ms        | 124 ms        |
---------------------------------------------------------------------------------------------
| 2  | 1,500,000        | 1,000,000     | 500,000           | 566 ms        | 226 ms        |
---------------------------------------------------------------------------------------------
| 3  | 1,500,000        | 750,000       | 750,000           | 800 ms        | 259 ms        |
---------------------------------------------------------------------------------------------
| 4  | 1,500,000        | 0             | 1,500,000         | 1574 ms       | 431 ms        |
---------------------------------------------------------------------------------------------

From the table above we can see that while the unsafe (XOR) method is faster, the speed difference is insignificant when only 5% of the buffers checked were invalid, while the biggest performance improvement is obtained when 100% of the buffers are invalid. Which brings us back to the original question: is it truly worthwhile to optimize the code?

A fairly easy way to make this faster is to obtain a ulong* to the array and compare 8-byte chunks at a time with 0xFFFFFFFFFFFFFFFFUL. You will likely need to handle misalignment at the start and end of the array by comparing byte-wise there.

You can then unroll the loop, maybe four times, to reduce the loop overhead to almost nothing. It will be hard (but possible) to do it faster than that.
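Such an unrolled check could look roughly like this (a sketch of my own, untested, using the "true = contains a non-FF byte" convention; requires compiling with /unsafe):

```csharp
using System;

public static class UnrolledCheck
{
    // Sketch: true if the buffer contains at least one byte != 0xFF.
    public static unsafe bool ContainsNonFF(byte[] buffer)
    {
        fixed (byte* p = buffer)
        {
            ulong* q = (ulong*)p;
            int words = buffer.Length / 8;
            int i = 0;

            // Unrolled 4x: 32 bytes per iteration. The AND of four words
            // equals ulong.MaxValue only if all four words are all-ones.
            for (; i + 4 <= words; i += 4)
                if ((q[i] & q[i + 1] & q[i + 2] & q[i + 3]) != ulong.MaxValue)
                    return true;

            // Leftover whole words.
            for (; i < words; i++)
                if (q[i] != ulong.MaxValue)
                    return true;

            // Leftover tail bytes.
            for (int j = words * 8; j < buffer.Length; j++)
                if (p[j] != 0xFF)
                    return true;
        }

        return false;
    }
}
```

Combining four words with & before a single compare lets one branch cover 32 bytes, which is where most of the loop-overhead saving comes from.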

Another fairly easy option is to write this in C and call it via P/Invoke. C compilers have sophisticated ways of making this fast; the .NET JIT does not. Although I'm surprised that neither GCC nor LLVM does any particular tricks here.

Playing with different code patterns, LLVM produces the following optimization:

if ((array[i + 0] & array[i + 1] & array[i + 2] & array[i + 3]) == 0xFF)
    return true;

That saves a lot of instructions and branches.

For me this sounds like a parallelizable problem. If you have hundreds of these arrays, with hundreds of bytes inside each, I would think about using the GPU.

You can use CUDA (works only on Nvidia cards) or OpenCL (works on all cards) to solve this task.

For C# there is a good OpenCL library called Cloo which is easy to use.
