
C# - Construct a signal Vector<T> from an integer bitmask

I have some integer value representing a bitmask, for example 154 = 0b10011010 , and I want to construct a corresponding signal Vector<T> instance <0, -1, 0, -1, -1, 0, 0, -1> .

Surely there must be a more efficient way than this?

int mask = 0b10011010;// 154
// -1 = (unsigned) 0xFFFFFFFF is the "true" value
Vector<int> maskVector = new Vector<int>(
    Enumerable.Range(0, Vector<int>.Count)
        .Select(i => (mask & (1 << i)) > 0 ? -1 : 0)
        .ToArray());
// <0, -1, 0, -1, -1, 0, 0, -1>
string maskVectorStr = string.Join("", maskVector);

Note that the debugger displays Vector<T> values incorrectly, showing only half of the components and the rest as zeros, hence my use of string.Join .

Furthermore, how can I do this when working with the generic Vector<T> version?

The documentation of ConditionalSelect explicitly says the mask vector has integral values for every overload, but spamming Vector<T>.Zero[0] and Vector<T>.One[0] to get them is surely improper? (And you can get the T version of -1 with (-Vector<T>.One)[0] )


A signal vector, or integral mask vector, is used with the ConditionalSelect method to choose between the values of two other vectors:

//powers of two <1, 2, 4, 8, 16, 32, 64, 128>
Vector<int> ifTrueVector = new Vector<int>(Enumerable.Range(0, Vector<int>.Count).Select(i => 1 << i).ToArray());
Vector<int> ifFalseVector = Vector<int>.Zero;// or some other vector
// <0, 2, 0, 8, 16, 0, 0, 128>
Vector<int> resultVector = Vector.ConditionalSelect(maskVector, ifTrueVector, ifFalseVector);
string resultStr = string.Join("", resultVector);
// our original mask value back
int sum = Vector.Dot(resultVector, Vector<int>.One);


P.S. Would there also be a corresponding solution for populating a vector with powers of 2?

There might be a fancy vector-based way to generate your mask vector, but simply optimizing your current code can speed things up by over an order of magnitude.

Firstly, don't use Linq on hot paths. The number of intermediate object allocations, virtual method calls and delegate invocations going on there is simply unnecessary if you're looking for speed. You can rewrite this as a for loop with no loss of clarity.

Secondly, get rid of that array allocation. Vector<T> has constructors which take a Span<T> , and you can stackalloc one of those.

That gives you some code which looks a bit like this:

int mask = 0b10011010;

Span<int> values = stackalloc int[Vector<int>.Count];
for (int i = 0; i < Vector<int>.Count; i++)
{
    values[i] = (mask & (1 << i)) > 0 ? -1 : 0;
}

var maskVector = new Vector<int>(values);

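Interestingly, manually unrolling that loop gives you another significant speed-up (note that this version assumes Vector<int>.Count is 8; with a smaller count the higher indices would be out of range):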

Span<int> values = stackalloc int[Vector<int>.Count];
values[0] = (mask & 0x1) > 0 ? -1 : 0;
values[1] = (mask & 0x2) > 0 ? -1 : 0;
values[2] = (mask & 0x4) > 0 ? -1 : 0;
values[3] = (mask & 0x8) > 0 ? -1 : 0;
values[4] = (mask & 0x10) > 0 ? -1 : 0;
values[5] = (mask & 0x20) > 0 ? -1 : 0;
values[6] = (mask & 0x40) > 0 ? -1 : 0;
values[7] = (mask & 0x80) > 0 ? -1 : 0;

var maskVector = new Vector<int>(values);

How does this perform? Let's use BenchmarkDotNet :

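using System;
using System.Linq;
using System.Numerics;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]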
public class MyBenchmark
{
    [Benchmark, Arguments(0b10011010)]
    public Vector<int> Naive(int mask)
    {
        Vector<int> maskVector = new Vector<int>(
            Enumerable.Range(0, Vector<int>.Count)
                .Select(i => (mask & (1 << i)) > 0 ? -1 : 0)
                .ToArray());

        return maskVector;
    }

    [Benchmark, Arguments(0b10011010)]
    public Vector<int> Optimised(int mask)
    {
        Span<int> values = stackalloc int[Vector<int>.Count];
        for (int i = 0; i < Vector<int>.Count; i++)
        {
            values[i] = (mask & (1 << i)) > 0 ? -1 : 0;
        }

        var output = new Vector<int>(values);
        return output;
    }

    [Benchmark, Arguments(0b10011010)]
    public Vector<int> Optimised2(int mask)
    {
        Span<int> values = stackalloc int[Vector<int>.Count];
        values[0] = (mask & 0x1) > 0 ? -1 : 0;
        values[1] = (mask & 0x2) > 0 ? -1 : 0;
        values[2] = (mask & 0x4) > 0 ? -1 : 0;
        values[3] = (mask & 0x8) > 0 ? -1 : 0;
        values[4] = (mask & 0x10) > 0 ? -1 : 0;
        values[5] = (mask & 0x20) > 0 ? -1 : 0;
        values[6] = (mask & 0x40) > 0 ? -1 : 0;
        values[7] = (mask & 0x80) > 0 ? -1 : 0;

        var output = new Vector<int>(values);
        return output;
    }
}

public class Program
{
    public static void Main()
    {
        var summary = BenchmarkRunner.Run<MyBenchmark>();
    }
}

This gives the results:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1415 (21H2)
Intel Core i7-8565U CPU 1.80GHz (Whiskey Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.101
  [Host]     : .NET 5.0.0 (5.0.20.51904), X64 RyuJIT
  DefaultJob : .NET 5.0.1 (5.0.120.57516), X64 RyuJIT
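
|     Method | mask |       Mean |     Error |    StdDev |  Gen 0 | Allocated |
|----------- |----- |-----------:|----------:|----------:|-------:|----------:|
|      Naive |  154 | 103.018 ns | 2.0509 ns | 4.0001 ns | 0.0554 |     232 B |
|  Optimised |  154 |  13.405 ns | 0.3004 ns | 0.4497 ns |      - |         - |
| Optimised2 |  154 |   9.668 ns | 0.2827 ns | 0.8245 ns |      - |         - |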

It is possible to do it (for a particular T ) with the vector operations offered by the Vector API, for example like this:

Vector<int> shiftMul;

Vector<int> MaskToElements(int mask)
{
    Vector<int> broadcasted = new Vector<int>(mask);
    Vector<int> shifted = Vector.Multiply(broadcasted, shiftMul);
    return Vector.LessThan(shifted, Vector<int>.Zero);
}

Where shiftMul is created like this:

int[] shiftMultipliers = new int[Vector<int>.Count];
for (int i = 0; i < shiftMultipliers.Length; i++)
{
    shiftMultipliers[i] = 1 << (31 - i);
}
shiftMul = new Vector<int>(shiftMultipliers);

This will work for different values of Vector<int>.Count .
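
The multiplication works because lane i is multiplied by 2^(31 - i), which is the same as shifting the mask left by 31 - i bits, so bit i of the mask ends up in the sign bit of lane i and the LessThan comparison turns it into -1 or 0. As a rough sketch, the two snippets above can be combined and checked against a plain scalar loop. The class name MaskVectors and the helper MatchesScalar are made up for this illustration:

```csharp
using System;
using System.Numerics;

public static class MaskVectors
{
    // Lane i holds 1 << (31 - i): multiplying by it moves bit i into the sign bit.
    private static readonly Vector<int> ShiftMul = CreateShiftMul();

    private static Vector<int> CreateShiftMul()
    {
        var multipliers = new int[Vector<int>.Count];
        for (int i = 0; i < multipliers.Length; i++)
        {
            multipliers[i] = 1 << (31 - i);
        }
        return new Vector<int>(multipliers);
    }

    public static Vector<int> MaskToElements(int mask)
    {
        Vector<int> shifted = Vector.Multiply(new Vector<int>(mask), ShiftMul);
        // A lane is negative exactly when its sign bit (the original bit i) is set.
        return Vector.LessThan(shifted, Vector<int>.Zero);
    }

    // Hypothetical helper: compares every lane with a scalar bit test.
    public static bool MatchesScalar(int mask)
    {
        Vector<int> actual = MaskToElements(mask);
        for (int i = 0; i < Vector<int>.Count; i++)
        {
            int expected = ((mask >> i) & 1) != 0 ? -1 : 0;
            if (actual[i] != expected)
            {
                return false;
            }
        }
        return true;
    }
}
```

Calling MaskVectors.MatchesScalar(0b10011010) is a cheap sanity check to run once on the target machine, since Vector<int>.Count depends on the hardware.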

This is directly suitable for calls to ConditionalSelect for Vector<int> and Vector<float> : there are special overloads for vectors of float/double for which the mask vector is a vector of int/long respectively. For Vector<uint> the mask vector can simply be reinterpreted. The approach could also be adapted to 16 bit types.
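
For illustration, here is a minimal sketch of both cases, assuming the Vector<int> mask has already been built. The class and method names (MaskUsage, SelectFloats, SelectUInts) are invented for this example; Vector.ConditionalSelect and Vector.AsVectorUInt32 are the System.Numerics methods being shown:

```csharp
using System.Numerics;

public static class MaskUsage
{
    // Vector<float> values are selected with a Vector<int> condition directly.
    public static Vector<float> SelectFloats(
        Vector<int> mask, Vector<float> ifTrue, Vector<float> ifFalse)
    {
        return Vector.ConditionalSelect(mask, ifTrue, ifFalse);
    }

    // For Vector<uint> the same bits are reinterpreted; no conversion takes place.
    public static Vector<uint> SelectUInts(
        Vector<int> mask, Vector<uint> ifTrue, Vector<uint> ifFalse)
    {
        return Vector.ConditionalSelect(Vector.AsVectorUInt32(mask), ifTrue, ifFalse);
    }
}
```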

8 bit types and 64 bit types are a different matter. AVX2 does not include a proper 8 bit multiplication, nor a proper 64 bit (integer) multiplication. The Vector API does not forbid them, but using them results in a call to a slow fallback implementation, which is best avoided. Probably it would be best to use one of the "nice sizes" (32bit or 16bit) and then either narrow or widen the resulting mask vector to the desired type.
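
As a sketch of the widening route, assuming a 32 bit mask vector has already been produced: Vector.Widen sign-extends each int lane, so -1 stays -1 and 0 stays 0 as a long. The names MaskWidening and ToLongMask are made up here, and I have not measured how this performs:

```csharp
using System.Numerics;

public static class MaskWidening
{
    // Vector<long> has half as many lanes as Vector<int>, so the low half
    // of the widened result covers mask bits 0 .. Vector<long>.Count - 1.
    public static Vector<long> ToLongMask(Vector<int> intMask)
    {
        Vector.Widen(intMask, out Vector<long> low, out Vector<long> high);
        return low; // 'high' would cover the remaining bits, if they are needed
    }
}
```

Going the other way, Vector.Narrow takes two Vector<int> values and packs them into one Vector<short> , so a 16 bit mask needs two 32 bit mask vectors as input.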

That means the approach has to vary depending on T , so a single good implementation that is independent of T is unlikely. You could, however, use some version of switch on T to select the appropriate implementation for the given T . Inspect the resulting asm: some of these are optimized pretty well by the JIT compiler, but maybe not all.
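
One possible shape for such a switch is sketched below. It assumes that typeof(T) comparisons are a reasonable way to dispatch and uses Unsafe.As to reinterpret the result; the names GenericMask , Create and ForInt32 are hypothetical, and only the 32 bit and 64 bit integer types are covered:

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

public static class GenericMask
{
    public static Vector<T> Create<T>(int mask) where T : struct
    {
        if (typeof(T) == typeof(int) || typeof(T) == typeof(uint))
        {
            // Same lane width as int: reinterpret the bits.
            Vector<int> m = ForInt32(mask);
            return Unsafe.As<Vector<int>, Vector<T>>(ref m);
        }
        if (typeof(T) == typeof(long) || typeof(T) == typeof(ulong))
        {
            // Build the 32 bit mask, then sign-extend the low half to 64 bit lanes.
            Vector.Widen(ForInt32(mask), out Vector<long> low, out _);
            return Unsafe.As<Vector<long>, Vector<T>>(ref low);
        }
        throw new NotSupportedException(typeof(T).Name);
    }

    // The multiply-and-compare approach; the multipliers are rebuilt on every
    // call to keep the sketch short, real code would cache them.
    private static Vector<int> ForInt32(int mask)
    {
        var multipliers = new int[Vector<int>.Count];
        for (int i = 0; i < multipliers.Length; i++)
        {
            multipliers[i] = 1 << (31 - i);
        }
        Vector<int> shifted = Vector.Multiply(new Vector<int>(mask), new Vector<int>(multipliers));
        return Vector.LessThan(shifted, Vector<int>.Zero);
    }
}
```

Whether the branches for the other types are actually removed in each instantiation is exactly the kind of thing to confirm in the generated asm.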


If the mask is a constant, then with the System.Runtime.Intrinsics.X86 API it can go directly into a Blend ; it doesn't need to be turned into a vector mask first. For example:

Vector128.AsInt32(Sse41.Blend(Vector128.AsSingle(a), Vector128.AsSingle(b), (byte)mask));

If the mask is not a constant, that API will still accept it, but it ends up calling a slow fallback. In that case, it's better to make a vector mask and use BlendVariable .
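
A minimal sketch of the BlendVariable route for 128 bit vectors, assuming SSE4.1 is available and that the mask lanes are already -1 or 0. The wrapper name VariableBlend.Select is made up. Note that the argument order differs from ConditionalSelect : BlendVariable takes its result from the second operand where the mask lane's sign bit is set.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class VariableBlend
{
    public static Vector128<int> Select(
        Vector128<int> mask, Vector128<int> ifTrue, Vector128<int> ifFalse)
    {
        if (!Sse41.IsSupported)
        {
            throw new PlatformNotSupportedException("SSE4.1 is required for this sketch");
        }

        // Reinterpret as float to use the 32 bit lane blend, then reinterpret back.
        return Sse41.BlendVariable(
            ifFalse.AsSingle(), ifTrue.AsSingle(), mask.AsSingle()).AsInt32();
    }
}
```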
