简体   繁体   中英

Why do UInt16 arrays seem to add faster than int arrays?

It seems that C# is faster at adding two arrays of UInt16[] than it is at adding two arrays of int[] . This makes no sense to me, since I would have assumed the arrays would be word-aligned, and thus int[] would require less work from the CPU, no?

I ran the test-code below, and got the following results:

Int    for 1000 took 9896625613 tick (4227 msec)
UInt16 for 1000 took 6297688551 tick (2689 msec)

The test code does the following:

  1. Creates two arrays named a and b , once.
  2. Fills them with random data, once.
  3. Starts a stopwatch.
  4. Adds a and b , item-by-item. This is done 1000 times.
  5. Stops the stopwatch.
  6. Reports how long it took.

This is done for int[] a, b and for UInt16 a,b . And every time I run the code, the tests for the UInt16 arrays take 30%-50% less time than the int arrays. Can you explain this to me?

Here's the code, if you want to try if for yourself:

public static UInt16[] GenerateRandomDataUInt16(int length)
{
    UInt16[] noise = new UInt16[length];
    Random random = new Random((int)DateTime.Now.Ticks);
    for (int i = 0; i < length; ++i)
    {
        noise[i] = (UInt16)random.Next();
    }

    return noise;
}

public static int[] GenerateRandomDataInt(int length)
{
    int[] noise = new int[length];
    Random random = new Random((int)DateTime.Now.Ticks);
    for (int i = 0; i < length; ++i)
    {
        noise[i] = (int)random.Next();
    }

    return noise;
}

public static int[] AddInt(int[] a, int[] b)
{
    int len = a.Length;
    int[] result = new int[len];
    for (int i = 0; i < len; ++i)
    {
        result[i] = (int)(a[i] + b[i]);
    }
    return result;
}

public static UInt16[] AddUInt16(UInt16[] a, UInt16[] b)
{
    int len = a.Length;
    UInt16[] result = new UInt16[len];
    for (int i = 0; i < len; ++i)
    {
        result[i] = (ushort)(a[i] + b[i]);
    }
    return result;
}


public static void Main()
{
    int count = 1000;
    int len = 128 * 6000;

    int[] aInt = GenerateRandomDataInt(len);
    int[] bInt = GenerateRandomDataInt(len);

    Stopwatch s = new Stopwatch();
    s.Start();
    for (int i=0; i<count; ++i) 
    {
        int[] resultInt = AddInt(aInt, bInt);
    }
    s.Stop();
    Console.WriteLine("Int    for " + count 
                + " took " + s.ElapsedTicks + " tick (" 
                + s.ElapsedMilliseconds + " msec)");

    UInt16[] aUInt16 = GenerateRandomDataUInt16(len);
    UInt16[] bUInt16 = GenerateRandomDataUInt16(len);

    s = new Stopwatch();
    s.Start();
    for (int i=0; i<count; ++i) 
    {
        UInt16[] resultUInt16 = AddUInt16(aUInt16, bUInt16);
    }
    s.Stop();
    Console.WriteLine("UInt16 for " + count 
                + " took " + s.ElapsedTicks + " tick (" 
                + s.ElapsedMilliseconds + " msec)");


}

What happens is that you see a leaky abstraction. UInt16 takes half of memory that int does (16 vs. 32 bit).

This means that the memory area occupied by int16 array takes half of area that the int32 does. So more of that area can fit in processor cache and thus be accessed very quickly.

You could try that code on a processor that has more cache and the difference is likely to be smaller.

Also try with much larger arrays.

数组是字对齐的,但没有理由为什么数组中的条目应该是字对齐的。

Just a SWAG: the smaller memory use of the UInt16 arrays has improved memory characteristics (GC, cache, who knows what else). Since there doesn't seem to be too many allocations, I'd guess that cache is the main factor.

Also, you should take care that benchmarking can be a tricky business - it looks like your times are probably including some of the JIT compilation, which might be skewing results. You might try reversing the order that you test the int array with the UInt16 array and see if the timings follow along or not.

Jon Skeet has (or had) a simple benchmark framework he coded up way back when that tried to take these effects into account. I don't know if it's still available (or even applicable); maybe he'll comment.

Couple of factors

1) You are also timing the generation of the resultant array..so it would be interesting to see how much time it took to just to the adding vs creating the result array that gets passed back

2) It would be interesting to see what IL is generated. Since your code is VERY straightforward (iterate and add), the compiler might be optimizing this, maybe stuffing multiple uint16's in a larger register and doing multiple additions per instruction

I am not expert in .NET but I would check two things :

  1. Passing larger array (N elements of type int ) takes more time then array of N ushort elements. This can tested using various size of arrays and style of coding -- see my comment to question). Numbers from your tests fit this theory :).
  2. Adding two ushort variables can be implemented as adding two int with result of type int -- without checking overflowing . And I assume that handling in code any kind of exception (including overflow exception) is time consuming task. This can be checked in .NET documentation.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM