
Don't fully understand custom-written 'memcpy' function in C

So I was browsing the Quake engine source code earlier today and stumbled upon some custom-written utility functions. One of them was 'Q_memcpy':

void Q_memcpy (void *dest, void *src, int count)
{
    int             i;

    if (( ( (long)dest | (long)src | count) & 3) == 0 )
    {
        count>>=2;
        for (i=0 ; i<count ; i++)
            ((int *)dest)[i] = ((int *)src)[i];
    }
    else
        for (i=0 ; i<count ; i++)
            ((byte *)dest)[i] = ((byte *)src)[i];
}

I understand the overall premise of the function, but I don't quite understand the reason for the bitwise OR between the source and destination addresses. My questions are as follows:

  • Why does 'count' get used in the same bitwise arithmetic?
  • Why are only the last two bits of that result checked?
  • What purpose does this whole check serve?

I'm sure it's something obvious but please excuse my ignorance because I haven't really delved into the more low level side of things when it comes to programming. I just find it interesting and want to learn more.

It is finding out whether the source and destination pointers are int aligned, and whether the count is an exact multiple of the int size in bytes.

If those three things are all true, the least significant 2 bits of each value will be 0 (assuming int is 4 bytes and a pointer fits in a long). So the algorithm ORs the three values together and isolates the least significant 2 bits with & 3.

When the test passes, it copies int by int. Otherwise it copies char by char.

If the test fails, a more sophisticated algorithm would copy some of the leading and trailing bytes char by char and the intermediate bytes int by int.
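The head/body/tail idea described above can be sketched roughly as follows. This is only an illustration, not code from the Quake source: the name copy_head_body_tail is hypothetical, it assumes a 4-byte uint32_t word, it assumes that the low bits of a pointer converted to uintptr_t reflect its alignment (true on common platforms), and it uses the same kind of pointer cast as the original, which is not strictly portable C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not part of the Quake source. */
void copy_head_body_tail(void *dest, const void *src, size_t count)
{
    unsigned char       *d = dest;
    const unsigned char *s = src;

    /* Word copies are only possible when both pointers have the same
       misalignment, so that aligning one also aligns the other. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3) == 0)
    {
        /* Head: copy single bytes until a 4-byte boundary is reached. */
        while (count > 0 && ((uintptr_t)d & 3) != 0)
        {
            *d++ = *s++;
            count--;
        }

        /* Body: copy aligned 4-byte words (same cast trick as Q_memcpy). */
        while (count >= 4)
        {
            *(uint32_t *)d = *(const uint32_t *)s;
            d += 4;
            s += 4;
            count -= 4;
        }
    }

    /* Tail, or the whole buffer if the misalignments differ: byte by byte. */
    while (count > 0)
    {
        *d++ = *s++;
        count--;
    }
}
```

Unlike Q_memcpy, this version still uses word copies when the count is not a multiple of 4 or when both pointers are misaligned by the same amount. If the two pointers are misaligned differently, it falls back to a plain byte loop.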

The bitwise ORing and ANDing with 3 checks whether the source, destination and count are all divisible by 4. If they are, the operation can work with 4-byte words, since this code assumes that int is 4 bytes. Otherwise the operation is performed bytewise.

It first tests if all 3 arguments are divisible by 4. If - and only if - they all are, it proceeds with copying 4 bytes at a time.

I.e., written out plainly, this would be

if ((long) src % 4 == 0 && (long) dest % 4 == 0 && count % 4 == 0 )
{
    count = count / 4;
    for (i = 0; i < count; i++)
        ((int *)dest)[i] = ((int *)src)[i];
}

I am not sure whether they tested their compiler and found that it generated bad code even for such a simple test, and therefore decided to write it in this convoluted way. In any case, x | y | z guarantees that bit n is set in the result if it is set in any of x, y or z. Therefore, if (x | y | z) & 3 results in 0, none of the numbers had either of the 2 lowest bits set, and therefore all of them are divisible by 4.
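To make the equivalence concrete, here is a small sketch that compares the bitwise form with the modulo form over a range of non-negative values. The function names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Bitwise form, as used in Q_memcpy. */
bool aligned_bitwise(long x, long y, long z)
{
    return ((x | y | z) & 3) == 0;
}

/* Spelled-out form using the remainder operator. */
bool aligned_modulo(long x, long y, long z)
{
    return x % 4 == 0 && y % 4 == 0 && z % 4 == 0;
}

/* Returns true if both forms agree for every triple in [0, limit). */
bool forms_agree(long limit)
{
    for (long x = 0; x < limit; x++)
        for (long y = 0; y < limit; y++)
            for (long z = 0; z < limit; z++)
                if (aligned_bitwise(x, y, z) != aligned_modulo(x, y, z))
                    return false;
    return true;
}
```

For example, aligned_bitwise(8, 16, 4) is true because none of the three values has bit 0 or bit 1 set, while aligned_bitwise(8, 16, 6) is false because 6 has bit 1 set.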


Of course it would be rather silly to use now - the standard library memcpy in recent library implementations is almost certainly better than this.

Therefore, on recent compilers you can optimize all calls to Q_memcpy by switching them to memcpy. GCC can generate things like 64-bit or SIMD moves for memcpy, depending on the size of the area to be copied.
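One possible way to make that switch without touching every call site is to keep the Q_memcpy name and forward it to the standard function. This is only a sketch: it assumes count is never negative and that the two regions do not overlap (memcpy has undefined behavior for overlapping regions), and it changes src to const void *, which the original signature did not have.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: keep the old name for existing callers, let the library do the work.
   Assumes count >= 0 and non-overlapping source and destination. */
static inline void Q_memcpy(void *dest, const void *src, int count)
{
    memcpy(dest, src, (size_t)count);
}
```

A call such as Q_memcpy(b, a, sizeof a) then behaves like a direct memcpy call, and the compiler is free to inline or vectorize it as it sees fit.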


 