Just out of curiosity: why is the Linux kernel's “optimized” strcpy much slower than the libc implementation?

Question

I tried to benchmark the optimized string operations in http://lxr.linux.no/#linux+v2.6.38/arch/x86/lib/string_32.c and compare them to the regular strcpy:

#include <stdio.h>
#include <stdlib.h>
char *_strcpy(char *dest, const char *src)
{
        int d0, d1, d2;
        asm volatile("1:\tlodsb\n\t"
                "stosb\n\t"
                "testb %%al,%%al\n\t"
                "jne 1b"
                : "=&S" (d0), "=&D" (d1), "=&a" (d2)
                : "0" (src), "1" (dest) : "memory");
        return dest;
}
int main(int argc, char **argv){
        int times = 1;
        if(argc >1)
        {
                times = atoi(argv[1]);
        }
        char a[100];
        for(; times; times--)
          _strcpy(a, "Hello _strcpy!");


        return 0;
}

Timing it with (time ..) showed that it is about 10x slower than the regular strcpy (on x64 Linux).

Why?

Answer 1

If your string is a constant, the compiler may be inlining the copy for the plain strcpy call, turning it into a short series of unconditional MOV instructions. Since that is straight-line code with no loop and no conditional branches, it would be faster than the Linux variant, which loads, stores, tests and branches once per byte.
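One way to check this is to compare a direct strcpy of the literal against a call the compiler cannot see through. The sketch below is only an illustration: the names bytewise_strcpy, opaque_strcpy, copy_literal_direct, copy_literal_opaque and time_copies are made up for this example, the empty asm barrier assumes GCC or Clang, and whether the direct call really gets expanded inline depends on the compiler and optimization flags.

```c
#include <string.h>
#include <time.h>

/* Portable C equivalent of the lodsb/stosb loop: one byte per iteration,
 * with a test and a branch after every byte. */
static char *bytewise_strcpy(char *dest, const char *src)
{
    char *d = dest;
    while ((*d++ = *src++) != '\0')
        ;
    return dest;
}

/* Hypothetical helper: calling through a volatile function pointer keeps
 * the compiler from recognising strcpy, so a real libc call is made. */
static char *(*volatile opaque_strcpy)(char *, const char *) = strcpy;

/* Direct call with a constant source: an optimizing compiler is allowed
 * to replace this with a few fixed-size moves (not guaranteed). */
static void copy_literal_direct(char *buf)
{
    strcpy(buf, "Hello _strcpy!");
}

/* Same copy, but forced through the opaque pointer. */
static void copy_literal_opaque(char *buf)
{
    opaque_strcpy(buf, "Hello _strcpy!");
}

/* Byte-at-a-time copy, mirroring the kernel routine's structure. */
static void copy_literal_bytewise(char *buf)
{
    bytewise_strcpy(buf, "Hello _strcpy!");
}

/* Time `iterations` copies with clock(); returns CPU seconds. */
static double time_copies(void (*copy)(char *), long iterations)
{
    char buf[100];
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        copy(buf);
        /* Compiler barrier (GCC/Clang syntax) so the stores are kept. */
        __asm__ volatile("" : : "r"(buf) : "memory");
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copies with each of the three copy functions and the same iteration count gives numbers that can be compared directly. Looking at the generated assembly (for example with gcc -O2 -S) shows whether copy_literal_direct still contains a call to strcpy or only plain moves.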


Answer 1: score 2, accepted, 2011-05-08 13:57:14
