Implementing the strcpy function in C

Question

My task is to implement the strcpy function under the following constraints:

The function can have no more than seven statements.
It should be as fast as possible.
It should use as little memory as it can.
In the function that calls my strcpy, the destination pointer will be declared as follows: char* newDestination = NULL;
The prototype of the strcpy function should be: void myStrcp(void** dst, void* src);

I came up with this solution, which uses uint64_t to copy eight bytes per iteration. My questions are:

Is there a better solution than mine, and if so, why is it better?
Does it matter which OS (Windows vs. Linux) and/or platform the program runs on?

My solution (on Windows):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <conio.h>

void strCpy(void **dst, void *src);

int main()
{
    char *newLocation = NULL;

    strCpy((void **)&newLocation, "stringToBeCopied");
    printf("after my strcpy dst has the string: %s \n", newLocation);
    free(newLocation);
    getch();
    return 0;
}

void strCpy(void** dst, void* src)
{
    // Allocating memory for the dst string
    uint64_t i, length = strlen((char *)src), *locDst =
        (uint64_t *) malloc(length + 1), *locSrc = (uint64_t *) src;
    *dst = locDst;

    // Copy 8 Bytes each iteration
    for (i = 0; i < length / 8; *locDst++ = *locSrc++, ++i);

    // In case the length of the string is not aligned to 8 bytes, copy the remainder
    // (last iteration)
    char *char_dst = (char *)locDst, *char_src = (char *)locSrc;

    for (; *char_src != '\0'; *char_dst++ = *char_src++);

    // NULL terminator
    *char_dst = '\0';
}

Answer 1

Vectorization is indeed the key. A better solution along the same lines would be to use SSE/AVX for an even more efficient copy. This of course makes the program platform-specific, as you need to detect the widest vector width supported.
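As a rough illustration of the SSE idea, here is a minimal sketch that copies 16 bytes per iteration with SSE2 intrinsics and falls back to memcpy for the tail (or for the whole copy when SSE2 is not available at compile time). The names strCpySse and copy_bytes are hypothetical, the compile-time check stands in for real feature detection, and the const-qualified src differs from the required prototype. In practice a good library memcpy is likely to do something similar internally, so measure before assuming this is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_SSE2_COPY 1
#endif

/* Hypothetical helper: copies n bytes, 16 at a time where SSE2 is available. */
static void copy_bytes(char *dst, const char *src, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2_COPY
    for (; i + 16 <= n; i += 16) {
        /* Unaligned load/store, so no alignment requirement on src or dst. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), chunk);
    }
#endif
    /* Remaining bytes (fewer than 16 when the vector loop ran). */
    memcpy(dst + i, src + i, n - i);
}

/* Hypothetical name. Allocates *dst; *dst is NULL if malloc fails. */
void strCpySse(void **dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen((const char *)src) + 1;  /* include the terminator */
    char *out = malloc(n);
    if (out != NULL)
        copy_bytes(out, (const char *)src, n);
    *dst = out;
}
```

The caller owns the allocated buffer and should free it, as main does in the question.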

Several issues you should also address:

Alignment of src/dst: if the chunk you copy (in your case above, a 64-bit one) crosses a cache-line boundary, the hardware will most likely incur an overhead in doing the copy due to the cache-line split. The overhead would probably be bigger with longer vectors (and splits are also more frequent there). You could therefore add a few initial checks and copy a head in smaller chunks, the same way you handle the tail.
Can the src/dst regions overlap? If so, you need to define the correct functional behavior (this becomes less trivial when copying in chunks).
Note the difference between strcpy and memcpy: strcpy stops at the first null byte, while memcpy copies a fixed number of bytes. This makes the vectorization much less trivial, so you need to define the requirement here. Currently your function might differ from what is expected of a classic strcpy, as you don't check for null bytes within each chunk. Not sure if that's an issue for you.
A code-size limitation is not very performance-friendly (well, except when your bottleneck is instruction-cache capacity or branch predictability, but that's pretty advanced). The seven-statement limit might mean you're overthinking this :)
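To make the alignment and null-byte points concrete, here is a minimal sketch of a chunked copy that handles an unaligned head byte by byte and checks each 8-byte chunk for a null byte before storing it. The names copy_chunked and has_zero_byte are hypothetical, and the sketch assumes the caller knows the size of the buffer holding src (so no chunk load reads past it) and that dst is at least that large. It is not the function from the question and does not fit the seven-statement limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the eight bytes in w is 0x00 (classic bit trick). */
static int has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/*
 * Hypothetical helper: copies the string at src into dst, returns its length.
 * Assumptions: src_size is the size of the buffer holding src, the string is
 * NUL-terminated within it, and dst has room for at least src_size bytes.
 */
size_t copy_chunked(char *dst, const char *src, size_t src_size)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);

    /* Head: byte copies until src + i is 8-byte aligned. */
    for (; i < src_size && (uintptr_t)(src + i) % 8 != 0; ++i)
        if ((dst[i] = src[i]) == '\0')
            return i;

    /* Body: whole 8-byte chunks, stopping at the chunk that holds the NUL. */
    for (; i + 8 <= src_size; i += 8) {
        uint64_t w;
        memcpy(&w, src + i, sizeof w);   /* load without aliasing issues */
        if (has_zero_byte(w))
            break;
        memcpy(dst + i, &w, sizeof w);
    }

    /* Tail: finish byte by byte, including the terminator. */
    for (; i < src_size && (dst[i] = src[i]) != '\0'; ++i)
        ;
    return i;
}
```

Using memcpy for the 8-byte loads and stores keeps the code free of pointer-cast aliasing problems; compilers typically turn these fixed-size calls into single move instructions.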

Answer 1: accepted, 2013-10-10 20:45:03
