Implementing a strcpy function in C
My task is like this: I need to implement the strcpy function under the following constraints:

In the function that calls strcpy, the destination address will be held as follows: char* newDestination = NULL;
The prototype of the strcpy function should be: void myStrcp(void** dst, void* src);

I came up with this solution, which uses uint64_t to copy eight bytes per iteration. My question is: does it matter which OS (Windows vs. Linux) and/or platform the program runs on?

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

void strCpy(void **dst, void *src);

int main(void)
{
    char *newLocation = NULL;

    strCpy((void **)&newLocation, "stringToBeCopied");
    printf("after my strcpy dst has the string: %s \n", newLocation);
    free(newLocation);
    getchar(); // wait for a key press; portable stand-in for getch() from the Windows-only <conio.h>
    return 0;
}

void strCpy(void** dst, void* src)
{
    // Allocate memory for the dst string (the caller frees it).
    // Note: the result of malloc is not checked here.
    uint64_t i, length = strlen((char *)src),
             *locDst = (uint64_t *) malloc(length + 1),
             *locSrc = (uint64_t *) src;
    *dst = locDst;

    // Copy 8 bytes each iteration
    for (i = 0; i < length / 8; *locDst++ = *locSrc++, ++i);

    // In case the length of the string is not a multiple of 8 bytes,
    // copy the remainder one byte at a time
    char *char_dst = (char *)locDst, *char_src = (char *)locSrc;
    for (; *char_src != '\0'; *char_dst++ = *char_src++);

    // NUL terminator
    *char_dst = '\0';
}

Vectorization is indeed the key. A better solution along the same line of thought would be to use SSE/AVX for an even more efficient copy. This of course makes the program platform-specific, since you need to detect the widest vector width the target supports.
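To make the SSE idea concrete, here is a minimal sketch that copies 16 bytes per iteration with SSE2 intrinsics when they are available at compile time, and falls back to memcpy otherwise. The helper name copy_bytes_vec is made up for illustration, the length is assumed to be known already (for example from strlen), and the two regions are assumed not to overlap.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2_COPY 1
#endif

/* Hypothetical helper: copies n bytes from src to dst.
   Assumes the regions do not overlap. */
static void copy_bytes_vec(char *dst, const char *src, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2_COPY
    for (; i + 16 <= n; i += 16) {
        /* Unaligned load/store: valid for any address, but possibly
           slower when an access straddles a cache-line boundary. */
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif
    /* Tail (or the whole copy when SSE2 is not available). */
    memcpy(dst + i, src + i, n - i);
}
```

Calling copy_bytes_vec(dst, src, strlen(src) + 1) copies the string including its terminator. This only selects the vector width at compile time; picking AVX at run time would need a CPU feature check on top.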
Several issues you should also address:
Alignment of src/dst: if a chunk you copy (in your case above, a 64-bit one) straddles a cache-line boundary, the hardware will most likely incur extra overhead for the split access. The overhead probably grows with longer vectors, and splits are also more frequent there. You could therefore add a few initial checks and copy a head in smaller chunks, the same way you handle the tail.
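A rough sketch of that head/body/tail structure, assuming the length is already known and the regions do not overlap. The helper name copy_aligned_dst is invented here, and aligning on dst rather than src is an arbitrary choice for the example (in the posted code dst comes from malloc and starts out suitably aligned, so there the head would matter for src instead).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies n bytes; src and dst must not overlap. */
static void copy_aligned_dst(char *dst, const char *src, size_t n)
{
    /* Head: single bytes until dst sits on an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)dst % 8) != 0) {
        *dst++ = *src++;
        n--;
    }
    /* Body: 8 bytes per iteration. Going through memcpy and a local
       variable avoids aliasing and alignment problems on the src side;
       compilers typically turn this into plain 64-bit loads and stores. */
    while (n >= 8) {
        uint64_t chunk;
        memcpy(&chunk, src, 8);
        memcpy(dst, &chunk, 8);
        dst += 8;
        src += 8;
        n -= 8;
    }
    /* Tail: the remaining 0..7 bytes. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

Only one of the two pointers can be aligned this way unless src and dst happen to share the same offset modulo 8, so which side to align is a design decision.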
Can the src/dst regions overlap? If so, you need to define what the correct functional behavior is (this becomes less trivial when copying in chunks).
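In the posted code the destination is freshly allocated by malloc, so it cannot overlap src; the question matters for a version that writes into a caller-supplied buffer. A minimal sketch of an overlap test follows. The name regions_overlap is hypothetical, and comparing pointers converted to uintptr_t assumes a flat address space, which the C standard does not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if [a, a+na) and [b, b+nb) share
   at least one byte, 0 otherwise. Assumes a flat address space. */
static int regions_overlap(const void *a, size_t na,
                           const void *b, size_t nb)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;

    if (na == 0 || nb == 0)
        return 0;               /* an empty region overlaps nothing */
    return pa < pb + nb && pb < pa + na;
}
```

A chunked copy could call this first and then either reject the call, fall back to memmove, or choose the copy direction, depending on the behavior you decide to define.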
Note the difference between strcpy and memcpy: strcpy has to stop at the terminating null byte, while memcpy is given the length up front. This makes the vectorization much less trivial, so you need to define the requirement here. Currently your function might differ from what is expected of a classic strcpy, as you don't check for null bytes within each chunk; the length comes from a separate strlen pass instead. Not sure if that's an issue for you.
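For reference, checking a chunk for a null byte does not require looking at each byte separately. The sketch below uses a well-known bit trick for that and builds a single-pass chunked copy on top of it. Both function names are invented for this example, and the extra src_cap parameter (the size of the source buffer) is an assumption added so that the 8-byte reads never run past the end of the source object.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if any of the eight bytes of v is 0x00, else 0. */
static int has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Hypothetical helper: copies the string stored in src, a buffer of
   src_cap bytes that is assumed to contain a terminator, into dst.
   dst must have room for at least src_cap bytes.
   Returns the string length. */
static size_t copy_string_chunked(char *dst, const char *src, size_t src_cap)
{
    size_t i = 0;

    /* Whole chunks, while the read stays inside the source buffer
       and the chunk contains no terminator. */
    while (i + 8 <= src_cap) {
        uint64_t chunk;
        memcpy(&chunk, src + i, 8);
        if (has_zero_byte(chunk))
            break;
        memcpy(dst + i, &chunk, 8);
        i += 8;
    }
    /* Finish byte by byte, up to the terminator. */
    while (i < src_cap && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

Compared with the strlen-then-copy approach in the question, this walks the source only once, at the cost of one extra test per chunk.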
A code size limitation is not very performance friendly (well, except when your bottleneck is instruction-cache capacity or branch predictability, but that's pretty advanced). The 7-statement limit might mean you're overthinking this :)