Why does my C code call memmove (instead of memcpy)?

Question

I'm using gcc 12.2 on Linux. I build with -nostdlib, and the compiler complained about missing memcpy and memmove. So I implemented a crude memcpy in assembly and made memmove call abort, since I always want to use memcpy.

I was wondering whether I could stop the compiler from asking for memcpy (and memmove) by implementing my own copy in C. The optimizer seems to recognize what the loop really is and calls the library function anyway. However, once I had implemented it (using #define memcpy mymemcpy) and ran it, my app aborted: it called my memmove implementation instead of the assembly memcpy. Why is gcc calling memmove instead of memcpy?

clang calls memcpy, but gcc optimizes my code better, so I use gcc for optimized builds.

__attribute__ ((access(write_only, 1))) __attribute__((nonnull(1, 2)))
inline void mymemcpy(void *__restrict__ dest, const void *__restrict__ src, int size)
{
    const unsigned char *s = (const unsigned char*)src;
    unsigned char *d = (unsigned char*)dest;
    while(size--) *d++ = *s++;
}

Reproducible example

//dummy.cpp

extern "C" {
void*malloc() { return 0; }
int read() { return 0; }
int write() { return 0; }
int memcpy() { return 0; }
int memmove() { return 0; }
}

//main.cpp
#include <unistd.h>
#include <cstdlib>
struct MyVector {
    void*p;
    long long position, length;
};

__attribute__ ((access(write_only, 1))) __attribute__((nonnull(1, 2)))
void mymemcpy(void *__restrict__ dest, const void *__restrict__ src, int size)
{
    const unsigned char *s = (const unsigned char*)src;
    unsigned char *d = (unsigned char*)dest;
    while(size--) *d++ = *s++;
}

//__attribute__ ((noinline))
int func(const char*file_from_disk, MyVector*v)
{
    if (v->position + 5 <= v->length ) {
        mymemcpy(v->p, file_from_disk, 5);
    }
    return 0;
}

char buf[4096];
extern "C"
int _start() {
    MyVector v{malloc(1024),0,1024};
    v.position += read(0, v.p, 1024-5);
    int len = read(0, buf, 4096);
    func(buf, &v);
    write(1, v.p, v.position);
}

g++ -march=native -nostdlib -static -fno-exceptions -fno-rtti -O2 main.cpp dummy.cpp

Check using objdump -D a.out | grep call

401040: e8 db 00 00 00          call   401120 <memmove>
40108d: e8 4e 00 00 00          call   4010e0 <malloc>
4010a3: e8 48 00 00 00          call   4010f0 <read>
4010ba: e8 31 00 00 00          call   4010f0 <read>
4010c5: e8 56 ff ff ff          call   401020 <_Z4funcPKcP8MyVector>
4010d5: e8 26 00 00 00          call   401100 <write>
402023: ff 11                   call   *(%rcx)

Answer 1

An exact answer requires diving into the code transformations that GCC performs and looking at how GCC transforms your code. That's beyond what I can do in a reasonable amount of time, but I can show you what's going on in more general terms, without diving into GCC internals.

Here's the crazy part: if you remove inline, you get memcpy. With inline, you get memmove. I'll show the results on Godbolt and then talk about how compilers work to explain it.

The Code

Here's some test code I put on Godbolt.

__attribute__ ((access(write_only, 1))) __attribute__((nonnull(1, 2)))
extern inline void mymemcpy(void *__restrict__ dest, const void *__restrict__ src, int size)
{
    const unsigned char *s = (const unsigned char*)src;
    unsigned char *d = (unsigned char*)dest;
    while(size--) *d++ = *s++;
}

void test(void *dest, const void *src, int size)
{
    mymemcpy(dest, src, size);
}

Here's the resulting assembly:

mymemcpy:
        test    edx, edx
        je      .L1
        mov     edx, edx
        jmp     memcpy
.L1:
        ret
test:
        test    edx, edx
        je      .L4
        mov     edx, edx
        jmp     memmove
.L4:
        ret

Yes, you can see that the same function is converted to either memcpy or memmove. It's not merely similar code; it is one function, transformed differently depending on whether or not it is inlined. Why?

How Optimization Passes Work

You might think of a C compiler as doing something like this:

Preprocess + tokenize source files,
Parse to create AST,
Type check,
Optimize,
Emit code.

In reality, that "optimization" item is many different passes over the code, and each of those passes modifies the code in a different way. These passes happen at different times during compilation, and some optimization passes may run multiple times.

The order in which specific optimization passes occur affects the results. If you perform optimization X and then optimization Y, you can get a different result from doing Y and then X. Maybe one transformation propagates information from one part of the program to another, and then a different transformation acts on that information.
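The ordering effect can be sketched with a toy model. Everything below is hypothetical and is not GCC's real pipeline or data structures: the struct toy_loop, the known_noalias flag, and both passes are invented for illustration, under the assumption that inlining drops the no-alias fact.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified "IR" describing one copy loop. */
enum copy_kind { COPY_LOOP, COPY_MEMCPY, COPY_MEMMOVE };

struct toy_loop {
    enum copy_kind kind;
    bool known_noalias; /* stands in for the restrict information */
};

/* Toy pass X: replace a recognized copy loop with a library call. */
static void pass_recognize_copy(struct toy_loop *l)
{
    if (l->kind != COPY_LOOP)
        return; /* already transformed, nothing left to decide */
    l->kind = l->known_noalias ? COPY_MEMCPY : COPY_MEMMOVE;
}

/* Toy pass Y: inline into a caller. ASSUMPTION of this model only:
   the no-alias fact does not survive inlining. */
static void pass_inline(struct toy_loop *l)
{
    l->known_noalias = false;
}

/* Order X, Y: the loop is recognized while the no-alias fact is still there. */
enum copy_kind run_recognize_then_inline(void)
{
    struct toy_loop l = { COPY_LOOP, true };
    pass_recognize_copy(&l);
    pass_inline(&l);
    return l.kind;
}

/* Order Y, X: the fact is gone by the time the loop is recognized. */
enum copy_kind run_inline_then_recognize(void)
{
    struct toy_loop l = { COPY_LOOP, true };
    pass_inline(&l);
    pass_recognize_copy(&l);
    return l.kind;
}
```

In this model, run_recognize_then_inline() ends with COPY_MEMCPY and run_inline_then_recognize() ends with COPY_MEMMOVE, even though both start from the same loop and run the same two passes.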

Why is this relevant here?

You can see that src and dest are both restrict pointers. Since these pointers are restrict, GCC "should" be able to know that memcpy is acceptable and that memmove is not necessary.
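To see why the no-overlap guarantee matters, here is a minimal sketch of what happens when the buffers do overlap. The helper names forward_copy, demo_forward_loop and demo_memmove are made up for this illustration; the loop has the same shape as mymemcpy but deliberately omits restrict, so calling it with overlapping buffers is well defined.

```c
#include <stddef.h>
#include <string.h>

/* Same loop shape as mymemcpy, but without restrict, so overlap is allowed. */
static void forward_copy(unsigned char *dest, const unsigned char *src, size_t size)
{
    while (size--) *dest++ = *src++;
}

/* Shift "abcdef" right by one byte using the forward loop. */
void demo_forward_loop(char out[7])
{
    char buf[7] = "abcdef";
    forward_copy((unsigned char *)buf + 1, (const unsigned char *)buf, 5);
    memcpy(out, buf, 7); /* includes the terminating '\0' */
}

/* Do the same shift with memmove, which must handle overlap correctly. */
void demo_memmove(char out[7])
{
    char buf[7] = "abcdef";
    memmove(buf + 1, buf, 5);
    memcpy(out, buf, 7);
}
```

The forward loop smears the first byte across the buffer and produces "aaaaaa", while memmove produces "aabcde". A compiler may only replace such a loop with memcpy when it knows the buffers cannot overlap, which is exactly the promise restrict makes.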

However, that means the information that src and dest are restrict pointers must be propagated to the loop that is ultimately transformed into memmove or memcpy, and that information must arrive before the transformation takes place. A compiler could easily transform the loop into memmove first and only later figure out that the arguments are restrict, but by then it's too late!

It looks like, somehow, the information that src and dest are restrict is getting lost when the function is inlined. This gives us a few different theories for why this might happen:

Maybe the propagation of restrict is somehow broken after inlining, due to a bug.
Maybe GCC infers restrict from the calling function after inlining, under the assumption that the calling function has more context than the function being inlined.
Maybe the optimization passes don't happen in the right order here for the restrict information to reach the loop. Maybe that information propagates first, then inlining is performed, and the loop optimization only happens after that.

Optimization passes (code transformation passes) are sensitive to reordering, after all. This is an extremely complicated area of compiler design.

Disabling The Optimization

Use -fno-tree-loop-distribute-patterns, or use a pragma:

#pragma GCC optimize ("no-tree-loop-distribute-patterns")
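If disabling the pass for the whole file is too broad, a per-function variant may work. This is a sketch, not a tested fix for the question's build: it assumes GCC's optimize function attribute accepts the same option string as the pragma above. The macro NO_LOOP_TO_LIBCALL and the function copy_bytes are hypothetical names, and the attribute is compiled out on clang, which does not support it.

```c
#include <stddef.h>

/* Hypothetical helper macro: apply the option to one function on GCC only. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* Byte-by-byte copy that GCC is asked not to rewrite into a library call. */
NO_LOOP_TO_LIBCALL
void copy_bytes(void *__restrict__ dest, const void *__restrict__ src, size_t size)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dest;
    while (size--) *d++ = *s++;
}
```

It is worth checking the result with objdump as in the question, since whether the attribute is still honored after the function is inlined into a caller is not something this sketch verifies.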

Answer 2

Simply use the -fno-builtin command-line option.

https://godbolt.org/z/3Ys1s9jPr
