How are rvalues assigned to lvalues in assembly?

First question here. In a few weeks or months I will need to write procedural code in which functions assign big (I mean really big) sets of data directly through pointers. Here is an example of the kind of code I will be writing:

void MyFuntion(string* str)
{
     *str = "some data in a string";
}

As it surely is important: I am on Windows 10, in Visual Studio 2019, compiling with the default C++ compiler in Release x86.

Imagine something like this, but with strings that can contain several million characters, or with int/float arrays that also have several million elements. So this is a single operation assigning an rvalue through a pointer, whose target is therefore on the heap. Of course, if I created a local variable containing the data, it would be more than 1 MB and would therefore cause a stack overflow, right?

As I understand it, since the data only exists as an rvalue here, it has no existence in memory. But I would like to know: how is the rvalue assigned through the pointer? How is it done in assembly? I must say I have never written any assembly; I have a few (very few) notions, but I'd like to get into it when I have time.

Is it temporarily created on the stack or the heap before being put at the final memory address? My guess is that the memory at that address (the one my pointer refers to) is filled with the data directly, bit by bit, so the rvalue never exists in memory.

If I'm correct, the only things that exist on the stack here are the function call, the copy of the pointer, and then the instruction, which should be something like "assign rvalue X to lvalue Y". The size of the instruction doesn't depend on the size of the rvalue or lvalue, so there should not be any problem regarding the stack here.

So, if I'm correct, this code should not cause any problem no matter how big the rvalue is, but I would still like to know exactly how it is done, assembly-wise. Note that I am not only looking for an answer but also for references, books or docs that could explain this in detail. I guess what I am looking for won't be in a C++ book but rather in an assembly book; this might be a good starting point to get myself into it!

Although a specific OS and compiler were mentioned, the example assembly in this answer will probably differ from what the querent's compiler would output: I didn't have a Windows 10 machine available at the time of writing and, having forgotten about Godbolt, used a different environment. However, this topic is general enough, in my opinion, that it shouldn't really matter in this specific case.


What even is a value on the right side of an assignment operator? What does assignment look like at the assembly level? Here's a simple example.

void assign_thing(int *p) {
    *p = 42;
}

movl $42, (%rdi)
retq

"Move the 32-bit integer 42 into the memory location to which rdi is pointing." Here %rdi represents p, and (%rdi) means *p. For something dead simple like an integer, it's pretty much that simple. How about a simple structure?

struct stuff {
    int id;
    float value;
    char text[8];
};

void assign_thing(stuff *p) {
    *p = {42, 1.5, "Hello!"};
}

movabsq $4593671619917905962, %rax
movq    %rax, (%rdi)
movabsq $36762444129608, %rax
movq    %rax, 8(%rdi)
retq

A little harder to read at first glance, but pretty much the same idea. The compiler was smart: it packed the integer and float values 42 and 1.5 into a single 64-bit value and stuffed that directly into (%rdi). Likewise with the string "Hello!", which is short enough to fit into a single 64-bit value and gets stuffed into 8(%rdi) (8 bytes past p is the offset of text).
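To make the packing concrete, here is a minimal sketch that reads the two 8-byte halves of the same struct back as 64-bit integers. It assumes a little-endian target with IEEE-754 floats and a 16-byte layout for stuff (as on typical x86 compilers); the helper names qword_at, first_qword and second_qword are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct stuff {
    int id;
    float value;
    char text[8];
};

// Assumption: 4-byte int, 4-byte float, no padding (true on common x86 ABIs).
static_assert(sizeof(stuff) == 16, "this sketch assumes a 16-byte layout");

// Returns the 8 bytes starting at `offset` inside `s` as one 64-bit integer.
std::uint64_t qword_at(const stuff& s, std::size_t offset) {
    std::uint64_t q = 0;
    std::memcpy(&q, reinterpret_cast<const unsigned char*>(&s) + offset, sizeof q);
    return q;
}

// id and value together: the constant stored to (%rdi) in the listing.
std::uint64_t first_qword() {
    stuff s = {42, 1.5f, "Hello!"};
    return qword_at(s, 0);
}

// text, zero-padded to 8 bytes: the constant stored to 8(%rdi).
std::uint64_t second_qword() {
    stuff s = {42, 1.5f, "Hello!"};
    return qword_at(s, 8);
}
```

Under those assumptions, first_qword() should give 4593671619917905962 (0x3FC00000 for 1.5f in the upper half, 42 in the lower half) and second_qword() should give 36762444129608, matching the two movabsq immediates.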


So far, none of the rvalues actually exist in memory as separate objects when they get assigned. They're just part of the instructions (immediate operands). What if it's something a lot bigger, like a string?

// Overflow checking omitted for brevity.
void assign_thing(char *p) {
    // Assignment with = doesn't actually do what you'd want here,
    // so this'll have to do.
    strcpy(p, "What if it's something a lot bigger, like a string?");
}

vmovups -5484(%rip), %ymm0
vmovups %ymm0, 20(%rdi) ; 20 is decimal: this 32-byte store overlaps the next one so the two cover all 52 bytes
vmovups -5517(%rip), %ymm0
vmovups %ymm0, (%rdi)
vzeroupper
retq

Now the rvalue does reside in memory when it gets assigned. Do note that this is not because strcpy was used instead of =, but because the compiler decided that it would be better to store that "rvalue" string somewhere in a read-only area like .rodata and just copy it over. If I had used a much shorter string, any reasonably modern compiler would probably optimize it into a few mov or movabsq instructions like in the second example. Unless p points to a buffer on the stack and your strcpy ends up overflowing it, you won't get a stack overflow here.
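The listing above appears to copy the 52-byte string (51 characters plus the terminator) as two 32-byte blocks at offsets 0 and 20. The following is a rough sketch of that idea in C++, not what the compiler literally emits: the named array kMessage stands in for the literal in read-only storage, and copy_in_two_blocks and matches_strcpy are hypothetical helpers introduced only for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Static storage duration: the bytes are part of the program image, and
// compilers typically place constant data like this in a read-only section.
static const char kMessage[] = "What if it's something a lot bigger, like a string?";
static_assert(sizeof kMessage == 52, "51 characters plus the terminating NUL");

// Copies all 52 bytes with two 32-byte block copies. The blocks overlap
// in bytes 20..31, which is harmless because both write the same data.
void copy_in_two_blocks(char* p) {
    std::memcpy(p + 20, kMessage + 20, 32);  // bytes 20..51
    std::memcpy(p, kMessage, 32);            // bytes 0..31
}

// Checks that the two-block copy produces the same result as strcpy.
// The destination buffers live on the heap (inside std::vector).
bool matches_strcpy() {
    std::vector<char> a(sizeof kMessage, '?');
    std::vector<char> b(sizeof kMessage, '?');
    copy_in_two_blocks(a.data());
    std::strcpy(b.data(), kMessage);
    return std::memcmp(a.data(), b.data(), sizeof kMessage) == 0;
}
```

Nothing here needs more than a few bytes of stack: the source sits in static storage and the destination is wherever p points.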


Now what about your example? I'm guessing that your string type is really std::string, and that's not a trivial type. So what happens there? In C++, the assignment operator = is overloadable, and std::string indeed has its own overloads, so instead of directly stuffing or copying values into the object, a member function operator= is called. That is to say, your *str = "some data in a string" is really a str->operator=("some data in a string"). How your rvalue string gets copied is up to the implementation of std::string::operator=, but it'll most likely be optimized into something like my last example. The character data of a large std::string resides on the heap (only very short strings may be stored inside the object itself), so stack overflow still isn't a problem here.
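As a small sketch of the two points above: the operator syntax and the explicit member call do the same thing, and a large string's characters live outside the std::string object. The function names are invented for this illustration, and the 5,000,000-character size is an arbitrary stand-in for "several million characters".

```cpp
#include <cstddef>
#include <functional>
#include <string>

// The form used in the question.
void assign_with_operator_syntax(std::string* str) {
    *str = "some data in a string";
}

// The same assignment written as an explicit member function call.
void assign_with_explicit_call(std::string* str) {
    str->operator=("some data in a string");
}

bool both_forms_agree() {
    std::string a, b;
    assign_with_operator_syntax(&a);
    assign_with_explicit_call(&b);
    return a == b && a == "some data in a string";
}

// True when the character buffer lies outside the bytes of the
// std::string object itself, i.e. in separately allocated storage.
bool buffer_is_outside_object(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof s;
    std::less<const char*> before;  // well-defined ordering for unrelated pointers
    return before(s.data(), begin) || !before(s.data(), end);
}

// A multi-million-character string only costs sizeof(std::string) bytes of
// stack; the characters themselves are allocated dynamically.
bool large_string_lives_outside_object() {
    std::string s;
    s.assign(5000000, 'x');
    return s.size() == 5000000 && buffer_is_outside_object(s);
}
```

With the default allocator, "outside the object" means the heap, which is why the size of the data has no bearing on stack usage.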


tl;dr (this answer + the comments, compressed into a few sentences)

If your string is small enough, it probably won't exist in memory during assignment. If it's big enough, it'll sit in a read-only area somewhere and get copied over when needed. The stack is often not even involved, so don't worry about overflow.
