Can't understand assembly mov instruction between register and a variable

I am using the NASM assembler on 64-bit Linux. There is something about variables and registers that I can't understand. I create a variable named "msg":

 msg db "hello, world"  

Now, when I want to write to stdout, I move msg into the rsi register. However, I don't understand the mov instruction at the bit level: the rsi register is 64 bits wide, while the msg variable has 12 symbols of 8 bits each, so msg is 12 * 8 = 96 bits, which is obviously more than 64 bits.

So how is it even possible to write an instruction like
mov rsi, msg without overflowing the memory allocated for rsi?

Or does the rsi register contain the memory location of the first symbol of the string, and after writing 1 symbol it changes to the memory location of the next symbol?

Sorry if I wrote complete nonsense; I'm new to assembly and I just can't get the hang of it yet.

In NASM syntax (unlike MASM syntax), mov rsi, symbol puts the address of the symbol into RSI.

mov rsi, [symbol] would load 8 bytes starting at symbol. It's up to you to choose a useful place to load 8 bytes from when you write an instruction like that.

mov   rsi,  msg           ; rsi  = address of msg
movzx eax, byte [rsi+1]   ; rax  = 'e' (upper 7 bytes zeroed)
mov   edx, [msg+6]        ; rdx  = ' wor' (upper 4 bytes zeroed)
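The same distinction between "the address" and "the bytes at the address" can be sketched in C. This is only an analogy, not what NASM emits: the function names are made up for illustration, and the C string literal carries a terminating 0 byte that the db line above does not have.

```c
#include <stdint.h>
#include <string.h>

/* 12 characters plus a terminating 0 byte (the NASM db line has no terminator). */
static const char msg[] = "hello, world";

/* Analogous to `mov rsi, msg`: the result is only the address of the
   first byte. It is pointer-sized no matter how long the string is. */
const char *address_of_msg(void) {
    return msg;
}

/* Analogous to `movzx eax, byte [rsi+1]`: load one byte, zero-extend it. */
uint32_t load_byte(const char *p, size_t offset) {
    return (uint8_t)p[offset];
}

/* Analogous to `mov edx, [msg+6]`: load 4 consecutive bytes.
   memcpy is used so the unaligned read is well-defined C. */
uint32_t load_dword(const char *p, size_t offset) {
    uint32_t value;
    memcpy(&value, p + offset, sizeof value);
    return value;
}
```

With these helpers, load_byte(address_of_msg(), 1) is 'e', and the four bytes returned by load_dword(address_of_msg(), 6) are ' ', 'w', 'o', 'r' in memory order.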

Note that you can use mov esi, msg in a position-dependent (non-PIE) executable, because symbol addresses always fit in 32 bits there (in the default "small" code model, where all static code/data goes in the low 2GB of virtual address space). NASM makes this optimization for you with assemble-time constants (like mov rax, 1), but it probably can't with link-time constants. See: Why do most x64 instructions zero the upper part of a 32 bit register

and after writing 1 symbol it changes to the memory location of the next symbol?

No, if you want that you have to inc rsi yourself. There is no magic. Pointers are just integers that you manipulate like any other integers, and strings are just bytes in memory.

Accessing registers doesn't magically modify them.

There are instructions like lodsb and pop that load from memory and increment a pointer (rsi or rsp respectively), but x86 doesn't have any pre/post-increment/decrement addressing modes, so you can't get that behaviour with mov even if you want it. Use add / sub or inc / dec.
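As a rough C analogy for "load, then step the pointer yourself", here is a minimal sketch. The function name sum_bytes is hypothetical, and the comments map each C statement to the kind of instruction it resembles rather than to actual compiler output.

```c
#include <stddef.h>

/* Add up len bytes starting at p, advancing the pointer by hand. */
unsigned sum_bytes(const unsigned char *p, size_t len) {
    unsigned total = 0;
    while (len != 0) {
        total += *p;  /* load only: p is unchanged, like movzx eax, byte [rsi] */
        p += 1;       /* explicit step, like inc rsi */
        len -= 1;     /* explicit counter update, like dec rcx */
    }
    return total;
}
```

Dropping the p += 1 line would make the loop read the same byte len times, which is the C version of "accessing a register does not modify it".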

Disclaimer: I'm not familiar with the flavor of assembly that you're dealing with, so the following is more general. The particular flavor may have more features than what I'm used to. In general, assembly deals with single byte/word entities, where the size depends on the processor. I've done quite a bit of work on 8- and 16-bit processors, so that is where my answer is coming from.

General statements about assembly: assembly is just like a high-level language, except you have to handle a lot more of the details. So if you're used to some operation in, say, C, you can start there and then break the operation down even further.

For instance, if you have declared two variables that you want to add, that's pretty easy in C:

x = a + b;

In assembly, you have to break that down further:

mov R1, a  * get value from a into register R1
mov R2, b  * get value from b into register R2
add R1,R2  * perform the addition (the result typically goes into a particular location; I'll call it the accumulator)
mov x, acc * store the result of the addition from the accumulator into x

Depending on the flavor of assembly and the processor, you may be able to refer to variables directly in the addition instruction, but like I said, I would have to look at the specific flavor you're working with.
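The load / add / store breakdown can also be written out in C with one temporary per register. This is just an illustrative sketch: the function name and the r1, r2 variables are invented here, and a real compiler is free to generate something quite different.

```c
/* x = a + b, written as the separate steps a simple load/store machine takes. */
int add_like_assembly(int a, int b) {
    int r1 = a;    /* load a into a "register" */
    int r2 = b;    /* load b into another "register" */
    r1 = r1 + r2;  /* the add works on registers only */
    int x = r1;    /* store the result back to the variable x */
    return x;
}
```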

Comments on your specific question: if you have a string of characters, then you would have to move each one individually using a loop of some sort. I would set up a register to contain the starting address of your string, and then increment that register after each character is moved. It acts like a pointer in C. You will need either some sort of marker for the end of the string or a separate value that holds the size of the string, so you know when to stop.
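The two ways of knowing when to stop might look like this in C. It is a minimal sketch with hypothetical function names: one helper walks to a 0 terminator, the other copies an explicitly given number of bytes.

```c
#include <stddef.h>

/* Terminator-based: walk the pointer forward until the 0 byte is found. */
size_t length_until_zero(const char *s) {
    const char *p = s;
    while (*p != '\0') {
        p++;                 /* step to the next character */
    }
    return (size_t)(p - s);  /* distance walked = number of characters */
}

/* Length-based: copy exactly len bytes, one per loop iteration.
   The caller must make sure dst has room for len bytes. */
void copy_bytes(char *dst, const char *src, size_t len) {
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
    }
}
```

The length-based form matches data that has no terminator, which is the case for a string declared as msg db "hello, world" with no trailing 0.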
