Most Efficient way to set Register to 1 or (-1)

I am taking an assembly course now, and the guy who checks our home assignments is a very pedantic old-school optimization freak. For example, he deducts 10% if he sees:

mov ax, 0

instead of:

xor ax,ax

even if it's only used once.

I am not a complete beginner in assembly programming, but I'm not an optimization expert, so I need your help with something (might be a very stupid question but I'll ask anyway): if I need to set a register value to 1 or (-1), is it better to use:

mov ax, 1

or do something like:

xor ax,ax
inc ax

I really need a good grade, so I'm trying to get it as optimized as possible. (I need to optimize both time and code size.)

A quick google for "8086 instructions timings size" turned up http://8086.tk/, which seems to have all the timings and sizes for the 8086 (and more) instruction sets.

No doubt you could find official Intel documentation on the web with similar information.

For your specific question:

xor ax,ax
inc ax

takes 3+3=6 clock cycles and 2+1=3 bytes, while

mov ax,1

takes 4 clock cycles and 3 bytes.

So the latter is better in that case: same size, fewer cycles.
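The comparison above can be tallied mechanically. This is a minimal Python sketch, not 8086 code. The `COSTS` table and the `total` helper are hypothetical names used only for illustration, and the cycle and byte figures are simply the ones quoted in this answer, not independent measurements.

```python
# (clock cycles, bytes) per instruction, copied from the figures quoted above.
# Assumption: these are the answer's numbers for the 8086, not measured values.
COSTS = {
    "xor ax,ax": (3, 2),
    "inc ax": (3, 1),
    "mov ax,1": (4, 3),
}

def total(sequence):
    """Sum the quoted cycles and bytes over a list of instruction strings."""
    cycles = sum(COSTS[ins][0] for ins in sequence)
    size = sum(COSTS[ins][1] for ins in sequence)
    return cycles, size

xor_inc = total(["xor ax,ax", "inc ax"])
mov_imm = total(["mov ax,1"])

print("xor ax,ax / inc ax:", xor_inc)
print("mov ax,1          :", mov_imm)
```

With these inputs both sequences come out at the same size, so the cycle count is the only thing separating them.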


But you need to talk to your educational institute about this guy. 10% for a simple thing like that beggars belief.

You should ask what should be done in the case where you have two possibilities, one faster and one shorter.

Then, once they've admitted that there are different ways to code depending on what you're trying to achieve, tell them that what you're trying to achieve is readability and maintainability, and that you seriously couldn't give a flying leap about a wasted cycle or byte here or there (*a).

Optimisation is something you generally do if and when you have a performance problem, after a piece of code is in a near-complete state. It's almost always wasted effort when the code is still subject to a not-insignificant likelihood of change.

For what it's worth, sub ax,ax appears to be on par with xor ax,ax in terms of clock cycles and bytes, so maybe you could throw that into the mix next time to cause him some more work.

*a) No, don't really, but it's fun to vent occasionally :-)

You're better off with

mov AX,1

on the 8086. If you're tracking register contents, you can possibly do better if you know that, for example, BX already has a 1 in it:

mov AX,BX

or if you know that AH is 0:

mov AL,1

etc.
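The AH trick only works because AL is the low byte of AX and writing it leaves AH untouched. Here is a minimal Python sketch of that behaviour; `mov_al` is a hypothetical helper that models the instruction's effect on a 16-bit value, not real assembler output.

```python
def mov_al(ax, imm8):
    """Model `mov al, imm8`: replace the low byte of AX and keep AH as it was."""
    return (ax & 0xFF00) | (imm8 & 0xFF)

# AH known to be 0: the 2-byte instruction leaves AX == 1.
ax_when_ah_zero = mov_al(0x0000, 1)

# AH holding leftover data: AX does NOT end up as 1.
ax_when_ah_dirty = mov_al(0x1200, 1)

print(hex(ax_when_ah_zero), hex(ax_when_ah_dirty))
```

The second case is why the shortcut depends on tracking what the register held beforehand.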

Depending upon your circumstances, you may be able to get away with ...

 sbb ax, ax

The result will be 0 if the carry flag is not set, or -1 if the carry flag is set.
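To see why this holds, note that sbb computes destination minus source minus carry, so with both operands equal only the carry term survives. This is a minimal Python sketch of the 16-bit arithmetic; `sbb16` and `as_signed16` are hypothetical helper names, and the model ignores the flags that the real instruction also updates.

```python
def sbb16(dst, src, carry):
    """Model the 16-bit result of `sbb dst, src`: dst - src - CF, wrapped to 16 bits."""
    return (dst - src - carry) & 0xFFFF

def as_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement signed integer."""
    return value - 0x10000 if value & 0x8000 else value

# Whatever AX held before, `sbb ax, ax` depends only on the carry flag.
for old_ax in (0x0000, 0x1234, 0xFFFF):
    no_carry = sbb16(old_ax, old_ax, 0)
    with_carry = sbb16(old_ax, old_ax, 1)
    print(hex(old_ax), as_signed16(no_carry), as_signed16(with_carry))
```

So this only yields -1 when the preceding code is known to leave the carry flag set.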

However, if the above example is not applicable to your situation, I would recommend the

xor  ax, ax
inc  ax

method. It should satisfy your professor for size. However, if your processor employs any pipelining, I would expect there to be some coupling-like delay between the two instructions, because the inc depends on the result of the xor (I could very well be wrong on that). If such a coupling exists, the speed could be improved slightly by reordering your instructions to put another instruction between them (one that does not use ax).

Hope this helps.

I would use mov [e]ax, 1 under any circumstances. Its encoding is no longer than the hackier xor sequence, and I'm pretty sure it's faster just about anywhere. The 8086 is just weird enough to be the exception, and as that thing is so slow, a micro-optimization like this would make the most difference there. But anywhere else: executing 2 "easy" instructions will always be slower than executing 1, especially if you consider data hazards and long pipelines. You're trying to read a register in the very next instruction after you modify it, so unless your CPU can bypass the result from stage N of the pipeline (where the xor is executing) to stage N-1 (where the inc is trying to load the register, never mind adding 1 to its value), you're going to have stalls.

Other things to consider: instruction fetch bandwidth (moot for 16-bit code, both are 3 bytes); mov avoids changing flags (more likely to be useful than forcing them all to zero); depending on what values other registers might hold, you could perhaps do lea ax,[bx+1] (also 3 bytes, even in 32-bit code, no effect on flags); as others have said, sbb ax,ax could work too in some circumstances, and it's also shorter at 2 bytes.

When faced with these sorts of micro-optimizations, you really should measure the alternatives instead of blindly relying even on processor manuals.

PS New homework: is xor bx,bx any faster than xor bx,cx (on any processor)?
