
Reading CF, PF, ZF, SF, OF

I am writing a virtual machine for my own assembly language. When I perform operations such as addition, I want to set the carry, parity, zero, sign and overflow flags the same way the x86-64 architecture sets them.

Notes:

  • I am using Microsoft Visual C++ 2015 and Intel C++ Compiler 16.0.
  • I am compiling as a Win64 application.
  • My virtual machine (currently) only does arithmetic on 8-bit integers.
  • I'm not (currently) interested in any other flags (e.g. AF).

My current solution uses the following function:

void update_flags(uint16_t input)
{
    Registers::flags.carry = (input > UINT8_MAX);
    Registers::flags.zero = (input == 0);
    Registers::flags.sign = (input < 0);
    Registers::flags.overflow = (int16_t(input) > INT8_MAX || int16_t(input) < INT8_MIN);

    // I am assuming that overflow is handled by truncation
    uint8_t input8 = uint8_t(input);
    // The parity flag
    int ones = 0;
    for (int i = 0; i < 8; ++i)
        if ((input8 & (1 << i)) != 0) ++ones;   // parenthesized: != binds tighter than &

    Registers::flags.parity = (ones % 2 == 0);
}

For addition, I would use it as follows:

uint8_t a, b;
update_flags(uint16_t(a) + uint16_t(b));
uint8_t c = a + b;

EDIT: To clarify, I want to know whether there is a more efficient or neater way to do this (for example, by accessing RFLAGS directly). Also, my code may not work for other operations (e.g. multiplication).

EDIT 2: I have now updated my code to this:

void update_flags(uint32_t result)
{
    Registers::flags.carry = (result > UINT8_MAX);
    Registers::flags.zero = (result == 0);
    Registers::flags.sign = (int32_t(result) < 0);
    Registers::flags.overflow = (int32_t(result) > INT8_MAX || int32_t(result) < INT8_MIN);
    Registers::flags.parity = (_mm_popcnt_u32(uint8_t(result)) % 2 == 0);
}

One more question: will my code for the carry flag work properly? I also want it to be set correctly for the "borrows" that occur during subtraction.

Note: The assembly language I am virtualising is my own design. It is meant to be simple and is based on Intel's implementation of x86-64 (i.e. Intel64), so I would like these flags to behave in mostly the same way.

TL;DR: use lazy flag evaluation; see below.


input is a weird name. Most ISAs update flags based on the result of an operation, not on its inputs. You're looking at the 16-bit result of an 8-bit operation, which is an interesting approach. In C, you should just use unsigned int, which is guaranteed to be at least 16 bits wide. It will compile to better code on x86, where unsigned is 32-bit. 16-bit operations need an extra operand-size prefix and can lead to partial-register slowdowns.
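Here is a minimal sketch of that idea. It assumes the 8-bit operands are zero-extended into a 32-bit unsigned before the add. The Flags8 struct and flags_from_add_result are hypothetical names and are not part of the question's code. ZF and SF have to come from the truncated 8-bit result, not from the full-width value:

```cpp
// Hypothetical flag holder for this sketch (not the question's Registers::flags).
struct Flags8 {
    bool carry, zero, sign;
};

// r is the full-width result of (uint8 + uint8), computed in unsigned int.
Flags8 flags_from_add_result(unsigned r)
{
    Flags8 f;
    f.carry = r > 0xFFu;           // a carry out of bit 7 lands in bit 8
    f.zero  = (r & 0xFFu) == 0;    // ZF looks only at the 8-bit result
    f.sign  = (r & 0x80u) != 0;    // SF is bit 7 of the 8-bit result
    return f;
}
```

For example, 0x80 + 0x80 gives r = 0x100, which sets both CF and ZF, as an 8-bit x86 add would.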

That might help with the 8b x 8b -> 16b mul problem you noted, depending on how you want to define flag updating for the mul instruction in the architecture you're emulating.

I don't think your overflow detection is correct. See this tutorial linked from the tag wiki for how it's done.
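Here is one common way to compute OF for 8-bit add and subtract, sketched with hypothetical function names. It is the standard XOR formulation and not necessarily the one in the linked tutorial. Addition overflows when both operands have the same sign and the result's sign differs. Subtraction overflows when the operands have different signs and the result's sign differs from the minuend's:

```cpp
#include <cstdint>

// OF for r = a + b (8-bit): the operands agree in sign, but the result does not.
bool overflow_add8(uint8_t a, uint8_t b)
{
    uint8_t r = static_cast<uint8_t>(a + b);   // wraps modulo 256
    return ((a ^ r) & (b ^ r) & 0x80) != 0;
}

// OF for r = a - b (8-bit): the operands differ in sign, and r's sign differs from a's.
bool overflow_sub8(uint8_t a, uint8_t b)
{
    uint8_t r = static_cast<uint8_t>(a - b);   // wraps modulo 256
    return ((a ^ b) & (a ^ r) & 0x80) != 0;
}
```

For example, overflow_add8(0x7F, 0x01) is true (127 + 1), while overflow_add8(0xFF, 0x01) is false (-1 + 1).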


This will probably not compile to very fast code, especially the parity flag. Do you need the ISA you're emulating/designing to have a parity flag? You never said you're emulating an x86, so I assume it's some toy architecture you're designing yourself.

An efficient emulator (especially one that needs to support a parity flag) would probably benefit a lot from some kind of lazy flag evaluation. Save a value that you can compute the flags from if needed, but don't actually compute anything until you reach an instruction that reads flags. Most instructions only write flags without reading them, so they just save the uint16_t result into your architectural state. Flag-reading instructions can either compute only the flag they need from that saved uint16_t, or compute all of them and store them somehow.
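Here is a minimal sketch of lazy flag evaluation for an 8-bit ALU. It assumes add and subtract are the only flag-setting operations. LazyFlags, record and emulate_add are hypothetical names.

Unlike the single saved uint16_t described above, this variant also keeps the operands. OF cannot in general be recovered from the result alone: 0x40 + 0x40 and 0x00 + 0x80 both give 0x80, but only the first overflows.

```cpp
#include <cstdint>

// Hypothetical lazy-flags state: ALU instructions only record what they did.
// Flags are derived later, and only when a flag-reading instruction asks for them.
struct LazyFlags {
    enum class Op : uint8_t { Add, Sub };
    Op op = Op::Add;
    uint8_t a = 0, b = 0, result = 0;   // operands and 8-bit result of the last flag-setting op

    void record(Op o, uint8_t x, uint8_t y, uint8_t r) { op = o; a = x; b = y; result = r; }

    bool zero() const { return result == 0; }
    bool sign() const { return (result & 0x80) != 0; }
    bool carry() const
    {
        // Add: the wrapped sum is smaller than an operand. Sub: a borrow happened.
        return op == Op::Add ? result < a : a < b;
    }
    bool overflow() const
    {
        return op == Op::Add ? ((a ^ result) & (b ^ result) & 0x80) != 0
                             : ((a ^ b) & (a ^ result) & 0x80) != 0;
    }
    bool parity() const   // x86 PF: set when the low byte has an even number of 1 bits
    {
        unsigned x = result;
        x ^= x >> 4;      // bit 0 of x ends up as the XOR of all 8 bits
        x ^= x >> 2;
        x ^= x >> 1;
        return (x & 1) == 0;
    }
};

// Example ALU op: performs the add and records state, but computes no flags.
uint8_t emulate_add(LazyFlags& fl, uint8_t a, uint8_t b)
{
    uint8_t r = static_cast<uint8_t>(a + b);
    fl.record(LazyFlags::Op::Add, a, b, r);
    return r;
}
```

The flag accessors run only when something actually reads the flags, so a long run of arithmetic costs one record() per instruction.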


Assuming you can't get the compiler to actually read PF from the result, you might try (_mm_popcnt_u32((uint8_t)x) & 1) ^ 1. The ^ 1 is needed because x86 sets PF when the low byte has an even number of set bits, while popcount & 1 is 1 for an odd count. Or, horizontally XOR all the bits together:

x  = (x & 0b00001111) ^ (x >> 4);        // x starts as the 8-bit result
x  = (x & 0b00000011) ^ (x >> 2);
PF = ((x & 0b00000001) ^ (x >> 1)) ^ 1;  // ^ 1: PF means even parity; tweaking this to produce better asm is probably possible
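Here is a self-contained version of the XOR fold, with an exhaustive check against std::bitset<8>::count(). The bitset count is used here as a portable stand-in for _mm_popcnt_u32. The function names are hypothetical:

```cpp
#include <bitset>
#include <cstdint>

// x86-style PF for an 8-bit result: true when the number of set bits is even.
bool parity_flag8(uint8_t v)
{
    unsigned x = v;
    x = (x & 0x0Fu) ^ (x >> 4);   // fold the high nibble onto the low nibble
    x = (x & 0x03u) ^ (x >> 2);   // fold bits 2-3 onto bits 0-1
    x = (x & 0x01u) ^ (x >> 1);   // x is now 1 if the popcount is odd
    return x == 0;
}

// Checks all 256 byte values against a popcount-based PF.
bool parity_flag8_matches_popcount()
{
    for (unsigned v = 0; v < 256; ++v) {
        bool expected = std::bitset<8>(v).count() % 2 == 0;
        if (parity_flag8(static_cast<uint8_t>(v)) != expected)
            return false;
    }
    return true;
}
```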

I doubt any of the major compilers can peephole-optimize a bunch of checks on a result into LAHF + SETO al, or into a PUSHF. Compilers can be led into using a flag condition to detect integer overflow, for example to implement saturating addition. But getting one to figure out that you want all the flags, and to actually use LAHF instead of a series of setcc instructions, is probably not possible. The compiler would need a pattern recognizer for when it can use LAHF. Probably nobody has implemented that, because the use cases are so vanishingly rare.
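For comparison, here is the saturating-add case written in portable C++. The name saturating_add8 is hypothetical. A compiler may implement the sum < a test using the carry flag from the add itself. Whether it does depends on the compiler and the operand width, and nothing in the language guarantees it:

```cpp
#include <cstdint>

// Unsigned 8-bit saturating add: clamp to 0xFF instead of wrapping.
uint8_t saturating_add8(uint8_t a, uint8_t b)
{
    uint8_t sum = static_cast<uint8_t>(a + b);   // wraps modulo 256
    // The wrapped sum is smaller than an operand exactly when the add carried out of bit 7.
    return sum < a ? uint8_t(0xFF) : sum;
}
```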

There's no C/C++ way to directly access the flag results of an operation, which makes C a poor choice for implementing something like this. I don't know whether any language other than asm exposes flag results.

I expect you could gain a lot of performance by writing parts of the emulation in asm, but that would be platform-specific. More importantly, it's a lot more work.

I appear to have solved the problem by splitting the argument to update_flags into an unsigned result and a signed result, as follows:

void update_flags(int16_t unsigned_result, int16_t signed_result)
{
    Registers::flags.zero = uint8_t(unsigned_result) == 0;           // ZF: the 8-bit result is zero
    Registers::flags.sign = (uint8_t(signed_result) & 0x80) != 0;    // SF: bit 7 of the 8-bit result
    Registers::flags.carry = unsigned_result < 0 || unsigned_result > UINT8_MAX;   // < 0 catches subtraction borrows
    Registers::flags.overflow = signed_result < INT8_MIN || signed_result > INT8_MAX;
}

For addition (which should produce the correct result for both signed and unsigned inputs), I would do the following:

int8_t a, b;
int16_t signed_result = int16_t(a) + int16_t(b);
int16_t unsigned_result = int16_t(uint8_t(a)) + int16_t(uint8_t(b));
update_flags(unsigned_result, signed_result);
int8_t c = a + b;
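Subtraction can follow the same pattern. A borrow shows up as a negative unsigned_result, which the < 0 part of the carry test catches. Below is a self-contained sketch. Flags and flags_from_results are hypothetical stand-ins for Registers::flags and update_flags, and ZF and SF are taken from the truncated 8-bit result:

```cpp
#include <cstdint>

// Hypothetical stand-in for Registers::flags, for this sketch only.
struct Flags { bool zero, sign, carry, overflow; };

Flags flags_from_results(int16_t unsigned_result, int16_t signed_result)
{
    Flags f;
    f.zero     = uint8_t(unsigned_result) == 0;
    f.sign     = (uint8_t(signed_result) & 0x80) != 0;
    f.carry    = unsigned_result < 0 || unsigned_result > UINT8_MAX;   // borrow -> negative
    f.overflow = signed_result < INT8_MIN || signed_result > INT8_MAX;
    return f;
}

// 8-bit subtraction in the same style as the addition example above.
int8_t emulate_sub(int8_t a, int8_t b, Flags& flags)
{
    int16_t signed_result   = int16_t(a) - int16_t(b);                    // sign-extended operands
    int16_t unsigned_result = int16_t(uint8_t(a)) - int16_t(uint8_t(b));  // zero-extended operands
    flags = flags_from_results(unsigned_result, signed_result);
    // Keep the low 8 bits (wraps on mainstream compilers; guaranteed since C++20).
    return int8_t(uint8_t(signed_result));
}
```

For example, 1 - 2 sets CF and SF and yields -1, and -128 - 1 sets OF and yields 127.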

For signed multiplication, I would do the following:

int8_t a, b;
int16_t result = int16_t(a) * int16_t(b);
update_flags(result, result);
int8_t c = a * b;

And so on for the other operations that update the flags.

Note: I am assuming here that int16_t(a) sign-extends, and that int16_t(uint8_t(a)) zero-extends.
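That assumption holds in standard C++. Converting an int8_t to int16_t preserves its value, so the bit pattern is sign-extended. Converting to uint8_t first maps the byte into 0..255, so the widening becomes a zero extension. Here is a small compile-time check, with hypothetical helper names:

```cpp
#include <cstdint>

// Value-preserving conversion: negative bytes stay negative (sign extension).
constexpr int16_t sign_extend8(int8_t a) { return int16_t(a); }

// Through uint8_t first: the byte becomes 0..255 (zero extension).
constexpr int16_t zero_extend8(int8_t a) { return int16_t(uint8_t(a)); }

static_assert(sign_extend8(-1) == -1, "int16_t(a) sign-extends");
static_assert(zero_extend8(-1) == 255, "int16_t(uint8_t(a)) zero-extends");
static_assert(zero_extend8(-128) == 128, "0x80 zero-extends to 128");
```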

I have also decided against having a parity flag. My _mm_popcnt_u32 solution should work if I change my mind later.

PS: Thank you to everyone who responded; it was very helpful. Also, if anyone can spot any mistakes in my code, that would be appreciated.

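Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.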

 