How to safely extract a signed field from a uint32_t as a signed number (int or uint32_t)

Question

I have a project in which I receive a vector of 32-bit ARM instructions, and part of each instruction (the offset value) needs to be read as a signed (two's complement) number instead of an unsigned one.

I used a uint32_t vector because all the opcodes and registers are read as unsigned and the whole instruction is 32 bits wide.

For example:

I have this 32-bit ARM instruction encoding:

uint32_t addr = 0b00110001010111111111111111110110;

The last 19 bits are the branch offset, which I need to read as a signed integer branch displacement. This part: 1111111111111110110

I have this function whose parameter is the whole 32-bit instruction. I shift left 13 places and then right 13 places again, so that only the offset value remains and the rest of the instruction is discarded.

I have tried casting the result to different signed types, using different kinds of casts and other C++ functions, but it still prints the number as if it were unsigned.

int getCat1BrOff(uint32_t inst)
{
    uint32_t temp = inst << 13;
    uint32_t brOff = temp >> 13;
    return (int)brOff;
}

I get the decimal number 524278 instead of -10.

The last option, which I don't think is the best one but which might work, is to put all the binary digits in a string, invert the bits and add 1, and then convert the new binary number back to decimal, as I would do it on paper. That is not a good solution, though.

Answer 1

It boils down to doing a sign extension where the sign bit is the 19th one (bit 18, counting from 0). There are two ways:

1. Use arithmetic shifts.
2. Detect the sign bit and OR ones into the high bits.

There is no portable way to do 1. in C++, but whether it works can be checked at compile time. Please correct me if the code below is UB, but I believe it is only implementation-defined, and that is what the compile-time check is for. The only questionable parts are the unsigned-to-signed conversion of a value that does not fit and the right shift of a negative value, but both should be implementation-defined.

int getCat1BrOff(uint32_t inst)
{
    if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
    {
        return int32_t(inst << uint32_t{13}) >> int32_t{13};
    }
    else
    {
        int32_t offset = inst & 0x0007FFFF;
        if (offset & 0x00040000)
        {
            offset |= 0xFFF80000;
        }
        return offset;
    }
}

or a more generic solution:

template <uint32_t N>
int32_t signExtend(uint32_t value)
{
    static_assert(N > 0 && N <= 32);
    constexpr uint32_t unusedBits = (uint32_t(32) - N);
    if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
    {
        return int32_t(value << unusedBits) >> int32_t(unusedBits);
    }
    else
    {
        constexpr uint32_t mask = uint32_t(0xFFFFFFFFu) >> unusedBits;
        value &= mask;
        if (value & (uint32_t(1) << (N-1)))
        {
            value |= ~mask;
        }
        return int32_t(value);
    }
}

https://godbolt.org/z/rb-rRB
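For comparison, here is a minimal sketch of a different, well-known technique that avoids both the signed right shift and the out-of-range conversion: XOR with the sign bit, then subtract it. The name signExtendPortable is hypothetical and not part of the answer above, and the sketch assumes N is at most 31 so that every intermediate value fits in int32_t.

```cpp
#include <cstdint>

// Hypothetical helper (not from the answer): sign-extend the low N bits of
// value using only unsigned masking and an in-range signed subtraction.
template <unsigned N>
constexpr int32_t signExtendPortable(uint32_t value)
{
    static_assert(N > 0 && N < 32, "this sketch supports field widths 1..31");
    constexpr uint32_t signBit = uint32_t(1) << (N - 1); // sign bit of the field
    constexpr uint32_t mask = signBit | (signBit - 1);   // low N bits set
    // Flipping the sign bit maps the field to the range [0, 2^N - 1];
    // subtracting 2^(N-1) shifts that to [-2^(N-1), 2^(N-1) - 1].
    // Both operands fit in int32_t, so no conversion is out of range.
    return int32_t((value & mask) ^ signBit) - int32_t(signBit);
}

// The instruction word from the question carries a 19-bit offset of -10.
static_assert(signExtendPortable<19>(0x315FFFF6u) == -10, "negative offset");
static_assert(signExtendPortable<19>(0x0000000Au) == 10, "positive offset");
```

Because nothing here is implementation-defined, no compile-time check of the shift behaviour is needed; the price is one extra XOR and a subtraction.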

Answer 2

In practice, you just need to declare temp as signed:

int getCat1BrOff(uint32_t inst)
{
    int32_t temp = inst << 13;
    return temp >> 13;
}

Unfortunately this is not portable:

For negative a, the value of a >> b is implementation-defined (in most implementations, this performs arithmetic right shift, so that the result remains negative).

But I have yet to meet a compiler that doesn't do the obvious thing here.
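If you rely on this, one option is to make the assumption explicit so that an unusual compiler fails the build instead of silently producing wrong offsets. The following is only a sketch: the static_assert guard and the constexpr qualifier are additions, not part of the code above. Note that since C++20 the right shift of a negative value is specified to be an arithmetic shift and the unsigned-to-signed conversion is modular, so the guard only matters for earlier standards.

```cpp
#include <cstdint>

// Added guard (an assumption made explicit): refuse to compile if >> on a
// negative value is not an arithmetic (sign-propagating) shift.
static_assert((int32_t(-1) >> 1) == int32_t(-1),
              "arithmetic right shift of negative values is required");

constexpr int getCat1BrOff(uint32_t inst)
{
    // Move the 19-bit field to the top of the word so that its sign bit
    // becomes bit 31, then reinterpret the result as signed.
    int32_t temp = static_cast<int32_t>(inst << 13);
    // The arithmetic shift back down replicates the sign bit.
    return temp >> 13;
}
```

With the instruction word from the question, getCat1BrOff(0b00110001010111111111111111110110) should yield -10 on any compiler that passes the guard.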

Answer 3

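Here is how I would do it:

#include <bitset>
#include <cstdint>
#include <iostream>

int main()
{
  uint32_t addr = 0b00110001010111111111111111110110;
  uint32_t mask = 0b00000000000001111111111111111111; // low 19 bits
  uint32_t sign = 0b00000000000001000000000000000000; // bit 18, the field's sign bit

  uint32_t field = addr & mask;
  // ~(field ^ mask) is the same as field | ~mask: it sets the upper 13 bits,
  // so it must only be applied when the field's sign bit is set.
  int result = (field & sign) ? ~(field ^ mask) : field; // <- here it is

  std::cout << std::bitset<32>(addr) << std::endl
            << std::bitset<32>(mask) << std::endl
            << std::bitset<32>(result) << std::endl
            << "result " << result << std::endl;
}

which prints:

00110001010111111111111111110110
00000000000001111111111111111111
11111111111111111111111111110110
result -10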
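A single negative example is not enough to validate a sign extension, because an expression that always sets the high bits also prints -10 here. The sketch below wraps the mask-and-OR idea in a function and checks it against both a negative and a positive offset. The name extractOffset19 and the second test word are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: read the low 19 bits of an instruction word as a
// signed two's complement offset.
constexpr int32_t extractOffset19(uint32_t inst)
{
    constexpr uint32_t mask = 0x0007FFFFu;    // low 19 bits
    constexpr uint32_t signBit = 0x00040000u; // bit 18, sign bit of the field
    const uint32_t field = inst & mask;
    // Set the upper 13 bits only when the field is negative.
    // The unsigned-to-signed conversion is modular since C++20 and
    // implementation-defined (modular on common compilers) before that.
    return static_cast<int32_t>((field & signBit) ? (field | ~mask) : field);
}

// Word from the question: the offset field is -10.
static_assert(extractOffset19(0x315FFFF6u) == -10, "negative offset");
// Same upper bits with a small positive field: must stay positive.
static_assert(extractOffset19(0x3150000Au) == 10, "positive offset");
```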

Answer 1
Score 2, accepted, 2019-11-17 22:26:44

Answer 2
Score 1, 2019-11-17 22:39:42

Answer 3
Score 0, 2019-11-17 19:42:57