
Store an int in a char buffer in C and then retrieve the same

I am writing a socket client-server application in which the server needs to send large buffers to a client, and each buffer must be processed separately. I therefore want to put the buffer length in the buffer so that the client can read the length of the data from the buffer and process it accordingly.

To store the length value I need to split an integer into individual bytes and store them in a buffer to be sent over the socket. I am able to break the integer into four parts, but when joining them I am not able to retrieve the correct value. To demonstrate my problem I have written a sample program that splits an int into four char values and then joins them back into another integer. The goal is that after joining I should get the same value.

Here is my small program.

#include <stdio.h>

int main ()
{
    int inVal = 0, outVal =0;
    char buf[5] = {0};

    inVal = 67502978;

    printf ("inVal: %d\n", inVal);

    buf[0] = inVal & 0xff;
    buf[1] = (inVal >> 8) & 0xff;
    buf[2] = (inVal >> 16) & 0xff;
    buf[3] = (inVal >> 24) & 0xff;

    outVal = buf[3];
    outVal = outVal << 8;
    outVal |= buf[2];
    outVal = outVal << 8;
    outVal |= buf[1];
    outVal = outVal << 8;
    outVal |= buf[0];

    printf ("outVal: %d\n",outVal);
    return 0;
}

Output

inVal: 67502978
outVal: -126

What am I doing wrong?

One problem is that you are using bitwise operators on signed numbers. This is always a bad idea and almost always incorrect. Please note that char has implementation-defined signedness, unlike int, which is always signed.

Therefore you should replace int with uint32_t and char with uint8_t. With such unsigned types you eliminate the possibility of using bit shifts on negative numbers, which would be a bug. Similarly, if you shift data into the sign bit of a signed number, you will get bugs.

And needless to say, the code will not work if integers are not 4 bytes wide.
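As a minimal sketch of that advice, the helpers below use the fixed-width types from <stdint.h>, so they no longer depend on int being 4 bytes wide. The names pack_u32_le and unpack_u32_le are made up for illustration; the byte order (least significant byte first) mirrors the code in the question.

```c
#include <stdint.h>

/* Hypothetical helper: split a 32-bit value into 4 bytes,
   least significant byte first. */
void pack_u32_le(uint8_t buf[4], uint32_t value)
{
    for (int i = 0; i < 4; i++) {
        buf[i] = (uint8_t)(value & 0xffu);
        value >>= 8;
    }
}

/* Hypothetical helper: join 4 bytes (least significant first)
   back into a 32-bit value. */
uint32_t unpack_u32_le(const uint8_t buf[4])
{
    uint32_t value = 0;
    for (int i = 3; i >= 0; i--) {
        value <<= 8;      /* unsigned shift: well defined, high bits are discarded */
        value |= buf[i];  /* uint8_t promotes to a non-negative int: no sign extension */
    }
    return value;
}
```

Calling pack_u32_le(buf, 67502978) and then unpack_u32_le(buf) gives back 67502978, and the same holds for values whose bytes are 0x80 or above.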

Your method has potential implementation-defined behavior as well as undefined behavior:

  • Storing values beyond the range of type char into an array of type char has implementation-defined behavior: buf[0] = inVal & 0xff; and the next 3 statements (inVal & 0xff might be larger than CHAR_MAX if char is signed by default).

  • Left-shifting negative values invokes undefined behavior: if any of buf[3], buf[2] or buf[1] becomes negative as the implementation-defined result of storing a value larger than CHAR_MAX into it, the resulting outVal becomes negative, and left-shifting it is undefined.

In your specific example, your architecture uses two's complement representation for negative values and the type char is signed. The value stored into buf[0] is 67502978 & 0xff = 130, which becomes -126. The last statement, outVal |= buf[0];, sets bits 7 through 31 of outVal and the result is -126.

You can avoid these issues by using an array of unsigned char and values of type unsigned int:

#include <stdio.h>

int main(void) {
    unsigned int inVal = 0, outVal = 0;
    unsigned char buf[4] = { 0 };

    inVal = 67502978;

    printf("inVal: %u\n", inVal);

    buf[0] = inVal & 0xff;
    buf[1] = (inVal >> 8) & 0xff;
    buf[2] = (inVal >> 16) & 0xff;
    buf[3] = (inVal >> 24) & 0xff;

    outVal = buf[3];
    outVal <<= 8;
    outVal |= buf[2];
    outVal <<= 8;
    outVal |= buf[1];
    outVal <<= 8;
    outVal |= buf[0];

    printf("outVal: %u\n", outVal);
    return 0;
}

Note that the above code still assumes 32-bit ints.

While bit shifts of signed values can be a problem, that is not the case here (all left-hand values are positive, and all results are within the range of a 32 bit unsigned int).

The problematic expression, with somewhat unintuitive semantics, is the last bitwise OR:

outVal |= buf[0];

buf[0] is (on your architecture and mine) a signed char with the value -126, simply because the most significant bit in the least significant byte of 67502978 is set. In C, all operands in an arithmetic expression are subject to the arithmetic conversions. Specifically, they undergo integer promotion, which states: "If an int can represent all values of the original type [...], the value is converted to an int". Accordingly, the signed character buf[0] is converted to a (signed) int, preserving its value of -126. A negative signed int has the sign bit set. ORing that with another signed int sets the result's sign bit as well, making that value negative. That is exactly what we are seeing.

Making the bytes unsigned chars fixes the issue because the value of the temporary integer to which the unsigned char is converted is then a simple 8 bit value of 130.
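The effect of integer promotion can be isolated in two tiny functions. This is only an illustrative sketch: the function names are hypothetical, and the result quoted for the signed case assumes a 32-bit two's complement int, as on the poster's machine.

```c
/* Hypothetical demo: OR a signed byte into an int accumulator.
   The byte is promoted to int with its value preserved, so a negative
   byte brings all of its sign-extension bits along. */
int or_signed_byte(int acc, signed char byte)
{
    return acc | byte;
}

/* Same operation with an unsigned byte: the promoted value is in 0..255,
   so only the low 8 bits of the accumulator can change. */
unsigned int or_unsigned_byte(unsigned int acc, unsigned char byte)
{
    return acc | byte;
}
```

With acc = 0x04060300 (the value of outVal just before the last OR), or_signed_byte(acc, -126) yields -126 on such a platform, while or_unsigned_byte(acc, 130) yields 67502978.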

Use unsigned char buf[5] = {0}; and unsigned int for inVal and outVal, and it should work.

When using signed integral types, two sorts of problems arise:

First, if buf[3] is negative, then due to outVal = buf[3] the variable outVal becomes negative; subsequent bit shift operations on outVal are then undefined behaviour. From cppreference.com concerning bit shift operators:

For signed and positive a, the value of a << b is a * 2^b if it is representable in the return type, otherwise the behavior is undefined. (until C++14)

For signed and positive a, the value of a << b is a * 2^b if it is representable in the unsigned version of the return type (which is then converted to signed: this makes it legal to create INT_MIN as 1<<31), otherwise the behavior is undefined. (since C++14)

For negative a, the behavior of a << b is undefined.

Note that with OP's inVal = 67502978 this does not occur, since buf[3]=4; but for other inVals it may occur and may then cause problems due to "undefined behaviour".

The second problem is that in the operation outVal |= buf[0] with buf[0]=-126, the value (char)-126, which in binary is 10000010, is converted to (int)-126, which in binary is 11111111111111111111111110000010, before the operator |= is applied, and this then fills outVal with a lot of 1-bits. The reason for the conversion is defined in the conversion rules for arithmetic operations (cppreference.com):

If both operands are signed or both are unsigned, the operand with lesser conversion rank is converted to the operand with the greater integer conversion rank

So the problem in OP's case is actually not caused by any undefined behaviour, but by the character buf[0] having a negative value, which is converted to int before the |= operation.

Note, however, that if either buf[2] or buf[1] had been negative, this would have made outVal negative and would have led to undefined behaviour on subsequent shift operations, too.

C++ standard N3936 quotes about shift operators:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled.

If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type.

Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.

So, to avoid undefined behaviour, it is recommended to use unsigned data types and to make sure the data type is wide enough for the value being assembled (at least 32 bits for the four bytes here).
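One related detail, sketched here as an addition to the quotes above: even when the buffer holds uint8_t, an expression such as buf[3] << 24 first promotes the byte to a signed int, so a byte of 0x80 or more would be shifted into the sign bit. Casting to uint32_t before shifting keeps the whole computation unsigned. The function name join_bytes_le is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: assemble 4 bytes (least significant first) in one
   expression. Each byte is cast to uint32_t BEFORE the shift, so no
   signed int is ever shifted. */
uint32_t join_bytes_le(const uint8_t buf[4])
{
    return  (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

For the bytes 0x82, 0x03, 0x06, 0x04 this returns 67502978, and a top byte of 0x80 produces 0x80000000 without touching any signed type.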

This may be a terrible idea, but I'll post it here for interest: you can use a union:

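#include <stdint.h>
#include <stdio.h>

union my_data
{
    uint32_t one_int;

    // Members are laid out in declaration order, so which member holds the
    // most significant byte depends on the endianness of the host.
    struct
    {
        uint8_t  byte3;
        uint8_t  byte2;
        uint8_t  byte1;
        uint8_t  byte0;
    } bytes;
};

// Your original code modified to use union my_data
int main(void) {
    union my_data data;
    uint32_t inVal = 0, outVal = 0;
    uint8_t buf[4] = {0};

    inVal = 67502978;

    printf("inVal: %u\n", (unsigned)inVal);

    data.one_int = inVal;

    // Populate bytes into buf
    buf[3] = data.bytes.byte3;
    buf[2] = data.bytes.byte2;
    buf[1] = data.bytes.byte1;
    buf[0] = data.bytes.byte0;

    // Join the bytes back together through the same union
    data.bytes.byte3 = buf[3];
    data.bytes.byte2 = buf[2];
    data.bytes.byte1 = buf[1];
    data.bytes.byte0 = buf[0];
    outVal = data.one_int;

    printf("outVal: %u\n", (unsigned)outVal);
    return 0;
}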

I don't know if this would also work, but I can't see why not:

union my_data
{
    uint32_t one_int;
    uint8_t  bytes[4];
};
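A rough sketch of that array variant follows. Reading a union member other than the one last written is permitted in C (the bytes are reinterpreted) but not in C++, and the order of the bytes in the array is whatever the host uses, so the result is not portable between machines of different endianness. The names u32_bytes, split_native and join_native are invented for this example.

```c
#include <stdint.h>

union u32_bytes {
    uint32_t one_int;
    uint8_t  bytes[4];
};

/* Copy the in-memory bytes of value into out[], in host byte order. */
void split_native(uint32_t value, uint8_t out[4])
{
    union u32_bytes u;
    u.one_int = value;
    for (int i = 0; i < 4; i++)
        out[i] = u.bytes[i];
}

/* Rebuild the value from bytes that are in host byte order. */
uint32_t join_native(const uint8_t in[4])
{
    union u32_bytes u;
    for (int i = 0; i < 4; i++)
        u.bytes[i] = in[i];
    return u.one_int;
}
```

For 67502978 (0x04060382), out[0] holds 0x82 on a little-endian host and 0x04 on a big-endian one; join_native reverses split_native on the same machine either way.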

Because of endian differences between architectures, it is best practice to convert numeric values to network order, which is big-endian. On receipt, they can then be converted to the native host order. We can do this in a portable way by using htonl() (host to network "long" = uint32_t), and convert to host order on receipt with ntohl(). Example:

#include <stdio.h>
#include <arpa/inet.h>

int main(int argc, char **argv) {
  uint32_t inval = 67502978, outval, backinval;

  outval = htonl(inval);
  printf("outval: %d\n", outval);
  backinval = ntohl(outval);
  printf("backinval: %d\n", backinval);
  return 0;
}

This gives the following result on my 64 bit x86, which is little endian:

$ gcc -Wall example.c
$ ./a.out
outval: -2113731068
backinval: 67502978
$
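Applied to the original problem, a length prefix could be written and read roughly as follows. This is a sketch that assumes a POSIX system (for <arpa/inet.h>) and a 4-byte prefix at the start of the buffer; put_length and get_length are hypothetical names, not part of any library.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store len in network byte order
   in the first 4 bytes of buf. */
void put_length(unsigned char *buf, uint32_t len)
{
    uint32_t net = htonl(len);
    memcpy(buf, &net, sizeof net);  /* memcpy avoids alignment and aliasing problems */
}

/* Hypothetical helper: read the 4-byte network-order prefix
   back as a host-order value. */
uint32_t get_length(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

The sender would call put_length on the first four bytes before sending; the receiver reads four bytes, calls get_length, and then knows how many payload bytes follow. Because the prefix is always big-endian on the wire, the two hosts do not need to share the same byte order.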
