Store an int in a char buffer in C and then retrieve the same
I am writing a socket client-server application in which the server needs to send a large buffer to a client, and every buffer should be processed separately. I therefore want to put the buffer length in the buffer itself, so that the client can read the length of the data from the buffer and process it accordingly.
To store the length value, I need to split an integer into individual bytes and store them in a buffer to be sent over the socket. I am able to break the integer into four parts, but when joining them back I am not able to retrieve the correct value. To demonstrate my problem I have written a sample program in which I split an int into four char variables and then join them back into another integer. The goal is that after joining I should get the same value.
Here is my small program.
#include <stdio.h>
int main ()
{
int inVal = 0, outVal =0;
char buf[5] = {0};
inVal = 67502978;
printf ("inVal: %d\n", inVal);
buf[0] = inVal & 0xff;
buf[1] = (inVal >> 8) & 0xff;
buf[2] = (inVal >> 16) & 0xff;
buf[3] = (inVal >> 24) & 0xff;
outVal = buf[3];
outVal = outVal << 8;
outVal |= buf[2];
outVal = outVal << 8;
outVal |= buf[1];
outVal = outVal << 8;
outVal |= buf[0];
printf ("outVal: %d\n",outVal);
return 0;
}
inVal: 67502978
outVal: -126
What am I doing wrong?
One problem is that you are using bitwise operators on signed numbers. This is always a bad idea and almost always incorrect. Please note that char has implementation-defined signedness, unlike int, which is always signed.
Therefore you should replace int with uint32_t and char with uint8_t (both declared in <stdint.h>). With such unsigned types you eliminate the possibility of using bit shifts on negative numbers, which would be a bug. Similarly, if you shift data into the sign bit of a signed number, you will get bugs.
And needless to say, the code will not work if integers are not 4 bytes large.
Your method has potential implementation-defined behavior as well as undefined behavior:
Storing values beyond the range of type char into the array of type char has implementation-defined behavior: buf[0] = inVal & 0xff; and the next 3 statements (inVal & 0xff might be larger than CHAR_MAX if the char type is signed by default).
Left shifting negative values invokes undefined behavior: if any of the first 3 bytes read back from the array (buf[3], buf[2] or buf[1]) becomes negative as the implementation-defined result of storing a value larger than CHAR_MAX into it, the resulting outVal becomes negative, and left shifting it is undefined.
In your specific example, your architecture uses 2's complement representation for negative values and the type char is signed. The value stored into buf[0] is 67502978 & 0xff = 130, which becomes -126. The last statement, outVal |= buf[0];, sets bits 7 through 31 of outVal and the result is -126.
You can avoid these issues by using an array of unsigned char and values of type unsigned int:
#include <stdio.h>
int main(void) {
unsigned int inVal = 0, outVal = 0;
unsigned char buf[4] = { 0 };
inVal = 67502978;
printf("inVal: %u\n", inVal);
buf[0] = inVal & 0xff;
buf[1] = (inVal >> 8) & 0xff;
buf[2] = (inVal >> 16) & 0xff;
buf[3] = (inVal >> 24) & 0xff;
outVal = buf[3];
outVal <<= 8;
outVal |= buf[2];
outVal <<= 8;
outVal |= buf[1];
outVal <<= 8;
outVal |= buf[0];
printf("outVal: %u\n", outVal);
return 0;
}
Note that the above code still assumes 32-bit ints.
While bit shifts of signed values can be a problem, this is not the case here (all left-hand values are positive, and all results are within the range of a 32-bit unsigned int).
The problematic expression, with somewhat unintuitive semantics, is the last bitwise OR:
outVal |= buf[0];
buf[0] is (on your architecture and mine) a signed char with the value -126, simply because the most significant bit in the least significant byte of 67502978 is set. In C, all operands in an arithmetic expression are subject to the arithmetic conversions. Specifically, they undergo integer promotion, which states: "If an int can represent all values of the original type [...], the value is converted to an int". Accordingly, the signed character buf[0] is converted to a (signed) int, preserving its value of -126. A negative signed int has the sign bit set. ORing that with another signed int sets the result's sign bit as well, making that value negative. That is exactly what we are seeing.
Making the bytes unsigned chars fixes the issue, because the temporary int to which the unsigned char is converted then holds the plain 8-bit value 130.
Use unsigned char buf[5] = {0}; and unsigned int for inVal and outVal, and it should work.
When using signed integral types, two sorts of problems arise:
First, if buf[3] is negative, then outVal becomes negative because of the assignment outVal = buf[3]; the subsequent bit shift operations on outVal are then undefined behaviour. From cppreference.com, concerning bit shift operators:
For signed and positive a, the value of a << b is a * 2^b if it is representable in the return type, otherwise the behavior is undefined. (until C++14)
For signed and positive a, the value of a << b is a * 2^b if it is representable in the unsigned version of the return type (which is then converted to signed: this makes it legal to create INT_MIN as 1<<31), otherwise the behavior is undefined. (since C++14)
For negative a, the behavior of a << b is undefined.
Note that with OP's inVal = 67502978 this does not occur, since buf[3]=4; but for other values of inVal it may occur and may then cause problems due to "undefined behaviour".
The second problem is that in the operation outVal |= buf[0] with buf[0]=-126, the value (char)-126, which in binary format is 10000010, is converted to (int)-126, which in binary format is 11111111111111111111111110000010, before operator |= is applied, and this then fills up outVal with a lot of 1-bits. The reason for the conversion is given in the conversion rules for arithmetic operations (cppreference.com):
If both operands are signed or both are unsigned, the operand with lesser conversion rank is converted to the operand with the greater integer conversion rank
So the problem in OP's case is actually not because of any undefined behaviour, but because the character buf[0] has a negative value, which is converted to int before the |= operation.
Note, however, that if either buf[2] or buf[1] had been negative, this would have made outVal negative and would have led to undefined behaviour on subsequent shift operations, too.
The C++ standard (N3936) says about shift operators:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled.
If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
So, to avoid undefined behaviour, it is recommended to use unsigned data types and to make sure the data type is wide enough for the value (at least 32 bits for the four bytes handled here).
This may be a terrible idea, but I'll post it here for interest: you can use a union:
// Your original code modified to use union my_data
#include <stdint.h>
#include <stdio.h>
union my_data
{
    uint32_t one_int;
    // The member order matches a big-endian host; on a little-endian
    // host byte3 holds the least significant byte instead.
    struct
    {
        uint8_t byte3;
        uint8_t byte2;
        uint8_t byte1;
        uint8_t byte0;
    } bytes;
};
int main(void) {
    union my_data data;
    uint32_t inVal = 0;
    uint8_t buf[4] = {0};
    inVal = 67502978;
    printf("inVal: %u\n", (unsigned)inVal);
    data.one_int = inVal;
    // Populate bytes into buf
    buf[3] = data.bytes.byte3;
    buf[2] = data.bytes.byte2;
    buf[1] = data.bytes.byte1;
    buf[0] = data.bytes.byte0;
    printf("buf: %u %u %u %u\n", (unsigned)buf[0], (unsigned)buf[1], (unsigned)buf[2], (unsigned)buf[3]);
    return 0;
}
I don't know if this would also work, but I can't see why not:
union my_data
{
uint32_t one_int;
uint8_t bytes[4];
};
Because of endian differences between architectures, it is best practice to convert numeric values to network order, which is big-endian. On receipt, they can then be converted to the native host order. We can do this in a portable way by using htonl() (host to network "long" = uint32_t) to convert before sending, and ntohl() to convert back to host order on receipt. Example:
#include <stdio.h>
#include <arpa/inet.h>
int main(int argc, char **argv) {
uint32_t inval = 67502978, outval, backinval;
outval = htonl(inval);
printf("outval: %d\n", outval);
backinval = ntohl(outval);
printf("backinval: %d\n", backinval);
return 0;
}
This gives the following result on my 64-bit x86 machine, which is little-endian:
$ gcc -Wall example.c
$ ./a.out
outval: -2113731068
backinval: 67502978
$