
Reading two ints from an array as a long

I am working on a microcontroller project in which I have an array of unsigned ints that comes in from a communications interface. These are accessed through define macros for convenience.

I need to be sent some unsigned long values. Instead of having to process two values from the comms register and shift them into a secondary long register, is it safe for me to use pointers and read two values out of the array at once?

I am interested in doing this as processing resources on the controller are quite limited. Is this safe? Will array values always be contiguous in memory?

Example code

...

unsigned int comms[MAX_ADDRESS];

...

#define FOO             comms[0]
#define BAR             comms[1]
#define VAL_1           comms[2]
#define VAL_1_EXT       (*(unsigned long*)(&comms[2])) // Use pointer trickery to read a long
#define VAL_2           comms[4]
#define VAL_2_EXT       (*(unsigned long*)(&comms[4]))

...

Not sure if it is relevant but it is a chip from the MSP430 family from TI, compiler version TI 4.3.3

It depends what you mean by "safe." It's absolutely unsafe in the sense that the C Standard says nothing about what will happen because you are aliasing types with pointer casts. This is non-portable.

But non-portable doesn't mean non-functional. If the code is not for production use and you have good control over the development environment, you're likely to do fine with your proposal. The C Standard does guarantee that array elements are contiguous. If the compiler generates code that fetches the two (I'm guessing) 16-bit quantities from the comms registers to correctly form a 32-bit long in one instance, then it is virtually certain that:

  • It will do so in all usages.

  • Future compiler versions will do the same.

There are no guarantees, but in practice it's a reasonable bet.

To learn whether the code you're getting is correct, compile with -S and inspect. Write a good test to verify.

At any rate you have taken a good approach by isolating the access code in macros (though you should drop the semi-colons at the ends).

The following macro is well-defined with respect to the C Standard.

#define VAL_1_EXT       (((unsigned long)comms[3] << 16) | (unsigned long)comms[2])

If you wrote

unsigned long x = VAL_1_EXT;

a good optimizing compiler should generate much the same code with the macro above as with your proposed one. I guess you're saying it's not a good optimizing compiler.

As pointed out in comments, this macro is not an l-value. You can't assign to it. For that you'll need a separate macro.

#define SET_VAL_1_EXT(Val) do { \
  unsigned long x = (unsigned long)(Val); \
  comms[2] = (unsigned)x;         /* low word */ \
  comms[3] = (unsigned)(x >> 16); /* high word */ \
} while (0)
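The same getter and setter can also be written as small functions, which gives the compiler type checking on the argument. This is a minimal sketch, not the original author's code: it assumes a 16-bit element type (modelled here with uint16_t so it also runs on a host machine), and the names get_val_1_ext and set_val_1_ext, as well as the array size, are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the question's comms[] array; uint16_t models
   the 16-bit unsigned int assumed for the MSP430. The size is arbitrary. */
static uint16_t comms[8];

/* Compose a 32-bit value from comms[2] (low word) and comms[3] (high word). */
static uint32_t get_val_1_ext(void)
{
    return ((uint32_t)comms[3] << 16) | (uint32_t)comms[2];
}

/* Split a 32-bit value back into the same two array elements. */
static void set_val_1_ext(uint32_t value)
{
    comms[2] = (uint16_t)(value & 0xFFFFu); /* low word */
    comms[3] = (uint16_t)(value >> 16);     /* high word */
}
```

Because only shifts and masks are used, the result does not depend on the byte order of the machine, only on which array element is treated as the low word.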

According to the standard, you have an aliasing bug: anything may happen.

The compiler is allowed to assume there is no aliasing between 16-bit int and 32-bit long types, and you might get surprising behavior (without warning) because you break that contract.

Just say no: use bit-shifting to compose your long from the two ints, and depend on the compiler to optimize that for you (it should not really need to shift under the hood). You might want to look at the assembly to check whether it does.

6.5 Expressions § 7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

As int and long are not compatible, and there is no exception, aliasing them is forbidden.
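One alternative that stays within the rule quoted above is memcpy, which copies the bytes as character type. This is a technique swapped in for illustration rather than something proposed in the question. It is a sketch under assumptions: uint16_t stands in for the 16-bit int, load_u32 is a hypothetical name, and the value obtained still depends on the machine's byte order, exactly as the pointer cast would.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of two adjacent 16-bit words into a 32-bit object.
   No long lvalue ever touches the int objects, so there is no aliasing
   violation. The numeric result depends on the host's byte order. */
static uint32_t load_u32(const uint16_t *words)
{
    uint32_t value;
    memcpy(&value, words, sizeof value);
    return value;
}
```

Whether this is as cheap as the cast depends on the compiler; inspecting the generated assembly is the only way to know for a given toolchain.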

The more modern (and the better at optimizing) your compiler is, the more likely it is playing loose will bite you.

BTW: Most compilers implement many dialects, and GCC allows disabling of strict aliasing with -fno-strict-aliasing. Be sure not just to disable the warning but the actual optimizations.

If you wish to do this, are confident that sizeof(int)*2==sizeof(long) on your platform, and are content with this non-portability (because this assumption is non-portable) you can (and should) use a union to move back and forth between the two types in a defined manner.

union {
    int in [2];
    long out;
};

You may either store elements of this union type in your array, writing ints to in and reading longs from out, or you can place ints from an int array into the union and then read them out two at a time as a long.

Note that if you want more portability, you can use the integer types from <stdint.h>:

union {
    int32_t in [2];
    int64_t out;
};

Then the only platform-dependent behaviours will be:

  • How signed integers are represented
  • Endianness
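As a sketch of the second approach (copying values into the union and reading them back), sized to match the question's 16-bit words rather than the 32/64-bit pair shown above. The type name word_pair and the function join_words are hypothetical.

```c
#include <stdint.h>

/* Two 16-bit words overlaid on one 32-bit value. */
union word_pair {
    uint16_t in[2];
    uint32_t out;
};

/* Store two words, then read them back as a single 32-bit value.
   Which word ends up in the high half depends on the host's endianness. */
static uint32_t join_words(uint16_t first, uint16_t second)
{
    union word_pair u;
    u.in[0] = first;
    u.in[1] = second;
    return u.out;
}
```

On a little-endian machine join_words(0x5678, 0x1234) gives 0x12345678; on a big-endian machine the halves come out swapped.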

Yes, this is safe, with the following assumptions:

  • The sender of this data is sending data as you're expecting. For example, comms[2] and comms[3] together do actually make up an unsigned long value, as you expect.

  • The sender's bit order (known as endianness) and byte order are what you're expecting.

Per the subsequent comment on the question, the answer is no. My original answer explains why.


It depends on whether you want completely safe and portable code or are OK with code for a specific architecture, as well as on the endianness and order of the ints.

If you are OK with specific code, then...

Arrays in C are always consecutive memory locations and always packed, and a lot of code depends on this.

On a big endian system, if you have ints in the order

high-int,low-int

each int is

high-byte,low-byte

and the bytes in memory are

high-int-high,high-int-low,low-int-high,low-int-low

which you can then dereference using a (long int*) cast. But not on a little endian system.

On a little endian system, if you have ints in the order

low-int,high-int

each int is

low-byte,high-byte

the bytes in memory are

low-int-low,low-int-high,high-int-low,high-int-high

which you can then dereference using a (long int*) cast. But not on a big endian system.
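The two layouts above can be checked on a host machine. The following is a minimal sketch, assuming uint16_t stands in for the 16-bit int; host_is_little_endian and store_halves are hypothetical helper names. It places the halves in the word order the current machine needs, so that the four bytes in memory form the intended 32-bit value.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the least significant byte is stored first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Store the halves in the order this host needs:
   low,high on little endian; high,low on big endian. */
static void store_halves(uint16_t out[2], uint16_t low, uint16_t high)
{
    if (host_is_little_endian()) {
        out[0] = low;
        out[1] = high;
    } else {
        out[0] = high;
        out[1] = low;
    }
}
```

If the sender uses the other word order, the two array elements have to be swapped (or composed with shifts) before they can be read as one long.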

I believe casting the unsigned int pointer to an unsigned long pointer will work on the MSP430 because the MSP430 is little endian AND the MSP430 does not require 4-byte longs to be aligned on 4-byte boundaries. But don't count on this working on another platform.

And don't expect that you can also cast two consecutive bytes to an unsigned int. The MSP430 requires that 2-byte words must be aligned on an even address. So if the first byte happens to be at an odd address then you will get undefined behavior when you cast it to a word.
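When the bytes may start at an odd address, one way to avoid the misaligned word access is to read them individually and combine them. This is a sketch that assumes the data is in little-endian order; read_u16_le is a hypothetical name.

```c
#include <stdint.h>

/* Read a little-endian 16-bit value from any address, aligned or not.
   Each byte is fetched separately, so no word access is ever misaligned.
   The cast to unsigned keeps the shift well-defined when int is 16 bits. */
static uint16_t read_u16_le(const unsigned char *bytes)
{
    return (uint16_t)(bytes[0] | ((unsigned)bytes[1] << 8));
}
```

For example, with a buffer holding the bytes 0x00, 0x34, 0x12, calling read_u16_le on the address of the second byte yields 0x1234 regardless of whether that address is even.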

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 