
Looping over structure elements using pointers in C

I wrote this code to iterate over the members of a structure. It works fine. Can I use a similar method for structures with mixed-type members, i.e. some integers, some floats, and so on?

#include <stdio.h>
#include <stdlib.h>

struct newData
{
    int x;
    int y;
    int z;
}  ;

int main()
{
    struct newData data1;
    data1.x = 10;
    data1.y = 20;
    data1.z = 30;

    struct newData *data2 = &data1;
    long int *addr = data2;
    for (int i=0; i<3; i++)
    {
        printf("%d \n", *(addr+i));
    }
}

In C, "it works fine" is not good enough, because your compiler is allowed to do this:

struct newData
{
    int x;
    char padding1[523];
    int y;
    char padding2[364];
    int z;
    char padding3[251];
};

Of course, this is an extreme example. But you get the general idea: your loop is not guaranteed to work, because struct newData is not guaranteed to be equivalent to int[3].

So no, it's not possible in the general case because it's not always possible in the specific case!
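One way to see which layout your own compiler actually chose is to ask it. The sketch below uses the standard offsetof macro and sizeof. The two helper names are made up for illustration, and whatever numbers they report hold only for the compiler and target you build with.

```c
#include <stddef.h>
#include <stdio.h>

struct newData
{
    int x;
    int y;
    int z;
};

/* Hypothetical helper: total padding bytes in struct newData
   (between members and after the last one combined). */
size_t newData_padding_bytes(void)
{
    return sizeof(struct newData) - 3 * sizeof(int);
}

/* Hypothetical helper: print where each member really lives. */
void print_newData_layout(void)
{
    printf("x at %zu, y at %zu, z at %zu, size %zu, padding %zu\n",
           offsetof(struct newData, x),
           offsetof(struct newData, y),
           offsetof(struct newData, z),
           sizeof(struct newData),
           newData_padding_bytes());
}
```

Calling print_newData_layout() from main shows the offsets for your build. The only value the standard fixes is that the first member sits at offset 0.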


Now, you might be thinking: "What idiots decided this?!" Well, I can't tell you that, but I can tell you why. Computers are very different to each other, and if you want code to run fast then the compiler has to be able to choose how to compile the code. Here's an example:

Processor 8 has an instruction to get individual bytes, and put them in a register:

GETBYTE addr, reg

This works well with this struct:

struct some_bytes {
   char age;
   char data;
   char stuff;
};

struct some_bytes can happily take up 3 bytes, and the code is fast. But what about Processor 16? It doesn't have GETBYTE, but it does have GETWORD:

GETWORD even_addr, reghl

This only accepts an even-numbered address, and reads two bytes; one into the "high" part of the register and one into the "low" part of the register. In order to make the code fast, the compiler has to do this:

struct some_bytes {
   char age;
   char pad1;
   char data;
   char pad2;
   char stuff;
   char pad3;
};

This means that the code can run faster, but it also means that your loop won't work. That's OK though, because it's something called "Undefined Behaviour"; the compiler is allowed to assume that it'll never happen, and if it does happen the behaviour is undefined.
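Real compilers do the same kind of thing with mixed-type structs. As a rough sketch (struct mixed and the helper names are invented for this example, and _Alignof needs C11), you can measure how many bytes were inserted and confirm that the int member landed on an address the processor likes:

```c
#include <stddef.h>

/* Hypothetical mixed-type struct: one byte followed by an int. */
struct mixed
{
    char tag;
    int  value;
};

/* Bytes the compiler added beyond the members themselves.
   Often 3 on targets with a 4-byte int, but that is not guaranteed. */
size_t mixed_padding_bytes(void)
{
    return sizeof(struct mixed) - sizeof(char) - sizeof(int);
}

/* Returns 1 if value starts at a multiple of int's alignment. */
int mixed_value_is_aligned(void)
{
    return offsetof(struct mixed, value) % _Alignof(int) == 0;
}
```

The padding amount differs between targets, which is why code that steps through a struct with a fixed stride is fragile.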

In fact, you've already run across this behaviour! Your particular compiler was doing this:

struct newData
{
    int x;
    int pad1;
    int y;
    int pad2;
    int z;
    int pad3;
};

Because your particular compiler defines long int as twice the length of int, you were able to do this:

|  x  | pad |  y  | pad |  z  | pad |

| long no.1 | long no.2 | long no.3 |
| int |     | int |     | int |     |

That code is, as you can tell by my precarious diagram, precarious. It probably won't work anywhere else. What's worse, your compiler, if it was being clever, would be able to do this:

for (int i=0; i<3; i++) { printf("%d \n", *(addr+i)); }

Hmm... addr comes from data2, which points to data1, which is a struct newData. The C specification says that only the pointer to the start of the struct will ever be dereferenced, so I can assume that i is always 0 in this loop!

for (int i=0; i<3 && i == 0; i++) { printf("%d \n", *(addr+i)); }

That means it only runs once! Hooray!

printf("%d \n", *(addr + 0));

And all I need to compile is this:

int main() { printf("%d \n", 10); }

Wow, the programmer will be so pleased that I've managed to speed this code up so much!

You won't be pleased. In fact, you'll get unexpected behaviour, and won't be able to work out why. But you would be pleased if you had written code free of Undefined Behaviour, and your compiler had done something similar. So it stays.

You're invoking undefined behavior. Just because it appears to work doesn't mean it's valid.

Pointer arithmetic is only valid when the original and resulting pointers both point into the same array object (or one past the end of the array object). You have multiple distinct objects (even though they're members of the same struct), so a pointer to one can't legally be used to get a pointer to another.

This is detailed in section 6.5.6p8 of the C standard:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P) ) and (P)-N (where N has the value n ) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
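If the goal is simply to visit each member in a loop, a well-defined alternative is to build an array of pointers to the members and iterate over that array. This is only a sketch; scale_members is a made-up name, and the list of members has to be kept in sync with the struct by hand.

```c
#include <stddef.h>

struct newData
{
    int x;
    int y;
    int z;
};

/* Hypothetical helper: multiply every member by factor.
   The pointer arithmetic happens inside the members[] array,
   which is a real array object, so the loop is well defined. */
void scale_members(struct newData *d, int factor)
{
    int *members[] = { &d->x, &d->y, &d->z };

    for (size_t i = 0; i < sizeof members / sizeof members[0]; i++)
    {
        *members[i] *= factor;
    }
}
```

Each element of members[] points at exactly one int object, so no pointer is ever moved from one member to the next.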

Not only can you not do this with mixed types, even the code in question is ill-advised. Your code

  • assumes that there is no padding between the members
  • has a strict aliasing violation (int and long are not compatible types)
  • lacks the explicit cast needed when assigning long int *addr = data2;
  • assumes that int and long are of the same size (not so on 64-bit Linux)
  • accesses an array out of bounds: even when cast to a pointer to the first member (int *addr = (int *)data2;), doing addr[1] accesses an array out of bounds.

TL;DR: In C "it works" does not mean it is correct. So if your program is wonky, don't be surprised if sometime, somewhere, someplace when you least expect it, someone steps up to you and says, smile! You've got undefined behaviour here.
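When all the members really do share one type, the simplest fix is to make them an actual array inside the struct and name the slots with an enum. The following is a sketch of that idea, not the original poster's design; newDataArr, the FIELD_ constants and newDataArr_sum are all invented names.

```c
#include <stddef.h>

/* Named indexes replace the member names x, y and z. */
enum { FIELD_X, FIELD_Y, FIELD_Z, FIELD_COUNT };

struct newDataArr
{
    int v[FIELD_COUNT];
};

/* Hypothetical helper: add up all fields with ordinary array indexing. */
int newDataArr_sum(const struct newDataArr *d)
{
    int sum = 0;

    for (size_t i = 0; i < FIELD_COUNT; i++)
    {
        sum += d->v[i];
    }
    return sum;
}
```

You still get readable access such as d.v[FIELD_Y], and the loop stays within a single array object.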

The short answer is "no".

The longer answer: Your example of what "works" is not really legal, either. If, for whatever reason, you really want to be able to loop over multiple types, you can get creative with structs and unions. Such as having a struct with one member that informs of the data-type the other member holds. The other member would be a union of all the possible data-types. Something like this:

#include <stdio.h>
#include <stdlib.h>

enum TYPE {INT, DOUBLE};

union some_union {
  int x;
  double y;
};

struct multi_type {
  enum TYPE type;
  union some_union u;
};

struct some_struct {
  struct multi_type array[2];
};

int main(void) {
   struct some_struct derp;

   derp.array[0].type = INT;
   derp.array[0].u.x = 5;
   derp.array[1].type = DOUBLE;
   derp.array[1].u.y = 5.5;

   for(int i = 0; i < 2; ++i) {
      switch (derp.array[i].type) {
         case INT:
            printf("Element %d is type 'int' with value %d\n", i, derp.array[i].u.x);
            break;
         case DOUBLE:
            printf("Element %d is type 'double' with value %f\n", i, derp.array[i].u.y);
            break;
      }
   }
   return EXIT_SUCCESS;
}

It does waste space when the types in your union differ greatly in size. If, for example, instead of just int and double, you had some large, complex structs that took up kilobytes of space, even your simple int elements would take up that much space.

Alternatively, if you were okay with the data not being directly in your struct, but only holding pointers to the data, you could use a similar technique that ditches unions.

#include <stdio.h>
#include <stdlib.h>

enum TYPE {INT, DOUBLE};

struct multi_type {
  enum TYPE type;
  void *data;
};

struct some_struct {
  struct multi_type array[2];
};

int main(void) {
   struct some_struct derp;
   int x;
   double y;

   derp.array[0].type = INT;
   derp.array[0].data = &x;
   *(int *)(derp.array[0].data) = 5;
   derp.array[1].type = DOUBLE;
   derp.array[1].data = &y;
   *(double *)derp.array[1].data = 5.5;

   for(int i = 0; i < 2; ++i) {
      switch (derp.array[i].type) {
         case INT:
            printf("Element %d is type 'int' with value %d\n", i, *(int *)derp.array[i].data);
            break;
         case DOUBLE:
            printf("Element %d is type 'double' with value %f\n", i, *(double *)derp.array[i].data);
            break;
      }
   }
   return EXIT_SUCCESS;
}

Before doing any of that, though, I recommend going over your design again and asking whether you really need to loop over elements of different types, or whether there's a better approach, such as looping through each type of element separately.
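If you do need to walk over an ordinary struct with mixed member types, one approach that stays within defined behaviour is a table describing each member: its name, its type tag, and its offset as reported by offsetof. The sketch below assumes a made-up struct record and made-up names throughout. It copies each value out with memcpy, so padding and aliasing are not a concern.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct with mixed member types. */
struct record
{
    int    id;
    float  ratio;
    double total;
};

enum field_type { FIELD_INT, FIELD_FLOAT, FIELD_DOUBLE };

struct field_desc
{
    const char     *name;
    enum field_type type;
    size_t          offset;  /* from offsetof, so padding is accounted for */
};

static const struct field_desc record_fields[] = {
    { "id",    FIELD_INT,    offsetof(struct record, id)    },
    { "ratio", FIELD_FLOAT,  offsetof(struct record, ratio) },
    { "total", FIELD_DOUBLE, offsetof(struct record, total) },
};

#define RECORD_FIELD_COUNT (sizeof record_fields / sizeof record_fields[0])

/* Writes "name=value" for field i into buf.
   Returns snprintf's result, or -1 if i is out of range. */
int format_field(const struct record *r, size_t i, char *buf, size_t buflen)
{
    if (i >= RECORD_FIELD_COUNT)
        return -1;

    const struct field_desc *f = &record_fields[i];
    const unsigned char *p = (const unsigned char *)r + f->offset;

    switch (f->type) {
    case FIELD_INT: {
        int v;
        memcpy(&v, p, sizeof v);
        return snprintf(buf, buflen, "%s=%d", f->name, v);
    }
    case FIELD_FLOAT: {
        float v;
        memcpy(&v, p, sizeof v);
        return snprintf(buf, buflen, "%s=%.2f", f->name, v);
    }
    case FIELD_DOUBLE: {
        double v;
        memcpy(&v, p, sizeof v);
        return snprintf(buf, buflen, "%s=%.2f", f->name, v);
    }
    }
    return -1;
}
```

A loop from 0 to RECORD_FIELD_COUNT - 1 that calls format_field then visits every member in turn. The cost is that the table must be updated whenever the struct changes.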

All good answers above. But there is another thing that is dangerous in your code:

struct newData *data2 = &data1;
long int *addr = data2;

Here you assume that on your particular machine you can convert a pointer into your structure to a pointer to a long int. While on modern machines that probably is almost always true, there is no guarantee for that, and most compilers will at least throw a warning at you.
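For contrast, the one struct pointer conversion the C standard does guarantee is to a pointer to the first member: a pointer to a struct, suitably converted, points to its initial member. A small sketch, with newData_first as an invented helper name:

```c
struct newData
{
    int x;
    int y;
    int z;
};

/* Hypothetical helper: pointer to the first member.
   &d->x is the clearest spelling; (int *)d yields the same address
   because x is the initial member and its type is int. */
int *newData_first(struct newData *d)
{
    return &d->x;
}
```

That guarantee covers only the first member. It says nothing about where y and z are, and it does not make a conversion to long int * valid.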

All the problems with dereferencing into a struct aside, you could use something like this:

struct newData *data2 = &data1;
void * addr = data2;

for(int i=0; i < 3; i++){
    printf("%ld \n", *((long int *)addr+i));
}

Now that still is bad code. You use long int to compensate for the padding your compiler has put into your structure; I presume you got to that by experimentation.

You can find out about the padding, if any, the compiler applies to your structure:

#include <assert.h>
.
.
.
assert(sizeof(struct newData) / sizeof(int) == 3);

This will at least terminate your program if there is anything fishy going on, either by padding or because your structure does not match a 3 int thing. Still bad code.
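With a C11 compiler the same check can be moved to compile time, so a surprising layout stops the build instead of the running program. This is a sketch assuming C11's _Static_assert is available; the messages are arbitrary.

```c
#include <stddef.h>

struct newData
{
    int x;
    int y;
    int z;
};

/* Refuse to compile if the compiler inserted any padding. */
_Static_assert(sizeof(struct newData) == 3 * sizeof(int),
               "struct newData contains padding");
_Static_assert(offsetof(struct newData, y) == sizeof(int),
               "padding between x and y");
_Static_assert(offsetof(struct newData, z) == 2 * sizeof(int),
               "padding between y and z");
```

Passing these checks only tells you about the layout. Stepping a pointer from one member to the next is still undefined behaviour.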

You could expand the examination of the possible padding in the structure by making a more step-by-step examination of sizes and structure member addresses, but that really is quite horrible. The following pointer arithmetic to get to the individual members would get more and more obfuscated, like this:

(Assuming you had calculated some padding value between your (identical!) struct members.)

#include <assert.h>
.
.
.
//assert(sizeof(struct newData) / sizeof(int) == 3);

//Very ugly....don't really do this.
// Padding in bytes after each member, assuming it is the same for all three
size_t padding = (sizeof(struct newData) / 3) - sizeof(int);

.
.
.
struct newData *data2 = &data1;

// Use a void pointer, which can hold all other data pointers
void * addr = data2;

for(int i=0; i < 3; i++)
{
    // Cast the pointer to (char*), because that is the only guaranteed
    // type size - 1 byte
    // Do your pointer arithmetic by using the actual size of int on your
    // machine, plus the padding, then cast back to (int*) to read the member

    printf("%d \n", *(int *)((char *)addr + (i * (sizeof(int) + padding))));
}

But still it remains really nasty code. You might need to do some things like that if you want to read a specific binary input, maybe from an audio file, into some structure, but there are much better ways to do that.
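One of those better ways is to decode the input field by field from a byte buffer instead of overlaying a struct on it. The sketch below assumes a hypothetical file format of three little-endian 32-bit integers; struct sample_header and both function names are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical in-memory form of a 12-byte file header. */
struct sample_header
{
    int32_t x;
    int32_t y;
    int32_t z;
};

/* Assemble a 32-bit value from 4 bytes, least significant byte first. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Fill out from the first 12 bytes of buf. Values above INT32_MAX
   convert to int32_t in an implementation-defined way. */
void sample_header_parse(struct sample_header *out, const unsigned char *buf)
{
    out->x = (int32_t)read_u32_le(buf);
    out->y = (int32_t)read_u32_le(buf + 4);
    out->z = (int32_t)read_u32_le(buf + 8);
}
```

Because every member is assigned by name, the struct's padding and the host's byte order no longer matter.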

PS: A struct does occupy one contiguous block of memory, whether it lives on the stack or on the heap, and its members appear in declaration order. What the standard does not pin down is how much padding sits between the members and after the last one, and pointer arithmetic is only defined within a single array object.

So it is very dangerous to do pointer arithmetic into a struct at any time.
