
Looping over structure elements using pointers in C

I wrote this code to iterate over the members of a structure. It works fine. Can I use a similar method for structures with mixed-type elements, i.e. some integers, some floats, and so on?

#include <stdio.h>
#include <stdlib.h>

struct newData
{
    int x;
    int y;
    int z;
}  ;

int main()
{
    struct newData data1;
    data1.x = 10;
    data1.y = 20;
    data1.z = 30;

    struct newData *data2 = &data1;
    long int *addr = data2;
    for (int i=0; i<3; i++)
    {
        printf("%d \n", *(addr+i));
    }
}

In C, "it works fine" is not good enough, because your compiler is allowed to do this:

struct newData
{
    int x;
    char padding1[523];
    int y;
    char padding2[364];
    int z;
    char padding3[251];
};

Of course, this is an extreme example. But you get the general idea: it's not guaranteed that your loop will work, because it's not guaranteed that struct newData is equivalent to int[3].

So no, it's not possible in the general case, because it's not always possible in the specific case!


Now, you might be thinking: "What idiots decided this?!" Well, I can't tell you that, but I can tell you why. Computers are very different from each other, and if you want code to run fast then the compiler has to be able to choose how to compile the code. Here's an example:

Processor 8 has an instruction to get an individual byte and put it in a register:

GETBYTE addr, reg

This works well with this struct:

struct some_bytes {
   char age;
   char data;
   char stuff;
};

struct some_bytes can happily take up 3 bytes, and the code is fast. But what about Processor 16? It doesn't have GETBYTE, but it does have GETWORD:

GETWORD even_addr, reghl

This only accepts an even-numbered address, and reads two bytes: one into the "high" part of the register and one into the "low" part. In order to make the code fast, the compiler has to do this:

struct some_bytes {
   char age;
   char pad1;
   char data;
   char pad2;
   char stuff;
   char pad3;
};

This means that the code can run faster, but it also means that your loop won't work. That's OK though, because it's something called "Undefined Behaviour": the compiler is allowed to assume that it will never happen, and if it does happen, the behaviour is undefined.

In fact, you've already run across this behaviour! Your particular compiler was doing this:

struct newData
{
    int x;
    int pad1;
    int y;
    int pad2;
    int z;
    int pad3;
};

Because your particular compiler defines long int as twice the length of int, you were able to do this:

|  x  | pad |  y  | pad |  z  | pad |

| long no.1 | long no.2 | long no.3 |
| int |     | int |     | int |     

That code is, as you can tell by my precarious diagram, precarious. It probably won't work anywhere else. What's worse, your compiler, if it was being clever, would be able to do this:

 for (int i=0; i<3; i++) { printf("%d \n", *(addr+i)); } 

Hmm... addr comes from data2, which points to data1, which is a struct newData. The C specification says that only the pointer to the start of the struct will ever be dereferenced, so I can assume that i is always 0 in this loop!

 for (int i=0; i<3 && i == 0; i++) { printf("%d \n", *(addr+i)); } 

That means it only runs once! Hooray!

 printf("%d \n", *(addr + 0)); 

And all I need to compile is this:

 int main() { printf("%d \n", 10); } 

Wow, the programmer will be so pleased that I've managed to speed this code up so much!

You won't be pleased. In fact, you'll get unexpected behaviour, and won't be able to work out why. But you would be pleased if you had written code free of Undefined Behaviour and your compiler had done something similar. So it stays.

You're invoking undefined behavior. Just because it appears to work doesn't mean it's valid.

Pointer arithmetic is only valid when the original and resulting pointers both point into the same array object (or one past the end of the array object). You have multiple distinct objects (even though they're members of the same struct), so a pointer to one can't legally be used to get a pointer to another.

This is detailed in section 6.5.6p8 of the C standard:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Not only can you not do this with mixed types; even the code in question is ill-advised. Your code

  • assumes that there is no padding between the members
  • has a strict aliasing violation (int and long are not compatible types)
  • lacks the explicit cast needed when assigning long int *addr = data2;
  • assumes that int and long are the same size (not so on 64-bit Linux)
  • accesses an array out of bounds: even when cast to a pointer to the first member (int *addr = (int*)data;), doing addr[1] accesses the array out of bounds.

TL;DR: In C, "it works" does not mean it is correct. So if your program is wonky, don't be surprised if sometime, somewhere, when you least expect it, someone steps up to you and says: smile! You've got undefined behaviour here.

The short answer is "no".

The longer answer: your example of what "works" is not really legal, either. If, for whatever reason, you really want to be able to loop over multiple types, you can get creative with structs and unions. For example, you can have a struct in which one member records the data type that the other member holds; the other member is a union of all the possible data types. Something like this:

#include <stdio.h>
#include <stdlib.h>

enum TYPE {INT, DOUBLE};

union some_union {
  int x;
  double y;
};

struct multi_type {
  enum TYPE type;
  union some_union u;
};

struct some_struct {
  struct multi_type array[2];
};

int main(void) {
   struct some_struct derp;

   derp.array[0].type = INT;
   derp.array[0].u.x = 5;
   derp.array[1].type = DOUBLE;
   derp.array[1].u.y = 5.5;

   for(int i = 0; i < 2; ++i) {
      switch (derp.array[i].type) {
         case INT:
            printf("Element %d is type 'int' with value %d\n", i, derp.array[i].u.x);
            break;
         case DOUBLE:
            printf("Element %d is type 'double' with value %lf\n", i, derp.array[i].u.y);
            break;
      }
   }
   return EXIT_SUCCESS;
}

This does waste space when the types in your union differ greatly in size. If, for example, instead of just int and double you had some large, complex structs that took up kilobytes of space, even your simple int elements would take up that much space.

Alternatively, if you are okay with the data not living directly in your struct, and the struct only holding pointers to the data, you could use a similar technique that ditches unions.

#include <stdio.h>
#include <stdlib.h>

enum TYPE {INT, DOUBLE};

struct multi_type {
  enum TYPE type;
  void *data;
};

struct some_struct {
  struct multi_type array[2];
};

int main(void) {
   struct some_struct derp;
   int x;
   double y;

   derp.array[0].type = INT;
   derp.array[0].data = &x;
   *(int *)(derp.array[0].data) = 5;
   derp.array[1].type = DOUBLE;
   derp.array[1].data = &y;
   *(double *)derp.array[1].data = 5.5;

   for(int i = 0; i < 2; ++i) {
      switch (derp.array[i].type) {
         case INT:
            printf("Element %d is type 'int' with value %d\n", i, *(int *)derp.array[i].data);
            break;
         case DOUBLE:
            printf("Element %d is type 'double' with value %lf\n", i, *(double *)derp.array[i].data);
            break;
      }
   }
   return EXIT_SUCCESS;
}

Before doing any of that, though, I recommend thinking over your design again and asking whether you really need to loop over elements of different types, or whether there's a better way to go about it, such as looping through each type of element separately.

All good answers above. But there is another thing that is dangerous in your code:

struct newData *data2 = &data1;
long int *addr = data2;

Here you assume that on your particular machine you can convert a pointer to your structure into a pointer to long int. While on modern machines that is almost always true, there is no guarantee of it, and most compilers will at least throw a warning at you.

All the problems with dereferencing into a struct aside, you could use something like this:

struct newData *data2 = &data1;
void * addr = data2;

for(int i=0; i < 3; i++){
    printf("%d \n", *((long int *)addr+i));
}

Now that still is bad code. You use long int to compensate for the padding your compiler has put into your structure; I presume you got to that by experimentation.

You can find out about the padding, if any, that the compiler applies to your structure:

#include <assert.h>
.
.
.
assert(sizeof(struct newData) / sizeof(int) == 3);

This will at least terminate your program if there is anything fishy going on, either because of padding or because your structure does not match a three-int layout. Still bad code.

You could expand the examination of possible padding in the structure by checking sizes and structure member addresses step by step, but that really is quite horrible. The pointer arithmetic needed to reach the individual members then gets more and more obfuscated, like this:

(assuming you had calculated some padding value between your (identical!) struct members):

#include <assert.h>
#include <string.h>
.
.
.
//assert(sizeof(struct newData) / sizeof(int) == 3);

// Very ugly... don't really do this.
// Padding in bytes after each member, assuming it is spread evenly
size_t padding = sizeof(struct newData) / 3 - sizeof(int);

.
.
.
struct newData *data2 = &data1;

// Use a void pointer, which can hold any object pointer
void *addr = data2;

for (int i = 0; i < 3; i++)
{
    // Cast the pointer to (char *), because char is the only type
    // whose size is guaranteed: 1 byte.
    // Step by the actual size of int on your machine, plus the padding,
    // then copy the bytes into a real int before printing. Dereferencing
    // the char pointer directly would read only a single byte.
    int value;
    memcpy(&value, (char *)addr + i * (sizeof(int) + padding), sizeof value);
    printf("%d \n", value);
}

But it still remains really nasty code. You might need to do something like that if you want to read a specific binary input, maybe from an audio file, into some structure, but there are much better ways to do that.

PS: A struct does occupy a single contiguous block of memory, with its members laid out in declaration order, but the compiler may insert unnamed padding between members and after the last one. The standard also does not let you treat separate members as elements of one array, however neatly they happen to sit next to each other.

So it is very dangerous to do pointer arithmetic into a struct at any time.

