
Why does this implementation of offsetof() work?

In ANSI C, offsetof is defined as below.

#define offsetof(st, m) \
    ((size_t) ( (char *)&((st *)(0))->m - (char *)0 ))

Why doesn't this cause a segmentation fault, given that we are dereferencing a NULL pointer? Or is this some sort of compiler hack, where the compiler sees that only the address of the member is taken and so calculates the address statically without actually dereferencing anything? Also, is this code portable?

At no point in the above code is anything dereferenced. A dereference occurs when * or -> is applied to an address value to fetch the referenced value. The only use of * above is in a type name, for the purpose of casting.

The -> operator is used above, but not to access the value. Instead it is used to take the address of the member. Here is a non-macro code sample that should make it a bit clearer:

SomeType *pSomeType = GetTheValue();
int* pMember = &(pSomeType->SomeIntMember);

The second line does not actually cause a dereference (implementation dependent). It simply yields the address of SomeIntMember within the object that pSomeType points to.

What you see is a lot of casting between arbitrary types and char pointers. The reason for char is that it is one of the only types (perhaps the only one) in the C89 standard that has an explicit size: the size is 1. Because the size is 1, the above code can do the evil magic of calculating the true offset of the member in bytes.

Although that is a typical implementation of offsetof, it is not mandated by the standard, which just says:

The following types and macros are defined in the standard header <stddef.h> [...]

offsetof(type, member-designator)

which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). The type and member designator shall be such that given

static type t;

then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)

Read P.J. Plauger's "The Standard C Library" for a discussion of it and the other items in <stddef.h>, which are all borderline features that could (should?) be in the language proper, and which might require special compiler support.

It's of historic interest only, but I used an early ANSI C compiler on 386/IX (see, I told you it was of historic interest, circa 1990) that crashed on that version of offsetof but worked when I revised it to:

#define offsetof(st, m) ((size_t)((char *)&((st *)(1024))->m - (char *)1024))

That was a compiler bug of sorts, not least because the header was distributed with the compiler and didn't work.

In ANSI C, offsetof is NOT defined like that. One of the reasons it's not defined like that is that some environments will indeed throw null pointer exceptions, or crash in other ways. Hence, ANSI C leaves the implementation of offsetof() open to compiler builders.

The code shown above is typical for compilers/environments that do not actively check for NULL pointers, but fail only when bytes are read from a NULL pointer.

To answer the last part of the question, the code is not portable.

The result of subtracting two pointers is defined and portable only if the two pointers point to objects in the same array or to one past the last object of the array (7.6.2 Additive Operators, H&S Fifth Edition).

Listing 1: A representative set of offsetof() macro definitions

// Keil 8051 compiler
#define offsetof(s,m) (size_t)&(((s *)0)->m)

// Microsoft x86 compiler (version 7)
#define offsetof(s,m) (size_t)(unsigned long)&(((s *)0)->m)

// Diab Coldfire compiler
#define offsetof(s,memb) ((size_t)((char *)&((s *)0)->memb-(char *)0))

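#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* printf */

typedef struct
{
    int     i;
    float   f;
    char    c;
} SFOO;

int main(void)
{
    /* %zu is the printf conversion for size_t (C99 and later). */
    printf("Offset of 'f' is %zu\n", offsetof(SFOO, f));
    return 0;
}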

The various operators within the macro are evaluated in an order such that the following steps are performed:

  1. ((s *)0) takes the integer zero and casts it as a pointer to s.
  2. ((s *)0)->m dereferences that pointer to point to structure member m.
  3. &(((s *)0)->m) computes the address of m.
  4. (size_t)&(((s *)0)->m) casts the result to an appropriate data type.

By definition, the structure itself resides at address 0. It follows that the address of the field pointed to (Step 3 above) must be the offset, in bytes, from the start of the structure.

It doesn't segfault because you're not dereferencing it. The pointer address is being used as a number that's subtracted from another number; it is not used to address any memory operation.

It calculates the offset of the member m relative to the start address of the representation of an object of type st.

((st *)(0)) refers to a NULL pointer of type st *. &((st *)(0))->m refers to the address of member m in this object. Since the start address of this object is 0 (NULL), the address of member m is exactly the offset.

The char * conversion and the subtraction calculate the offset in bytes. By the rules of pointer arithmetic, when you take the difference between two pointers of type T *, the result is the number of objects of type T that fit between the two addresses held by the operands.

Quoting the C standard for the offsetof macro:

C standard, section 6.6, paragraph 9

An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be accessed by use of these operators.

The macro is defined as

#define offsetof(type, member)  ((size_t)&((type *)0)->member)

and the expression comprises the creation of an address constant.

Strictly speaking, though, the result is not an address constant, because it does not point to an object of static storage duration. The rule that the value of an object shall not be accessed still applies, so the integer constant cast to pointer type is not dereferenced.

Also, consider this quote from the C standard:

C standard, section 7.19, paragraph 3

The type and member designator shall be such that given

static type t;

then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)

A struct in C is a composite data type (or record) declaration that defines a physically grouped list of variables under one name in a block of memory. The different variables can be accessed via a single pointer, or via the struct's declared name, which refers to the same address.

From the compiler's perspective, the struct's declared name is an address and the member designator is an offset from that address.
