简体   繁体   中英

Pointer to data member address

I have read (Inside C++ object model) that address of pointer to data member in C++ is the offset of data member plus 1?
I am trying this on VC++ 2005 but i am not getting exact offset values.
For example:

Class X{  
  public:  
    int a;  
    int b;  
    int c;
}

void x(){  
  printf("Offsets of a=%d, b=%d, c=%d",&X::a,&X::b,&X::c);
}  

Should print Offsets of a=1, b=5, c=9. But in VC++ 2005 it is coming out to be a=0,b=4,c=8.
I am not able to understand this behavior.
Excerpt from book:

"That expectation, however, is off by one—a somewhat traditional error for both C and C++ programmers.

The physical offset of the three coordinate members within the class layout are, respectively, either 0, 4, and 8 if the vptr is placed at the end or 4, 8, and 12 if the vptr is placed at the start of the class. The value returned from taking the member's address, however, is always bumped up by 1. Thus the actual values are 1, 5, and 9, and so on. The problem is distinguishing between a pointer to no data member and a pointer to the first data member. Consider for example:

 float Point3d::*p1 = 0; float Point3d::*p2 = &Point3d::x; // oops: how to distinguish? if ( p1 == p2 ) { cout << " p1 & p2 contain the same value — "; cout << " they must address the same member!" << endl; } 

To distinguish between p1 and p2, each actual member offset value is bumped up by 1. Hence, both the compiler (and the user) must remember to subtract 1 before actually using the value to address a member."

The offset of something is how many units it is from the start. The first thing is at the start so its offset is zero.

Think in terms of your structure being at memory location 100:

100: class X { int a;
104:           int b;
108:           int c;

As you can see, the address of a is the same as the address of the entire structure, so its offset (what you have to add to the structure address to get the item address) is 0.

Note that the ISO standard doesn't specify where the items are laid out in memory. Padding bytes to create correct alignment are certainly possible. In a hypothetical environment where ints were only two bytes but their required alignment was 256 bytes, they wouldn't be at 0, 2 and 4 but rather at 0, 256 and 512.


And, if that book you're taking the excerpt from is really Inside the C++ Object Model , it's getting a little long in the tooth.

The fact that it's from '96 and discusses the internals underneath C++ (waxing lyrical about how good it is to know where the vptr is, missing the whole point that that's working at the wrong abstraction level and you should never care ) dates it quite a bit. In fact, the introduction even states "Explains the basic implementation of the object-oriented features ..." (my italics).

And the fact that nobody can find anything in the ISO standard saying this behaviour is required, along the fact that neither MSVC not gcc act that way, leads me to believe that, even if this was true of one particular implementation far in the past, it's not true (or required to be true) of all.

The author apparently led the cfront 2.1 and 3 teams and, while this books seems of historical interest, I don't think it's relevant to the modern C++ language (and implementation), at least those bits I've read.

Firstly, the internal representation of values of a pointer to a data member type is an implementation detail. It can be done in many different ways. You came across a description of one possible implementation, where the pointer contains the offset of the member plus 1 . It is rather obvious where that "plus 1" come from: that specific implementation wants to reserve the physical zero value ( 0x0 ) for null pointer , so the offset of the first data member (which could easily be 0) has to be transformed to something else to make it different from a null pointer. Adding 1 to all such pointers solves the problem.

However, it should be noted that this is a rather cumbersome approach (ie the compiler always has to subtract 1 from the physical value before performing access). That implementation was apparently trying very hard to make sure that all null-pointers are represented by a physical zero-bit pattern. To tell the truth, I haven't encountered implementations that follow this approach in practice these days.

Today, most popular implementations (like GCC or MSVC++) use just the plain offset (not adding anything to it) as the internal representation of the pointer to a data member. The physical zero will, of course, no longer work for representing null pointers, so they use some other physical value to represent null pointers, like 0xFFFF... (this is what GCC and MSVC++ use).

Secondly, I don't understand what you were trying to say with your p1 and p2 example. You are absolutely wrong to assume that the pointers will contain the same value. They won't.

If we follow the approach described in your post ("offset + 1"), then p1 will receive the physical value of null pointer (apparently a physical 0x0 ), while the p2 whill receive physical value of 0x1 (assuming x has offset 0). 0x0 and 0x1 are two different values.

If we follow the approach used by modern GCC and MSVC++ compilers, then p1 will receive the physical value of 0xFFFF.... (null pointer), while p2 will be assigned a physical 0x0 . 0xFFFF... and 0x0 are again different values.

PS I just realized that the p1 and p2 example is actually not yours, but a quote from a book. Well, the book, once again, is describing the same problem I mentioned above - the conflict of 0 offset with 0x0 representation for null pointer, and offers one possible viable approach to solving that conflict. But, once again, there are alternative ways to do it, and many compilers today use completely different approaches.

The behavior you're getting looks quite reasonable to me. What sounds wrong is what you read.

To complement AndreyT's answer: Try running this code on your compiler.

void test()
{  
    using namespace std;

    int X::* pm = NULL;
    cout << "NULL pointer to member: "
        << " value = " << pm 
        << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;

    pm = &X::a;
    cout << "pointer to member a: "
        << " value = " << pm 
        << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;

    pm = &X::b;
    cout << "pointer to member b: "
        << " value = " << pm 
        << ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;
}

On Visual Studio 2008 I get:

NULL pointer to member:  value = 0, raw byte value = 0xffffffff
pointer to member a:  value = 1, raw byte value = 0x0
pointer to member b:  value = 1, raw byte value = 0x4

So indeed, this particular compiler is using a special bit pattern to represent a NULL pointer and thus leaving an 0x0 bit pattern as representing a pointer to the first member of an object.

This also means that wherever the compiler generates code to translate such a pointer to an integer or a boolean, it must be taking care to look for that special bit pattern. Thus something like if(pm) or the conversion performed by the << stream operator is actually written by the compiler as a test against the 0xffffffff bit pattern (instead of how we typically like to think of pointer tests being a raw test against address 0x0).

I have read that address of pointer to data member in C++ is the offset of data member plus 1?

I have never heard that, and your own empirical evidence shows it's not the case. I think you misunderstood an odd property of structs & class in C++. If they are completely empty, they nevertheless have a size of 1 (so that each element of an array of them has a unique address)

$9.2/12 is interesting

Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).

This explains that such behavior is implementation defined. However the fact that 'a', 'b' and 'c' are at increasing addresses is in accordance with the Standard.

You may want to check out How are objects stored in memory in C++? which talks about this issue in much more detail.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM