Do all pointers have the same size in C++?

Recently, I came across the following statement:

It's quite common for all pointers to have the same size, but it's technically possible for pointer types to have different sizes.

But then I came across this, which states that:

While pointers are all the same size, as they just store a memory address, we have to know what kind of thing they are pointing TO.

Now, I am not sure which of the above statements is correct. The second quoted statement looks like it's from the C++ notes of Computer Science, Florida State University.


Here's why, in my opinion, all pointers should have the same size:

1) Say we have:

int i = 0;
void* ptr = &i; 

Now, suppose the C++ standard allows pointers to have different sizes. Further suppose that on some arbitrary machine/compiler (since it is allowed by the standard), a void* has size 2 bytes while an int* has size 4 bytes.

Now, I think there is a problem here: the right-hand side has an int*, which has size 4 bytes, while on the left-hand side we have a void*, which has size 2 bytes. Thus, when the implicit conversion happens from int* to void*, there will be some loss of information.

2) All pointers hold addresses. Since for a given machine all addresses have the same size, it is very natural (logical) that all pointers should also have the same size.

Therefore, I think that the second quote is true.


My first question is: what does the C++ standard say about this?

My second question is: if the C++ standard does allow pointers to be of different sizes, then is there a reason for it? Allowing pointers to be of different sizes seems a bit unnatural to me (considering the 2 points I explained above). So I am pretty sure that the standard committee must have already given this (that pointers can have different sizes) some thought and has a reason for allowing it. Note that I am asking this (2nd question) only if the standard does allow pointers to have different sizes.

While it might be tempting to conclude that all pointers are the same size because "pointers are just addresses, and addresses are just numbers of the same size", it is not guaranteed by the standard and thus cannot be relied upon.

The C++ standard explicitly guarantees that:

  • void* has the same size as char* ([basic.compound]/5)
  • T const*, T volatile*, and T const volatile* have the same size as T*. This is because cv-qualified versions of the same type are layout-compatible, and pointers to layout-compatible types have the same value representation ([basic.compound]/3).
  • Similarly, any two enum types with the same underlying type are layout-compatible ([dcl.enum]/9), therefore pointers to such enum types have the same size.

It is not guaranteed by the standard, but it is basically always true in practice, that pointers to all class types have the same size. The reason for this is as follows: a pointer to an incomplete class type is a complete type, meaning that you are entitled to ask the compiler for sizeof(T*) even when T is an incomplete class type, and if you then ask the compiler for sizeof(T*) again later in the translation unit after T has been defined, the result must be the same. Furthermore, the result must also be the same in every other translation unit where T is declared, even if it is never completed in another translation unit. Therefore, the compiler must be able to determine the size of T* without knowing what's inside T. Technically, compilers are still allowed to play some tricks, such as saying that if the class name starts with a particular prefix, then the compiler will assume that you want instances of that class to be subject to garbage collection, and make pointers to it longer than other pointers. In practice, compilers do not seem to use this freedom, and you can assume that pointers to different class types have the same size. If you rely on this assumption, you can put a static_assert in your program and say that it doesn't support the pathological platforms where the assumption is violated.

Also, in practice, it will generally be the case that

  • any two function pointer types have the same size,
  • any two pointer to data member types will have the same size, and
  • any two pointer to member function types will have the same size.

The reason for this is that you can always reinterpret_cast from one function pointer type to another and then back to the original type without losing information, and so on for the other two categories listed above ([expr.reinterpret.cast]). While a compiler is allowed to make them different sizes by giving them different amounts of padding, there is no practical reason to do this.

(However, MSVC has a mode where pointers to members do not necessarily have the same size. It is not due to different amounts of padding, but simply violates the standard. So if you rely on this in your code, you should probably put a static_assert.)

If you have a segmented architecture with near and far pointers, you should not expect them to have the same size. This is an exception to the rules above about certain pairs of pointer types generally having the same size.

Member function pointers can differ:

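#include <cstddef>
#include <iostream>
#include <string>

int main() {
    void* ptr;                           // ordinary object pointer
    std::size_t (std::string::*mptr)();  // pointer to member function

    std::cout << sizeof(ptr) << '\n';
    std::cout << sizeof(mptr) << '\n';
}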

This printed

8
16

on my system. The background is that member function pointers need to hold additional information, e.g. about virtuality.

Historically, there were systems with 'near' and 'far' pointers, which differed in size as well (16 vs. 32 bits). As far as I am aware, they no longer play any role nowadays, though.

A few rules:

  1. The sizes of plain-old-data pointers can differ, e.g. double* can be (and often is) larger than int*. (Think of architectures with off-board floating point units.)

  2. void* must be sufficiently large to hold any object pointer type.

  3. The size of any non-plain-old-data pointer is the same as any other. In other words, sizeof(myclass*) == sizeof(yourclass*).

  4. sizeof(const T*) is the same as sizeof(T*) for any T, plain-old-data or otherwise.

  5. Member function pointers are not pointers. Pointers to non-member functions, including static member functions, are pointers.

suppose the standard C++ allows pointers to have different sizes

The size, structure, and format of a pointer are determined by the architecture of the underlying CPU. Language standards don't have the ability to make many demands about these things because it's not something the compiler implementer can control. Instead, language specs focus on how pointers will behave when used in code. The C99 Rationale document (different language, but the reasoning is still valid) makes the following comments in section 6.3.2.3:

C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type.

...

Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.

An easy example of this is a pure Harvard architecture computer. Executable instructions and data are stored in separate memory areas, each with separate signal pathways. A Harvard architecture system can use 32-bit pointers for data but only 16-bit pointers to a much smaller instruction memory pool.

The compiler implementer has to ensure that they generate code that both functions correctly on the target platform and behaves according to the rules in the language spec. Sometimes that means that all pointers are the same size, but not always.

The second reason for having all pointers be the same size is that all pointers hold addresses. And since for a given machine all addresses have the same size

Neither of those statements is necessarily true. They're true on most common architectures in use today, but they don't have to be.

As an example, so-called "segmented" memory architectures can have multiple ways to format an assembly operation. References within the current memory "segment" can use a short "offset" value, whereas references to memory outside the current segment require two values: a segment ID plus an offset. In DOS on x86 these were called "near" and "far" pointers, respectively, and were 16 and 32 bits wide.

I've also seen some specialized chips (like DSPs) that used two bytes of memory to store a 12-bit pointer. The remaining four bits were flags that controlled the way memory was accessed (cached vs. uncached, etc.). The pointer contained the memory address, but it was more than just that.

What a language spec does with all of this is to define a set of rules defining how you can and cannot use pointers in your code, as well as what behavior should be observable for each pointer-related operation. As long as you stick to those rules, your program should behave according to the spec's description. It's the compiler writer's job to figure out how to bridge the gap between the two and generate the correct code without you having to know anything about the CPU architecture's quirks. Going outside the spec and invoking unspecified behavior will make those implementation details become relevant, and you're no longer guaranteed as to what will happen. I recommend enabling the compiler warning for conversions that result in a loss of data, and then treating that warning as a hard error.

Your reasoning in the first case is half-correct. void* must be able to hold any int* value. But the reverse is not true. Hence, it's quite possible for void* to be bigger than int*.

The statement also gets more complex if you include other pointer types, such as pointers to functions and pointers to methods.

One of the reasons considered by the C++ Standards committee is DSP chips, where the hardware word size is 16 bits, but char is implemented as a half-word. This means char* and void* need one extra bit compared to short* and int*.

As an embedded programmer, I wonder whether even these C languages have taken us too far from the machine! :)

The father, "C", was used to design systems (low-level). Part of the reason different pointer variables need not be the same size is that they can refer to physically different system memories. That is, different data at different memory addresses can actually be located on separate electronic integrated circuits (ICs)! For example, constant data might be located on one non-volatile IC, volatile variables on another IC, etc. A memory IC might be designed to be accessed 1 byte at a time, or 4 bytes at a time, etc. (which is what "pointer++" handles).

What if the particular memory bus/address space is only a byte wide? (I've worked with those before.) Then pointer==0xFFFFFFFFFFFFFFFF would be wasteful and perhaps unsafe.

I've seen actual code for a DSP that addressed 16-bit units. So if you took a pointer to int, interpreted the bits as an integer, and increased that by one, the pointer would point to the next 16-bit int.

On this system, char was also 16 bits. If char had been 8 bits, then a char* would have been an int pointer with at least one additional bit.

In addition to the requirements of the C++ standard, any implementation that supports the UNIX dlsym() library call must be able to convert a function pointer to a void*. All function pointers must also be the same size.

There have been architectures in the real world where different kinds of pointers have different sizes. One formerly very mainstream example was MS-DOS, where the Compact and Medium memory models could make code pointers larger than data pointers or vice versa. In segmented memory, it was also possible to have object pointers that were different sizes (such as near and far pointers).

Practically, you'll find that all pointers within one system are the same size on nearly all modern systems, with 'modern' starting around 2000.
The permission to have different sizes comes from older systems using chips like the 8086, 80386, etc., where there were 'near' and 'far' pointers of obviously different sizes. It was the compiler's (and sometimes the developer's) job to sort out, and remember, what goes in a near pointer and what goes in a far pointer.

C++ needs to stay compatible with those times and environments.

In modern C++, there are smart pointers in the standard library: std::unique_ptr and std::shared_ptr. A unique pointer can be the same size as a regular pointer when it does not have a deleter stored with it. A shared pointer may be larger, since it stores the pointer itself as well as a pointer to a control block that maintains the reference counts and the deleter for the object. This control block can be stored together with the allocated object (using std::make_shared), which may make the reference-counted object slightly bigger.

See this interesting question: Why is the size of make_shared two pointers?
