What is the correct way to allocate and use an untyped memory block in C++?

The answers I have received to this question so far fall into two exactly opposite camps: "it's safe" and "it's undefined behaviour". I decided to rewrite the question in full to get clearer answers, for me and for anyone who might arrive here via Google.

Also, I removed the C tag, so this question is now C++ specific.

I am making an 8-byte-aligned memory heap that will be used in my virtual machine. The most obvious approach that I can think of is to allocate an array of std::uint64_t.

std::unique_ptr<std::uint64_t[]> block(new std::uint64_t[100]);

Let's assume sizeof(float) == 4 and sizeof(double) == 8. I want to store a float and a double in block and print the values.

float* pf = reinterpret_cast<float*>(&block[0]);
double* pd = reinterpret_cast<double*>(&block[1]);
*pf = 1.1;
*pd = 2.2;
std::cout << *pf << std::endl;
std::cout << *pd << std::endl;

I'd also like to store a C-string saying "hello".

char* pc = reinterpret_cast<char*>(&block[2]);
std::strcpy(pc, "hello\n");
std::cout << pc;

Now I want to store "Hello, world!", which goes over 8 bytes, but I can still use 2 consecutive cells.

char* pc2 = reinterpret_cast<char*>(&block[3]);
std::strcpy(pc2, "Hello, world\n");
std::cout << pc2;

For integers, I don't need a reinterpret_cast.

block[5] = 1;
std::cout << block[5] << std::endl;

I'm allocating block as an array of std::uint64_t for the sole purpose of memory alignment. I also do not expect anything larger than 8 bytes on its own to be stored in there. The type of the block can be anything, as long as the starting address is guaranteed to be 8-byte-aligned.

Some people have already answered that what I'm doing is totally safe, but others said that I'm definitely invoking undefined behaviour.

Am I writing correct code to do what I intend? If not, what is the appropriate way?

The global allocation functions

To allocate an arbitrary (untyped) block of memory, the global allocation functions (§3.7.4/2)

void* operator new(std::size_t);
void* operator new[](std::size_t);

can be used (§3.7.4.1/2).

§3.7.4.1/2

The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall return the address of the start of a block of storage whose length in bytes shall be at least as large as the requested size. There are no constraints on the contents of the allocated storage on return from the allocation function. The order, contiguity, and initial value of storage allocated by successive calls to an allocation function are unspecified. The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type with a fundamental alignment requirement (3.11) and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function).

And 3.11 has this to say about a fundamental alignment requirement:

§3.11/2

A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t).

Just to be sure about the requirement that the allocation functions must behave like this:

§3.7.4/3

Any allocation and/or deallocation functions defined in a C++ program, including the default versions in the library, shall conform to the semantics specified in 3.7.4.1 and 3.7.4.2.

Quotes from C++ WD n4527.

Assuming the 8-byte alignment is no greater than the fundamental alignment of the platform (it looks like it is, and this can be verified on the target platform with static_assert(alignof(std::max_align_t) >= 8)), you can use the global ::operator new to allocate the memory required. Once allocated, the memory can be segmented and used according to the size and alignment requirements you have.
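As a rough sketch of that idea (not the answer's own code), the block below allocates raw storage with ::operator new, checks the alignment assumption at compile time, and uses placement new to start the lifetime of a float and a double in separate 8-byte cells. The function name store_and_sum and the cell constant are hypothetical, introduced only for illustration.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: creates a float in cell 0 and a double in cell 1 of a
// freshly allocated untyped block and returns their sum.
inline double store_and_sum()
{
    static_assert(alignof(std::max_align_t) >= 8, "need 8-byte aligned storage");
    static_assert(sizeof(float) <= 8 && sizeof(double) <= 8, "cells are 8 bytes");

    constexpr std::size_t cell = 8;                  // assumed cell size in bytes
    void* raw = ::operator new(100 * cell);          // untyped storage
    unsigned char* base = static_cast<unsigned char*>(raw);

    // Placement new starts the lifetime of each object inside the block.
    float*  pf = ::new (base + 0 * cell) float(1.5f);
    double* pd = ::new (base + 1 * cell) double(2.25);

    const double sum = *pf + *pd;

    // float and double are trivially destructible, so releasing the storage
    // without explicit destructor calls is fine here.
    ::operator delete(raw);
    return sum;
}
```

Accessing the objects only through the pointers returned by placement new keeps every read and write on an lvalue of the object's own type.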

An alternative here is std::aligned_storage, which can give you memory aligned to whatever the requirement is.

typename std::aligned_storage<sizeof(T), alignof(T)>::type buffer[100];

From the question, I assume here that both the size and alignment of T would be 8.
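Since std::aligned_storage was deprecated in C++23, a similar effect can be sketched with an alignas-qualified byte array. This is a swapped-in technique rather than the one named above; the Cell type and cell_roundtrip function are hypothetical names.

```cpp
#include <new>

// Hypothetical 8-byte, 8-byte-aligned cell of raw bytes.
struct alignas(8) Cell {
    unsigned char bytes[8];
};

static_assert(sizeof(Cell) == 8 && alignof(Cell) == 8, "cell must be 8/8");
static_assert(sizeof(double) <= sizeof(Cell), "double must fit in one cell");
static_assert(alignof(double) <= alignof(Cell), "cell must satisfy double's alignment");

// Constructs a double inside cell 1 of a local buffer and reads it back.
inline double cell_roundtrip(double v)
{
    Cell buffer[100];
    double* pd = ::new (static_cast<void*>(buffer[1].bytes)) double(v);
    return *pd;
}
```

An unsigned char array can provide storage for another object, which is why the placement new into bytes is acceptable here.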


A sample of what the final memory block could look like (basic RAII included):

struct DataBlock {
    const std::size_t element_count;
    static constexpr std::size_t element_size = 8;
    void * data = nullptr;
    explicit DataBlock(std::size_t elements) : element_count(elements)
    {
        data = ::operator new(elements * element_size);
    }
    ~DataBlock()
    {
        ::operator delete(data);
    }
    DataBlock(const DataBlock&) = delete; // no copy
    DataBlock& operator=(const DataBlock&) = delete; // no assign
    // probably shouldn't move either
    DataBlock(DataBlock&&) = delete;
    DataBlock& operator=(DataBlock&&) = delete;

    template <class T>
    T* get_location(std::size_t index)
    {
        // https://stackoverflow.com/a/6449951/3747990
        // C++ WD n4527 3.9.2/4
        void* t = reinterpret_cast<void*>(reinterpret_cast<unsigned char*>(data) + index*element_size);
        // 5.2.9/13
        return static_cast<T*>(t);

        // C++ WD n4527 5.2.10/7 would allow this to be condensed
        //T* t = reinterpret_cast<T*>(reinterpret_cast<unsigned char*>(data) + index*element_size);
        //return t;
    }
};
// ....
DataBlock block(100);

I've constructed more detailed examples of the DataBlock with suitable template construct and get functions, with live demos here and here that add further error checking.

A note on the aliasing

It does look like there are some aliasing issues in the original code (strictly speaking); you allocate memory of one type and cast it to another type.

It will probably work as you expect on your target platform, but you cannot rely on it. The most practical comment I've seen on this is:

"Undefined behaviour has the nasty result of usually doing what you think it should do, until it doesn't" - hvd.

The code you have will probably work. I think it is better to use the appropriate global allocation functions and be sure that there is no undefined behaviour when allocating and using the memory you require.

Aliasing will still be applicable; once the memory is allocated, aliasing applies to how it is used. Once you have an arbitrary block of memory allocated (as above with the global allocation functions) and the lifetime of an object begins (§3.8/1), the aliasing rules apply.

What about std::allocator?

Whilst std::allocator is for homogeneous data containers and what you are looking for is akin to heterogeneous allocations, the implementation in your standard library (given the Allocator concept) offers some guidance on raw memory allocations and the corresponding construction of the objects required.

Update for the new question:

The great news is there's a simple and easy solution to your real problem: allocate the memory with new unsigned char[size]. Memory allocated with new is guaranteed by the standard to be aligned in a way suitable for use as any type, and you can safely alias any type with char*.

The standard reference, 3.7.3.1/2, allocation functions:

The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type and then used to access the object or array in the storage allocated
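One conservative way to use such a byte block, sketched here under the assumption of 8-byte cells, is to copy values in and out with std::memcpy so that no typed pointer into the block is ever dereferenced. The names store, load and memcpy_demo are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

// Copies a value into the 8-byte cell at `index` of a raw byte block.
template <class T>
void store(unsigned char* block, std::size_t index, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value, "memcpy needs trivially copyable T");
    static_assert(sizeof(T) <= 8, "value must fit in one 8-byte cell");
    std::memcpy(block + index * 8, &value, sizeof(T));
}

// Copies a value of type T back out of the cell at `index`.
template <class T>
T load(const unsigned char* block, std::size_t index)
{
    static_assert(std::is_trivially_copyable<T>::value, "memcpy needs trivially copyable T");
    static_assert(sizeof(T) <= 8, "value must fit in one 8-byte cell");
    T value;
    std::memcpy(&value, block + index * 8, sizeof(T));
    return value;
}

// Stores a float and a double and checks that both round-trip unchanged.
inline bool memcpy_demo()
{
    std::unique_ptr<unsigned char[]> block(new unsigned char[100 * 8]);
    store(block.get(), 0, 1.1f);
    store(block.get(), 1, 2.2);
    return load<float>(block.get(), 0) == 1.1f
        && load<double>(block.get(), 1) == 2.2;
}
```

Compilers typically turn these small fixed-size memcpy calls into plain loads and stores, so the approach usually costs nothing at run time.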


Original answer for the original question:

At least in C++98/03, in 3.10/15, we have the following, which pretty clearly makes it still undefined behavior (since you're accessing the value through a type that's not enumerated in the list of exceptions):

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:

— the dynamic type of the object,

— a cv-qualified version of the dynamic type of the object,

— a type that is the signed or unsigned type corresponding to the dynamic type of the object,

— a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),

— a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

— a char or unsigned char type.
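To illustrate the last entry of that list, here is a small sketch of an access the rule does permit: reading the bytes of a std::uint64_t through unsigned char. The function name count_nonzero_bytes is hypothetical, and the commented-out line shows the kind of access the list does not cover.

```cpp
#include <cstddef>
#include <cstdint>

// Permitted: inspecting any object's representation through unsigned char.
inline std::size_t count_nonzero_bytes(const std::uint64_t& value)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    std::size_t n = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        if (p[i] != 0) {
            ++n;
        }
    }
    return n;
}

// Not covered by the list: reading a std::uint64_t through a float lvalue.
//   float f = *reinterpret_cast<const float*>(&value);  // undefined behaviour
```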

There has been a lot of discussion here, and some of the answers are slightly wrong while still making good points, so I'll just try to summarize:

  • following the text of the standard exactly (no matter which version): yes, this is undefined behaviour. Note that the standard doesn't even contain the term strict aliasing, just a set of rules that enforce it no matter what implementations could define.

  • understanding the reason behind the "strict aliasing" rule, it should work nicely on any implementation as long as neither float nor double takes more than 64 bits.

  • the standard won't guarantee you anything about the size of float or double (intentionally), and that's the reason why it is that restrictive in the first place.

  • you can get around all this by ensuring your "heap" is an allocated object (e.g. get it with malloc()) and accessing the aligned slots through char *, shifting your offset by 3 bits.

  • you still have to make sure that anything you store in such a slot won't take more than 64 bits (that's the hard part when it comes to portability).
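The last two points can be sketched roughly as follows: a malloc()-allocated heap addressed through char*, a slot index shifted left by 3 bits to get a byte offset, and a compile-time check that the stored type fits in 64 bits. The names slot_offset and heap_demo are hypothetical, and std::memcpy is used for the actual access as a cautious choice on my part.

```cpp
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Byte offset of an 8-byte slot: slot index shifted left by 3 bits.
constexpr std::size_t slot_offset(std::size_t slot)
{
    return slot << 3;
}

// Writes a double into slot 1 of a malloc()ed heap and reads it back.
inline double heap_demo()
{
    static_assert(sizeof(double) * CHAR_BIT <= 64, "double must fit in a 64-bit slot");

    char* heap = static_cast<char*>(std::malloc(slot_offset(100)));
    if (heap == nullptr) {
        return 0.0;                       // allocation failed
    }

    const double in = 2.5;
    double out = 0.0;
    std::memcpy(heap + slot_offset(1), &in, sizeof in);
    std::memcpy(&out, heap + slot_offset(1), sizeof out);

    std::free(heap);
    return out;
}
```

The static_assert turns the portability concern into a build failure on any platform where the size assumption does not hold.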

In a nutshell: your code should be safe on any "sane" implementation as long as size constraints aren't a problem (meaning: the answer to the question in your title is most likely no), BUT it's still undefined behaviour (meaning: the answer to your last paragraph is yes).

pc, pf and pd are pointers of different types that all access memory declared in block as uint64_t, so for, say, pf the types sharing the memory are float and uint64_t.

One would violate the strict aliasing rule by writing through one type and reading through another, since the compiler could reorder the operations on the assumption that there is no shared access. That is not your case, however: since the uint64_t array is only used for assignment, it is exactly the same as using alloca to allocate the memory.

Incidentally, there is no issue with the strict aliasing rule when casting from any type to a char type and vice versa. This is a common pattern used for data serialization and deserialization.

I'll make it short: all your code works with defined semantics if you allocate the block using

std::unique_ptr<char[], decltype(&std::free)>
    mem(static_cast<char*>(std::malloc(800)), &std::free);

Because

  1. every type is allowed to alias with a char[] and
  2. malloc() is guaranteed to return a block of memory sufficiently aligned for all types (except maybe SIMD ones).

We pass std::free as a custom deleter because we used malloc(), not new[], so calling delete[], the default, would be undefined behaviour.
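As a small usage sketch of this approach, the function below stores the question's "Hello, world\n" string across two consecutive 8-byte cells of a malloc()-backed block owned by a unique_ptr with std::free as deleter. The name string_in_cells is hypothetical; the 800-byte size and the cell index 3 follow the question.

```cpp
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Writes a C-string into cells 3 and 4 of the block and returns a copy of it.
inline std::string string_in_cells()
{
    std::unique_ptr<char[], decltype(&std::free)>
        mem(static_cast<char*>(std::malloc(800)), &std::free);
    if (!mem) {
        return std::string();             // allocation failed
    }

    const char text[] = "Hello, world\n"; // 14 bytes including the terminator
    static_assert(sizeof text <= 2 * 8, "string must fit in two 8-byte cells");

    char* pc2 = mem.get() + 3 * 8;        // start of cell 3
    std::memcpy(pc2, text, sizeof text);
    return std::string(pc2);
}
```

Because the block is a char array to begin with, writing and reading characters in it involves no type punning at all.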

If you're a purist, you can also use operator new:

std::unique_ptr<char[]>
    mem(static_cast<char*>(operator new[](800)));

Then we don't need a custom deleter. Or

std::unique_ptr<char[]> mem(new char[800]);

to avoid the static_cast from void* to char*. But operator new can be replaced by the user, so I'm always a bit wary of using it. OTOH, malloc cannot be replaced (only in platform-specific ways, such as LD_PRELOAD).

Yes, because the memory locations pointed to by pf could overlap depending on the size of float and double. If they didn't, then the results of reading *pd and *pf would be well defined but not the results of reading from block or pc.

The behavior of C++ and the CPU are distinct. Although the standard provides memory suitable for any object, the rules and optimizations imposed by the CPU make the alignment for any given object "undefined": an array of short would reasonably be 2-byte aligned, but an array of a 3-byte structure may be 8-byte aligned. A union of all possible types can be created and used between your storage and the usage to ensure no alignment rules are broken.

#include <cstdint>
#include <cstring>

union copyOut {
      char Buffer[200]; // max string length
      std::int16_t shortVal;
      std::int32_t intVal;
      std::int64_t longIntVal;
      float fltVal;
      double doubleVal;
} copyTarget;
// inside a function: copy cell n of the heap into the union
std::memcpy( copyTarget.Buffer, &block[n], sizeof block[n] );  // move from unaligned space into union
// use copyTarget member here.

If you tag this as a C++ question: (1) why use uint64_t[] rather than std::vector? (2) In terms of memory management, your code lacks management logic, which should keep track of which blocks are in use and which are free, track contiguous blocks, and of course provide the allocate and release block methods. (3) The code shows an unsafe way of using memory. For example, the char* is not const, so a write to one block can potentially run over into the next block(s). reinterpret_cast is considered dangerous and should be abstracted away from the memory user's logic. (4) The code doesn't show the allocator logic. In the C world, the malloc function is untyped, whereas in the C++ world, operator new is typed. You should consider something like the new operator.
