简体   繁体   中英

Existence of objects created in C functions

It has been established (see below) placement new is required to create objects

int* p = (int*)malloc(sizeof(int));
*p = 42;  // illegal, there isn't an int

Yet that is a pretty standard way of creating objects in C.

The question is, does the int exist if it is created in C, and returned to C++?

In other words, is the following guaranteed to be legal? Assume int is the same for C and C++.

foo.h

#ifdef __cplusplus
extern "C" {
#endif

int* foo(void);

#ifdef __cplusplus
}
#endif

foo.c

#include "foo.h"
#include <stdlib.h>

int* foo(void) {
    return malloc(sizeof(int));
}

main.cpp

#include "foo.h"
#include<cstdlib>

int main() {
    int* p = foo();
    *p = 42;
    std::free(p);
}

Links to discussions on mandatory nature of placement new :

Yes! But only because int is a fundamental type. Its initialization is vacuous operation:

[dcl.init]/7 :

To default-initialize an object of type T means:

  • If T is a (possibly cv-qualified) class type, constructors are considered. The applicable constructors are enumerated ([over.match.ctor]), and the best one for the initializer () is chosen through overload resolution. The constructor thus selected is called, with an empty argument list, to initialize the object.

  • If T is an array type, each element is default-initialized.

  • Otherwise, no initialization is performed.

Emphasis mine. Since "not initializing" an int is akin to default initialing it, it's lifetime begins once storage is allocated:

[basic.life]/1 :

The lifetime of an object or reference is a runtime property of the object or reference. An object is said to have non-vacuous initialization if it is of a class or aggregate type and it or one of its subobjects is initialized by a constructor other than a trivial default constructor. The lifetime of an object of type T begins when:

  • storage with the proper alignment and size for type T is obtained, and
  • if the object has non-vacuous initialization, its initialization is complete,

Allocation of storage can be done in any way acceptable by the C++ standard. Yes, even just calling malloc . Compiling C code with a C++ compiler would be a very bad idea otherwise. And yet, the C++ FAQ has been suggesting it for years .


In addition, since the C++ standard defers to the C standard where malloc is concerned. I think that wording should be brought forth as well. And here it is:

7.22.3.4 The malloc function - Paragraph 2 :

The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

The "value is indeterminate" part kinda indicates there's an object there. Otherwise, how could it have any value, let alone an indeterminate one?

I think the question is badly posed. In C++ we only have the concepts of translation units and linkage, the latter simply meaning under which circumstances names declared in different TUs refer to the same entity or not.

Nothing is virtually said about the linking process as such, the correctness of which must be guaranteed by the compiler/linker anyway; even if the code snippets above were purely C++ sources (with malloc replaced with a nice new int) the result would be still implementation defined ( eg. consider object files compiled with incompatible compiler options/ABIs/runtimes ).

So, either we talk in full generality and conclude that any program made of more than one TU is potentially wrong or we must take for granted that the linking process is 'valid' ( only the implementation knows ) and hence take for granted that if a function from some source language ( in this case C ) primises to return a 'pointer to an existing int' then the the same function in the destination language (C++) must still be a 'pointer to an existing int' (otherwise, following [dcl.link], we could't say that the linkage has been 'achieved', returning to the no man's land ).

So, in my opinion, the real problem is assessing what an 'existing' int is in C and C++, comparatively. As I read the correponding standards, in both languages an int lifetime basically begins when its storage is reserved for it: in the OP case of an allocated(in C)/dynamic(in c++) storage duration object, this occurs (on C side) when the effective type of the lvalue *pointer_to_int becomes int (eg. when it's assigned a value; until then, the not-yet-an-int may trap(*)).

This does not happen in the OP case, the malloc result has no effective type yet. So, that int does not exist neither in C nor in C++, it's just a reinterpreted pointer.

That said, the c++ part of the OP code assigns just after returning from foo(); if this was intended, then we could say that given that malloc() in C++ is required having C semantics, a placement new on the c++ side would suffice to make it valid (as the provided links show).

So, summarizing, either the C code should be fixed to return a pointer to an existing int (by assigning to it) or the c++ code should be fixed by adding placement new. (sorry for the lengthy arguing ... :))


(*) here I'm not claiming that the only issue is the existence of trap representation; if it were, one could argue that the result of foo() is an indeterminate value on C++ side, hence something that you can safely assign to. Clearly this is not the case because there are also aliasing rules to take into account ...

I can identify two parts of this question that should be addressed separately.


Object lifetime

It has been established (see below) placement new is required to create objects

I posit that this area of the standard contains ambiguity, omission, contradiction, and/or gratuitous incompatibility with existing practice, and should therefore be considered broken .

The only people who should be interested in what a broken part of the standard actually says are the people responsible for fixing the breakage. Other people (language users and language implementors alike) should defer to existing practice and common sense. Both of which say that one does not need new to create an int , malloc is enough.

This document identifies the problem and proposes a fix (thanks @TC for the link)


C compatibility

Assume int is the same for C and C++

It is not enough to assume that.

One also needs to assume that int* is the same, that the same memory is accessible by C and C++ functions linked together in a program, and that the C++ implementation does not define the semantics of calls to functions written in the C programming language to be wiping your hard drive and stealing your girlfriend. In other words, that C and C++ implementations are compatible enough .

None of this is stipulated by the standard or should be assumed. Indeed, there are C implementations that are incompatible with each other, so they cannot be both compatible with the same C++ implementation.

The only thing the standard says is "Every implementation shall provide for linkage to functions written in the C programming language" (dcl.link) What is the semantics of such linkage is left undefined.

Here, as before, the best course of action is to defer to existing practice and common sense. Both of which say that a C++ implementation usually comes bundled with a compatible enough C implementation, with the linkage working as one would expect.

The question is meaningless. Sorry. This is the only "lawyer" answer possible.

It is meaningless because the C++ and the C language ignore each others, as they ignore anything else.

Nothing in either language is described in term of low level implementation (which is ridiculous for languages often described as "high level assembly"). Both C and C++ are specified (if you can call that a specification) at a very abstract level, and the high and low levels are never reconnected. This generates endless debates about what undefined behaviors means in practice, how unions work, etc.

Although neither the C Standard nor, so far as I know, the C++ Standard officially recognizes the concept, almost any platform which allows programs produce by different compilers to be linked together will support opaque functions .

When processing a call to an opaque function, a compiler will start by ensuring that the value of all objects that might legitimately be examined by outside code is written to the storage associated with those objects. Once that is done, it will place the function's arguments in places specified by the platform's documentation (the ABI, or Application Binary Interface) and perform the call.

Once the function returns, the compiler will assume that any objects which an outside function could have written, may have been written, and will thus reload any such values from the storage associated with those objects the next time they are used.

If the storage associated with an object holds a particular bit pattern when an opaque function returns, and if the object would hold that bit pattern when it has a defined value, then a compiler must behave as though the object has that defined value without regard for how it came to hold that bit pattern.

The concept of opaque functions is very useful, and I see no reason that the C and C++ Standards shouldn't recognize it, nor provide a standard "do nothing" opaque function. To be sure, needlessly calling opaque functions will greatly impede what might otherwise be useful optimizations, but being able to force a compiler to treat actions as opaque function calls when needed may make it possible to enable more optimizations elsewhere.

Unfortunately, things seem to be going in the opposite direction, with build systems increasingly trying to apply "whole program" optimization. WPO would be good if there were a way of distinguishing between function calls that were opaque because the full "optimization barrier" was needed, from those which had been treated as opaque simply because there was no way for optimizers to "see" across inter-module boundaries. Unless or until proper barriers are added, I don't know any way to ensure that optimizers won't get "clever" in ways that break code which would have had defined behavior with the barriers in place.

No, the int does not exist, as explained in the linked Q/As. An important standard quote reads like this in C++14:

1.8 The C ++ object model [intro.object]
[...] An object is created by a definition (3.1), by a new-expression (5.3.4) or by the implementation (12.2) when needed. [...]

(12.2 is a paragraph about temporary objects)

The C++ standard has no rules for interfacing C and C++ code. A C++ compiler can only analyze objects created by C++ code, but not some bits passed to it form an external source like a C program, or a network interface, etc.

Many rules are tailored to make optimizations possible. Some of them are only possible if the compiler does not have to assume uninitialized memory contains valid objects. For example, the rule that one may not read an uninitialized int would not make sense otherwise, because if int s may exist anywhere, why would it be illegal to read an indeterminate int value?

This would be a standard compliant way to write the program:

int main() {
    void* p = foo();
    int i = 42;
    memcpy(p, &i, sizeof(int));
    //std::free(p);   //this works only if C and C++ use the same heap.
}

I believe it is legal now, and retroactively since C++98!

Indeed the C++ specification wording till C++20 was defining an object as (eg C++17 wording, [intro.object]):

The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is created by a definition (6.1), by a new-expression (8.5.2.4), when implicitly changing the active member of a union (12.3), or when a temporary object is created (7.4, 15.2).

The possibility of creating an object using malloc allocation was not mentioned . Making it a de-facto undefined behavior.

It was then viewed as a problem , and this issue was addressed later by https://wg21.link/P0593R6 and accepted as a DR against all C++ versions since C++98 inclusive, then added into the C++20 spec, with the new wording.

The wording of the standard are quite vague and may even seem to use tautology , defining a well defined implicitly-created objects (6.7.2.11 Object model [intro.object] ) as:

implicitly-created objects whose address is the address of the start of the region of storage, and produce a pointer value that points to that object, if that value would result in the program having defined behavior [...]

The example given in C++20 spec is:

#include <cstdlib>
struct X { int a, b; };
X *make_x() {
   // The call to std​::​malloc implicitly creates an object of type X
   // and its subobjects a and b, and returns a pointer to that X object
   // (or an object that is pointer-interconvertible ([basic.compound]) with it), 
   // in order to give the subsequent class member access operations   
   // defined behavior. 
   X *p = (X*)std::malloc(sizeof(struct X));
   p->a = 1;   
   p->b = 2;
   return p;
}

It seems that the `object` created in a C-function as in the OP question, falls into this category and is a valid object. Which would be the case also for allocation of C-structs with malloc .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM