How to override c++ malloc/free in javascript (emscripten)?

I overrode Module._malloc and Module._free in JavaScript (Emscripten) by wrapping the original functions and adding console.log calls to display the memory address, size and total allocated memory.

I discovered that the new function only catches Javascript calls to Module._malloc and Module._free, and it does not catch c++ level calls to malloc() and free(). I would like to know why.

Based on Mr. Ofria's answer here https://stackoverflow.com/a/34057348/4806940 , Module._malloc and Module._free are the converted equivalent code of c++'s malloc() and free().

I am using emscripten 1.35.0

Edit: Here's how I wrapped the functions in JavaScript:

var _defaultMalloc = Module._malloc;
var _defaultFree = Module._free;

var _totalMemoryUsed = 0;
var _mallocTracker = {};
Module._malloc = function(size) {
   _totalMemoryUsed += size;
   var ptr = _defaultMalloc(size);
   _mallocTracker[ptr] = size;

   console.log("MALLOC'd @" + ptr + " " + size + " bytes -- TOTAL USED " + _totalMemoryUsed + " bytes");
   return ptr;
}

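Module._free = function(ptr) {
   var size = _mallocTracker[ptr];
   // Pointers allocated outside this wrapper are not in the tracker;
   // skip the accounting for them so the total does not become NaN.
   if (size !== undefined) {
      _totalMemoryUsed -= size;
      delete _mallocTracker[ptr];
   }

   console.log("FREE'd @" + ptr + " " + size + " bytes -- TOTAL USED " + _totalMemoryUsed + " bytes");
   return _defaultFree(ptr);
}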

Short answer: your attempt at wrapping malloc / free doesn't work because the Module._malloc and Module._free properties, which expose Emscripten's implementation of malloc() / free() , are not the entry points called by the converted C++ code. However, with a little bit of hackery, there are ways you can trace those calls.


Why Your Overrides Don't Work

I think the answer you quote might be better worded as: the emulation of C++'s malloc() and free() calls is exposed in Module._malloc() and Module._free() , but these are not the entry-points called by converted C++ code.

Note : I will generally only talk of malloc for the remainder of this answer, but essentially everything that applies to malloc also applies to free .

I'll leave all the gory details of how Emscripten handles malloc() to later, but in brief:

  • Using "standard settings", Emscripten compiles a C++ program to a.out.js .

  • A large chunk of this file creates an asm object. This contains all the converted C++ code (eg the JavaScript implementation of _main() ) and JavaScript versions of C++ library functions (in particular, _malloc() ).

  • The converted C++ code (within asm ) makes direct reference to the internal library functions (also within asm ).

  • References to the C++ functions and many of the library functions (in particular _main , _malloc and _free ) are exposed as properties of the asm object. They are also exposed as properties of the Module object and exist as standalone variables.

So, original C++ code will only call the internal implementation of _malloc() defined within the asm block of code. The rest of the Emscripten framework, and any additional JavaScript code can also call this function via any of the exposed references: _malloc , Module._malloc (or Module['_malloc'] ) and asm._malloc (or asm['_malloc'] ).

Therefore, if you replace any or all of _malloc , Module._malloc or asm._malloc with "wrapped" versions, this will only affect calls made from the rest of the Emscripten framework or additional JavaScript code. It will not affect the calls made from converted C++ code.
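The effect can be reproduced in a few lines of plain JavaScript, without Emscripten. The following is only a minimal sketch: fakeAsm , allocate and run are made-up names standing in for asm , _malloc and _main , and the "allocator" just hands out increasing numbers.

```javascript
// Hypothetical stand-in for Emscripten's asm object (not real Emscripten output).
var calls = [];                      // sizes seen by the wrapper
var fakeAsm = (function () {
  var next = 8;                      // pretend heap pointer
  function allocate(size) {          // plays the role of the internal _malloc
    var ptr = next;
    next += size;
    return ptr;
  }
  function run() {                   // plays the role of converted C++ code (_main)
    return allocate(100);            // resolves to the closure-local allocate
  }
  return { allocate: allocate, run: run };
})();

// Wrap the exported reference, as the question does with Module._malloc.
var realAllocate = fakeAsm.allocate;
fakeAsm.allocate = function (size) {
  calls.push(size);
  return realAllocate(size);
};

fakeAsm.allocate(16);                // external call: goes through the wrapper
fakeAsm.run();                       // internal call: bypasses the wrapper
console.log(calls);                  // only the external call was recorded
```

Replacing the property on the returned object changes what outside callers see, but run() never looks at that property; it uses the name bound inside the closure.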


Ways of Tracing Calls to _malloc() / _free()

1. The Official Way

Before we get into some low-level hackery I should mention that Emscripten has a Tracing API built-in which (according to their help page) " provides some useful capabilities to better see what is going on inside of your application, in particular with respect to memory usage ".

I've not tried to use it, but for serious debugging work, this is probably the way to go. However, it seems to need some up-front effort (you need to set up a separate process to receive trace messages from the application under test), so it may be overkill for some situations.

If you want to pursue this, the official documentation can be found here and this blog post describes how one company used the Tracing API to their advantage (I have no affiliation: that page just turned up in search results).

2. Hacking It

As noted above, the problem is that the calls made by converted C++ code are to internal functions within the asm object, and so are not affected by any wrappers we might create at the "outside" level. After some investigation, I have devised two ways of overcoming this problem. As both are a little "hacky", purists may want to look away...

First, let's start with a small piece of code to serve as our test-bed (adapted from that found on the Emscripten Tutorial page):

hello.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
  char* msg = malloc(1234321) ;
  strcpy( msg, "Hello, world!" ) ;
  printf( "%s\n", msg ) ;
  free( msg ) ;
  return 0;
}

Note : The number 1234321 was chosen simply to aid searching the generated JavaScript file. This happily compiles and runs as expected:

C:\Program Files\Emscripten\Test>emcc hello.c

C:\Program Files\Emscripten\Test>node a.out.js
Hello, world!

We will now create the following JavaScript file to "wrap" malloc and free :

traceMalloc.js

Module={
  'preRun': function() {
    // Edit below or make an option to selectively wrap malloc/free.
    if( true ) {
      console.log( 'Wrapping malloc/free' ) ;
      var real_malloc = _malloc ;
      Module['_malloc'] = asm['_malloc'] = _malloc = function( size ) {
        console.log( '_malloc( ' + size + ' )' ) ;
        var result = real_malloc.apply( null, arguments ) ;
        console.log( '<--- ' + result ) ;
        return result ;
      }
      var real_free = _free ;
      Module['_free'] = asm['_free'] = _free = function( ptr ) {
        console.log( '_free( ' + ptr + ' )' ) ;
        var result = real_free.apply( null, arguments ) ;
        console.log( '<--- ' + result ) ;
        return result ;
      }
      // Hack 2b: invoke semi-permanent code added to emscripten.py
      //asm.wrapMallocFree();
    }
  }
}

The Module['preRun'] is a way of getting our code executed shortly before the main entry-point. Inside the function, we save a reference to the "real" _malloc routine and then create a new function that calls the original, wrapped in trace-messages. The new function replaces all three "external" references to the original _malloc .

(For now, ignore the two commented-out lines near the bottom: they will be used later).

If we compile and run this (using the --pre-js option to tell Emscripten to include our piece of JavaScript in the output a.out.js file), we have, as the OP found, only limited success:

C:\Program Files\Emscripten\Test>emcc --pre-js traceMalloc.js hello.c

C:\Program Files\Emscripten\Test>node a.out.js
Wrapping malloc/free
_malloc( 42 )
<--- 5251080
_malloc( 5 )
<--- 5251128
Hello, world!

There are two calls to _malloc from somewhere in the Emscripten framework, but the one we're interested in – the one from our C code – has not been traced.

2a. One-Shot Hack

If we examine the a.out.js file, we will find the following snippet, which is the start of our C code converted to JavaScript:

function _main() {
 var $0 = 0, $1 = 0, $2 = 0, $3 = 0, $4 = 0, $fred = 0, $vararg_buffer = 0, label = 0, sp = 0;
 sp = STACKTOP;
 STACKTOP = STACKTOP + 16|0; if ((STACKTOP|0) >= (STACK_MAX|0)) abort();
 $vararg_buffer = sp;
 $0 = 0;
 $1 = (_malloc(1234321)|0);

The problem is that the call to _malloc references the internal function, not our overridden one. To fix this, we can edit a.out.js to add the following two lines at the top of _main() :

function _main() {
 _malloc = asm._malloc;
 _free = asm._free;

This replaces the internal variables _malloc and _free with references to the public versions held by the asm object (which have, by now, been replaced with our "wrapped" versions). Although this might seem somewhat circular, it works: the wrapped versions have already stored a reference to the real malloc function, so they still call that and not the reference we've just overwritten.

If we now re-run the a.out.js file ( without rebuilding):

C:\Program Files\Emscripten\Test>node a.out.js
Wrapping malloc/free
_malloc( 42 )
<--- 5251080
_malloc( 5 )
<--- 5251128
_malloc( 1234321 )
<--- 5251144
Hello, world!
_free( 5251144 )
<--- undefined

We can now see that the original C calls to malloc and free are being traced. While this works, and is easy to apply, the changes will be lost the next time we run emcc so we would have to re-apply the fix every time.
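The same re-binding trick can be modeled in plain JavaScript to show why it does not recurse forever. This is a sketch under assumptions, not Emscripten code: fakeAsm , allocate , run and rebind are hypothetical names, with rebind doing what the two lines added to _main() do.

```javascript
// Hypothetical model of the asm IIFE with one extra exported function.
var traced = [];
var fakeAsm = (function () {
  var next = 8;
  function allocate(size) { var ptr = next; next += size; return ptr; }
  function run() { return allocate(100); }   // "converted C code"
  function rebind() {
    // Same idea as `_malloc = asm._malloc;` placed inside the IIFE:
    // point the closure-local name at whatever the exported property holds now.
    allocate = fakeAsm.allocate;
  }
  return { allocate: allocate, run: run, rebind: rebind };
})();

// Capture the real function first, then install the wrapper outside.
var realAllocate = fakeAsm.allocate;
fakeAsm.allocate = function (size) {
  traced.push(size);
  return realAllocate(size);   // still the original function, so no recursion
};

fakeAsm.rebind();              // internal name now refers to the wrapper
var ptr = fakeAsm.run();       // the internal call is traced this time
console.log(traced, ptr);
```

The order matters: realAllocate must be captured before rebind() runs, otherwise the wrapper would end up calling itself.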

2b. Hacking the Framework

Instead of editing the generated a.out.js each time, it is possible to edit a small part of one file in the Emscripten framework to get a "fix" that only needs to be applied once.

Warning

If you adopt this method keep an original copy of the file to be modified. Also, while I believe my suggested modification to be safe, I have not tested it beyond what was needed for this answer. Use with due caution!

The file in question is emscripten\1.35.0\emscripten.py off the main installation directory (at least under Windows). Presumably the middle part of the path will change with different versions of Emscripten. There are two changes needed, probably best shown using the output of the fc command:

C:\Program Files\Emscripten\emscripten\1.35.0>fc emscripten.py.original emscripten.py
Comparing files emscripten.py.original and EMSCRIPTEN.PY
***** emscripten.py.original
    exports = []
    for export in all_exported:
***** EMSCRIPTEN.PY
    exports = []
    all_exported.append('wrapMallocFree')                 <--- Add this line
    for export in all_exported:
*****

***** emscripten.py.original
// EMSCRIPTEN_START_FUNCS
function stackAlloc(size) {
***** EMSCRIPTEN.PY
// EMSCRIPTEN_START_FUNCS
function wrapMallocFree() {                              <--- Add these lines
  console.log( 'wrapMallocFree()' ) ;                    <--- Add these lines
  _malloc = asm._malloc ;                                <--- Add these lines
  _free = asm._free ;                                    <--- Add these lines
}                                                        <--- Add these lines
function stackAlloc(size) {
*****

In my copy, the first change is at line 680 and the second at line 964. The first change tells the framework to export the function wrapMallocFree from the asm object; the second change defines the function that will be exported. As can be seen, this simply executes the same two lines as we manually edited in section 2a (along with an entirely optional trace-line, to show the activation has happened).

To make use of this change we also need to un-comment the call to our new function in traceMalloc.js so it reads:

        return result ;
      }
      // Hack 2b: invoke semi-permanent code added to emscripten.py
      asm.wrapMallocFree();
    }
  }
}

Now, we can re-build and re-run the code and see all calls traced without manual editing of a.out.js :

C:\Program Files\Emscripten\Test>emcc --pre-js traceMalloc.js hello.c

C:\Program Files\Emscripten\Test>node a.out.js
Wrapping malloc/free
wrapMallocFree()
_malloc( 42 )
<--- 5251080
_malloc( 5 )
<--- 5251128
_malloc( 1234321 )
<--- 5251144
Hello, world!
_free( 5251144 )
<--- undefined

As the if( true ) ... bit of traceMalloc.js suggests, we can leave the changes to emscripten.py in place and selectively turn on or off tracing of malloc and free . When not used, the only effect is that asm exports one more function ( wrapMallocFree ) which will never get called. From what I can see of the rest of that file, this should not cause any problem (nothing else will know it's there). Even if your C/C++ code were to contain a function called wrapMallocFree , because such names are prefixed with an underscore ( main becomes _main etc.), there should be no clash.

Obviously, should you switch to a different version of Emscripten, you would need to re-apply the same (or similar) changes.
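The question also asked for a running total of allocated memory rather than plain call logging. As a rough sketch, the two logging wrappers in traceMalloc.js could be replaced by something like the helper below. makeTracker is a hypothetical name of my own, not part of Emscripten, and the fake allocator exists only so the snippet runs on its own.

```javascript
// Hypothetical helper: wraps any malloc/free pair and keeps a running total.
function makeTracker(realMalloc, realFree, log) {
  var sizes = {};          // ptr -> size of each live allocation
  var total = 0;
  return {
    malloc: function (size) {
      var ptr = realMalloc(size);
      if (ptr !== 0) {     // a failed malloc returns 0 (NULL); count successes only
        sizes[ptr] = size;
        total += size;
      }
      log('malloc(' + size + ') -> ' + ptr + ', total ' + total);
      return ptr;
    },
    free: function (ptr) {
      var size = sizes[ptr];
      if (size !== undefined) {   // ignore pointers we never saw allocated
        total -= size;
        delete sizes[ptr];
      }
      log('free(' + ptr + '), total ' + total);
      return realFree(ptr);
    },
    total: function () { return total; }
  };
}

// Stand-in allocator so the sketch runs without Emscripten.
var nextPtr = 8;
function fakeMalloc(size) { var p = nextPtr; nextPtr += size; return p; }
function fakeFree(ptr) {}

var lines = [];
var tracker = makeTracker(fakeMalloc, fakeFree, function (m) { lines.push(m); });
var a = tracker.malloc(42);
var b = tracker.malloc(5);
tracker.free(a);
tracker.free(12345);       // unknown pointer: total is left unchanged
console.log(lines.join('\n'));
console.log('live bytes: ' + tracker.total());
```

Inside preRun , the idea would be to build the tracker from the saved real _malloc and _free and assign tracker.malloc and tracker.free to the same three references as before. I have not tested this against a real Emscripten build.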


All the Gory Details

As promised, some details of what's happening with malloc inside Emscripten's generated code.

Things get 'iffy'

As noted above, a very large chunk of the generated a.out.js (about 60% for the test program) consists of the creation of an asm object. This code is bracketed by EMSCRIPTEN_START_ASM and EMSCRIPTEN_END_ASM and at a fairly high level looks like:

// EMSCRIPTEN_START_ASM
var asm = (function(global, env, buffer) {

   ...

   function _main() {
      ...
      $1 = (_malloc(1234321)|0);
      ...
   }

   ...

   function _malloc($bytes) {
      ...
      return ($mem$0|0);
   }

   ...

   return { ... _malloc: _malloc, ... };
})
// EMSCRIPTEN_END_ASM
(Module.asmGlobalArg, Module.asmLibraryArg, buffer);

The object asm is defined using the immediately invoked function expression (IIFE) pattern . Essentially, the whole block defines an anonymous function which is immediately executed. The result of executing that function is what is assigned to the object asm . This execution happens at the time the above code is encountered. The main point of an "IIFE" is that variables/functions defined within that anonymous function are only visible to the code within that function. All the "outside world" sees is whatever that function returns (which is assigned to asm ).

Of interest to us, we see definitions of both _main (converted C code) and _malloc (Emscripten's implementation of a memory allocator). Because of the way JavaScript/IIFEs work, when executing the code in _main , its call to _malloc will always refer to this internal version of _malloc .

The return value of the IIFE is an object with a number of properties. The names of these properties happen to be the same as the names of objects/functions within the anonymous function. While this might appear confusing, no ambiguity is involved. The returned object (assigned to asm ) has a property called _malloc , and the value of that property is set equal to the value of the internal object _malloc . (Defining a function essentially creates a variable that references the "block of code" that is the body of the function; this reference can be manipulated like any other.)

Definition of Module

Shortly after the construction of asm we have the following block of code:

var _free = Module["_free"] = asm["_free"];
var _main = Module["_main"] = asm["_main"];
var _i64Add = Module["_i64Add"] = asm["_i64Add"];
var _memset = Module["_memset"] = asm["_memset"];
var runPostSets = Module["runPostSets"] = asm["runPostSets"];
var _malloc = Module["_malloc"] = asm["_malloc"];

For selected properties of the newly-created asm object, this does two things: (a) it creates properties in a second object ( Module ) which reference the same thing as the property of asm does, and (b) it creates some global variables which also reference those properties. The global variables are for use by other parts of the Emscripten framework; the Module object is for use by other JavaScript code that might be added to the Emscripten-generated code.
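The chained assignment is ordinary JavaScript reference copying, which a small standalone sketch can confirm. asmLike , ModuleLike and mallocGlobal are hypothetical stand-ins for asm , Module and the global _malloc ; only the assignment pattern is taken from the generated code.

```javascript
// Hypothetical stand-ins for asm and Module; only the aliasing pattern is real.
var asmLike = { _malloc: function (size) { return 8; } };
var ModuleLike = {};

// Same shape as: var _malloc = Module["_malloc"] = asm["_malloc"];
var mallocGlobal = ModuleLike["_malloc"] = asmLike["_malloc"];

// All three names now hold a reference to one and the same function object.
var allSame = mallocGlobal === ModuleLike._malloc &&
              ModuleLike._malloc === asmLike._malloc;

// Reassigning one name does not touch the other two.
ModuleLike._malloc = function (size) { return 0; };
var othersUnchanged = mallocGlobal === asmLike._malloc &&
                      ModuleLike._malloc !== asmLike._malloc;

console.log(allSame, othersUnchanged);
```

This is why every reference has to be replaced individually: each name is an independent copy of the reference, not a link to the others.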

All Roads Lead to _malloc

At this point, we have the following:

  • There is a block of code, defined within the anonymous function used to create asm , that provides Emscripten's implementation/emulation of C/C++'s malloc function. This code is the "real malloc". It should be noted that this code "exists" more-or-less independently of whatever objects/properties (if any) "reference" it.

  • There is an internal object of the IIFE called _malloc that currently references the above code. Calls to malloc() made by the original C/C++ code will be made using the value of this object.

  • The object asm has a property called _malloc that also currently references the above block of code.

  • The object Module also has a property called _malloc that currently references the above block of code.

  • There is a global object _malloc . Unsurprisingly, it also references the above block of code.

At this point, using _malloc (global scope), Module._malloc (or Module['_malloc'] ), asm._malloc or _malloc (within the IIFE used to build asm ) will all end up at the same block of code: the "real" implementation of malloc() .

When the following snippet of code is executed (within a function context):

      var real_malloc = _malloc ;
      Module['_malloc'] = asm['_malloc'] = _malloc = function( size ) {
        console.log( '_malloc( ' + size + ' )' ) ;
        var result = real_malloc.apply( null, arguments ) ;
        console.log( '<--- ' + result ) ;
        return result ;
      }

then several things happen:

  • A copy of the original value of the (global) object _malloc is made ( real_malloc ). This, as we saw above, holds a reference to the "real" block of code that implements malloc() . While this happens to be the same value as the IIFE-internal object _malloc , there is no connection between the two. If/when the value of the IIFE-internal _malloc is changed, it will not affect the value held in real_malloc .

  • A new (anonymous) function is created. It contains a call to the "real" implementation of malloc() (using the object real_malloc created above) as well as some log-messages to trace the call.

  • References to this new function are stored in the three "outside" objects we mentioned above: _malloc (global-scope), Module._malloc and asm._malloc . The IIFE-internal object _malloc is still left pointing at the "real implementation" of malloc() .

We are now at the stage the OP got to: external calls to malloc() (made from the Emscripten framework or other bits of JavaScript code) will be funneled through the "wrapper" functions and can be traced. Calls made from converted C/C++ code – which use the IIFE-internal object _malloc – are still directed to the "real" implementation and are not traced.

When the following is executed within the context of the anonymous IIFE function :

_malloc = asm._malloc ;

Then (and only then) will the IIFE-internal object _malloc be changed. By the time this is executed, its new value ( asm._malloc ) is referencing our "wrapper" function. At that point all four variants of the "references-to-malloc" are pointing at our "wrapper" function. That function still has access (via the variable real_malloc ) to the "real" implementation of malloc() so now, whenever any part of the code calls malloc() , that call passes through our wrapper function so the call can be traced.
