
Node.js: How to serialize a large object with circular references

I use Node.js and want to serialize a large JavaScript object to disk. The object is basically a "hashmap" and contains only data, not functions. Some of its elements have circular references.

This is an online application, so the process should not block the main loop. In my use case, non-blocking is much more important than speed (the data is live in-memory data that is only loaded at startup; saves are timed backups every X minutes, plus one at shutdown/failure).

What is the best way to do this? Pointers to libraries that do what I want are more than welcome.

I have a nice solution I've been using. Its downside is that it has an O(n^2) runtime which makes me sad.

Here's the code:

// I defined these functions as part of a utility library called "U".
var U = {
    isObj: function(obj, cls) {
        try { return obj.constructor === cls; } catch(e) { return false; }
    },
    straighten: function(item) {
        /*
        Un-circularizes data. Works if `item` is a simple Object, an Array,
        or any inline value (string, int, null, etc).
        */
        var arr = [];
        U.straighten0(item, arr);
        return arr.map(function(item) { return item.calc; });
    },
    straighten0: function(item, items) {
        /*
        The "meat" of the un-circularization process. Returns the index of
        `item` within the array `items`. If `item` didn't initially exist
        within `items`, it will by the end of this function, therefore this
        function always produces a usable index.

        Also, `item` is guaranteed to have no more circular references (it
        will be in a new format) once its index is obtained.
        */

        /*
        STEP 1) If `item` is already in `items`, simply return its index.

        Note that an object's existence can only be confirmed by comparison
        to itself, not an un-circularized version of itself. For this reason
        an `orig` value is kept ahold of to make such comparisons possible.
        This entails that every entry in `items` has both an `orig` value
        (the original object, for comparison) and a `calc` value (the
        calculated, un-circularized value).
        */
        for (var i = 0, len = items.length; i < len; i++) // This is O(n^2) :(
            if (items[i].orig === item) return i;

        var ind = items.length;

        // STEP 2) Depending on the type of `item`, un-circularize it differently
        if (U.isObj(item, Object)) {

            /*
            STEP 2.1) `item` is an `Object`. Create an un-circularized version
            of that `Object` - keep all its keys, but replace each value with
            an index that points to that value.
            */
            var obj = {};
            items.push({ orig: item, calc: obj }); // Note both `orig` AND `calc`.
            for (var k in item)
                obj[k] = U.straighten0(item[k], items);

        } else if (U.isObj(item, Array)) {

            /*
            STEP 2.2) `item` is an `Array`. Create an un-circularized version
            of that `Array` - replace each of its values with an index that
            indexes the original value.
            */
            var arr = [];
            items.push({ orig: item, calc: arr }); // Note both `orig` AND `calc`.
            for (var i = 0; i < item.length; i++)
                arr.push(U.straighten0(item[i], items));

        } else {

            /*
            STEP 2.3) `item` is a simple inline value. We don't need to make
            any modifications to it, as inline values have no references (let
            alone circular references).
            */
            items.push({ orig: item, calc: item });

        }

        return ind;
    },
    unstraighten: function(items) {
        /*
        Re-circularizes un-circularized data! Used for undoing the effects of
        `U.straighten`. This process will use a particular marker (`unbuilt`)
        to show values that haven't yet been calculated. This is better than
        using `null`, because that would break in the case that the literal
        value is `null`.
        */
        var unbuilt = { UNBUILT: true };
        var initialArr = [];
        // Fill `initialArr` with `unbuilt` references
        for (var i = 0; i < items.length; i++) initialArr.push(unbuilt);
        return U.unstraighten0(items, 0, initialArr, unbuilt);
    },
    unstraighten0: function(items, ind, built, unbuilt) {
        /*
        The "meat" of the re-circularization process. Returns an Object,
        Array, or inline value. The return value may contain circular
        references.
        */
        if (built[ind] !== unbuilt) return built[ind];

        var item = items[ind];
        var value = null;

        /*
        Similar to `straighten`, check the type. Handle Object, Array, and
        inline values separately.
        */
        if (U.isObj(item, Object)) {

            // value is an ordinary object
            var obj = built[ind] = {};
            for (var k in item)
                obj[k] = U.unstraighten0(items, item[k], built, unbuilt);
            return obj;

        } else if (U.isObj(item, Array)) {

            // value is an array
            var arr = built[ind] = [];
            for (var i = 0; i < item.length; i++)
                arr.push(U.unstraighten0(items, item[i], built, unbuilt));
            return arr;

        }

        built[ind] = item;
        return item;
    },
    thingToString: function(thing) {
        /*
        Elegant convenience function to convert any structure (circular or
        not) to a string! Now that this function is available, you can ignore
        `straighten` and `unstraighten`, and the headaches they may cause.
        */
        var st = U.straighten(thing);
        return JSON.stringify(st);
    },
    stringToThing: function(string) {
        /*
        Elegant convenience function to reverse the effect of
        `U.thingToString`.
        */
        return U.unstraighten(JSON.parse(string));
    }
};

var circular = {
    val: 'haha',
    val2: [ 'hey', 'ho', 'hee' ],
    doesNullWork: null
};
circular.circle1 = circular;
circular.confusing = {
    circular: circular,
    value: circular.val2
};

console.log('Does JSON.stringify work??');
try {
    var str = JSON.stringify(circular);
    console.log('JSON.stringify works!!');
} catch(err) {
    console.log('JSON.stringify doesn\'t work!');
}

console.log('');

console.log('Does U.thingToString work??');
try {
    var str = U.thingToString(circular);
    console.log('U.thingToString works!!');
    console.log('Its result looks like this:');
    console.log(str);
    console.log('And here\'s it converted back into an object:');
    var obj = U.stringToThing(str);
    for (var k in obj) {
        console.log('`obj` has key "' + k + '"');
    }
    console.log('Did `null` work?');
    if (obj.doesNullWork === null)
        console.log('yes!');
    else
        console.log('nope :(');
} catch(err) {
    console.error(err);
    console.log('U.thingToString doesn\'t work!');
}

The whole idea is to serialize a circular structure by placing every value inside it directly into one flat array, and replacing each reference with the array index of the value it points to.

E.g., if you have an object like this:

{
    val: 'hello',
    anotherVal: 'hi',
    circular: << a reference to itself >>
}

Then U.straighten will produce this structure:

[
    0: {
        val: 1,
        anotherVal: 2,
        circular: 0 // Note that it's become possible to refer to "self" by index! :D
    },
    1: 'hello',
    2: 'hi'
]
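
To see that nothing is lost in this flattened form, here is a minimal sketch that goes the other way for exactly the example above. It is an iterative, two-pass alternative to the recursive U.unstraighten, not the code from this answer, and `rebuild` is a hypothetical name used only for illustration. It assumes the input came from JSON.parse, so every entry is a plain object, an array, or an inline value.

```javascript
// Sketch: rebuild a structure from the flattened form shown above.
// `rebuild` is a hypothetical helper, not part of the `U` library.
function rebuild(items) {
  // Pass 1: create an empty shell for every object/array entry, so that
  // any index can be resolved to its final value before it is filled in.
  const built = items.map(function (item) {
    if (Array.isArray(item)) return [];
    if (item !== null && typeof item === 'object') return {};
    return item; // inline values are already final
  });

  // Pass 2: replace each stored index with the value it points to.
  items.forEach(function (item, ind) {
    if (Array.isArray(item)) {
      item.forEach(function (childInd) {
        built[ind].push(built[childInd]);
      });
    } else if (item !== null && typeof item === 'object') {
      Object.keys(item).forEach(function (k) {
        built[ind][k] = built[item[k]];
      });
    }
  });

  return built[0]; // index 0 is the root, as in U.unstraighten
}

const flat = [{ val: 1, anotherVal: 2, circular: 0 }, 'hello', 'hi'];
const restored = rebuild(flat);
console.log(restored.val);                   // hello
console.log(restored.circular === restored); // true
```

Because the shells exist before any index is resolved, the self-reference at `circular: 0` needs no special handling, and no recursion is involved.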

Just a couple of extra notes:

  • I've been using these functions for quite some time in a wide variety of situations! It's very unlikely there are hidden bugs.

  • The O(n^2) runtime issue could be avoided by mapping every object to a unique hash value (which can be implemented). The runtime is O(n^2) because a linear search is used to find items that have already been un-circularized, and that search runs inside a process that is itself linear in the number of items.

  • These methods actually provide a small amount of compression! Identical inline values never occur twice at different indexes: every occurrence of the same inline value is mapped to the same index. E.g.:

     { hi: 'hihihihihihihihihihihi-very-long-pls-compress', ha: 'hihihihihihihihihihihi-very-long-pls-compress' } 

    Becomes (after U.straighten):

     [ 0: { hi: 1, ha: 1 }, 1: 'hihihihihihihihihihihi-very-long-pls-compress' ] 

  • And finally, in case it wasn't clear, using this code is very easy! You only ever need to look at U.thingToString and U.stringToThing. These functions are used in precisely the same way as JSON.stringify and JSON.parse:

     var circularObj = // Some big circular object you have
     var serialized = U.thingToString(circularObj);
     var unserialized = U.stringToThing(serialized);
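
As for the O(n^2) note above: one way to get the "unique hash value" lookup without hashing objects yourself is the built-in `Map`, which accepts objects as keys and compares them by identity. The following is a minimal sketch under that assumption, producing the same flattened format as U.straighten. `straightenFast` is a hypothetical name, not part of the `U` library, and the block is not a drop-in replacement tested against every case. One small difference: `Map` treats `NaN` as equal to itself while `===` does not, so repeated `NaN` values share one index here.

```javascript
// Sketch: same output format as U.straighten, but the "have I seen this
// value before?" check is a Map lookup instead of a linear scan.
// `straightenFast` is a hypothetical name used only for illustration.
function straightenFast(root) {
  const seen = new Map(); // original value -> its index in `calc`
  const calc = [];        // un-circularized entries, in index order

  function visit(item) {
    // Map compares object keys by identity, like `===` does.
    if (seen.has(item)) return seen.get(item);

    const ind = calc.length;
    seen.set(item, ind);

    if (Array.isArray(item)) {
      const arr = [];
      calc.push(arr); // reserve the slot before recursing
      for (const child of item) arr.push(visit(child));
    } else if (item !== null && typeof item === 'object' &&
               item.constructor === Object) {
      const obj = {};
      calc.push(obj); // reserve the slot before recursing
      for (const k in item) obj[k] = visit(item[k]);
    } else {
      calc.push(item); // inline value: stored as-is
    }
    return ind;
  }

  visit(root);
  return calc;
}

const selfRef = { val: 'hello', anotherVal: 'hi' };
selfRef.circular = selfRef;
console.log(JSON.stringify(straightenFast(selfRef)));
// [{"val":1,"anotherVal":2,"circular":0},"hello","hi"]
```

Each value is now looked up and inserted once, so the traversal is linear in the number of values (assuming the usual constant-time behaviour of `Map`). The result can be passed to JSON.stringify and read back with U.stringToThing, since the array layout is unchanged.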
