简体   繁体   中英

Filtering out JSON by an array

I have a JSON file

{
    "data": [
        {
            "name": "Jake",
            "id": "123"
        },
        {
            "name": "Bob",
            "id": "234"
        }]
}

with all id's unique, and say I have an array of banned ids ["123","423"] and I would like to delete all entries that have an id number in the array (so as an output I'd like the following).

{
    "data": [
        {
            "name": "Bob",
            "id": "234"
        }]
}

What would be a moderately efficient way (runs in a few seconds on an ordinary computer) to achieve this if there's a few thousand entries in the JSON and array?

You can use the Array.prototype.filter() method in conjunction with .indexOf() :

 var bannedIds = ["123", "423"]; var input = { "data": [ { "name": "Jake", "id": "123" }, { "name": "Bob", "id": "234" }] }; input.data = input.data.filter(function(v) { return bannedIds.indexOf(v.id) === -1; }); console.log(input); 

If you don't want to overwrite the original array then just assign the result of the .filter() call to a new variable.

If the above turns out to be too slow with your large amount of data, you can try replacing .filter() with a conventional for loop, and/or replacing .indexOf() with a lookup object created from the array of banned ids.

If you can use ES6, you can do this:

 const source = { "data": [ { "name": "Jake", "id": "123" }, { "name": "Bob", "id": "234" } ] }; const banned = ["123", "423"]; // O(n) startup cost for constant access time later const bannedSet = new Set(banned); // O(n) const result = source.data.filter(x => !bannedSet.has(x.id)); console.log(result); 

As mentioned in the comments, there's a startup cost for creating the Set . However, this lets you then call Set.prototype.has , which is constant.

Then, it's just a matter of iterating over every element and filtering out the ones that are in the banned set.

If you can't use ES6, you could replace Set with a plain JS object. If you have to support IE<9, use a polyfill for Array.prototype.filter (thanks @nnnnnn).

UPDATE

@SpencerWieczorek points out that the ES6 spec seems to indicate that Set.prototype.has iterates. I spoke too soon about the lookup being constant (I was carrying over my experience from other languages). Typically, sets will do better than O(n), eg constant or O(log n) depending on the underlying implementation. Your mileage may vary, so nnnnnn's answer may be faster in some cases.

Try a few of the solutions here with large amounts of data to confirm.

EDIT

I shied away from using filter or the like because that involves creating a new array. That's actually probably fine for the data sizes we're talking about, but the approach I have below is more efficient.


On my laptop, this whole program runs in about 0.2 seconds. (It uses 10,000 entries and 100 banned IDs.)

var o = {
    data: []
};

for (var i = 0; i < 10000; i++) {
    o.data.push({
        name: i % 2 === 0 ? 'Jake' : 'Bob', // couldn't think of more names :-)
        id: ''+i // convert to string
    });
}

var banned = {};

for (var i = 0; i < 100; i++) {
    banned[''+(i * 3)] = true; // ban 0, 3, 6, 9, 12, ...
}

for (var i = o.data.length - 1; i >= 0; i--) {
    if (banned[o.data[i].id]) {
        o.data.splice(i, 1);
    }
}

console.log(o);

// { data:
//    [ { name: 'Bob', id: '1' },
//      { name: 'Jake', id: '2' },
//      { name: 'Jake', id: '4' },
//      { name: 'Bob', id: '5' },
//      { name: 'Bob', id: '7' },
//      { name: 'Jake', id: '8' },
//      { name: 'Jake', id: '10' },
//      ...

I am assuming that you have already parsed the JSON data and you have a variable pointing to the array you want to filter. Also, you have an array with the "banned" IDs.

var data = [{
        "name": "Jake",
        "id": "123"
    }, {
        "name": "Bob",
        "id": "234"
    }, {
        "name": "Joe",
        "id": "345"
    }];

var banned = ["123", "345"];

The following function wil probably do the best job that can be done in terms of performance:

// Modifies the data array "in place", removing all elements
// whose IDs are found in the "banned" array
function removeBanned(data, banned) {
    // Index the "banned" IDs by writing them as the properties
    // of a JS object for really quick read access later on
    var bannedObj = {};
    banned.forEach(function(b) { bannedObj[b] = true; });

    var index = data.length - 1;

    while (index >= 0) {
        if (bannedObj[data[index].id]) {
            data.splice(index, 1);
        }
        --index;
    }
}

This one seems fast enough, but I'd suggest you make a free clean copy instead of modifying the existing array, - it may be faster.

 function filterout(o,p,f) { var i = 0; f = f.join(); while( o[i] ) { if( f.match( o[i][p] ) ){ o.splice(i,1) } i++ }; } var filter = ["123","423"]; var object = { "data": [ { "name": "John", "id": "723" }, { "name": "Jake", "id": "123" }, { "name": "Bob", "id": "234" }] }; filterout( object.data, "id", filter ); console.log(JSON.stringify( object )); 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM