Filtering out JSON by an array

I have a JSON file:

{
    "data": [
        {
            "name": "Jake",
            "id": "123"
        },
        {
            "name": "Bob",
            "id": "234"
        }]
}

All ids are unique. Say I have an array of banned ids, ["123","423"], and I would like to delete every entry whose id appears in that array, so that the output is the following:

{
    "data": [
        {
            "name": "Bob",
            "id": "234"
        }]
}

What would be a moderately efficient way (one that runs in a few seconds on an ordinary computer) to achieve this if there are a few thousand entries in both the JSON and the array?

You can use the Array.prototype.filter() method in conjunction with .indexOf():

var bannedIds = ["123", "423"];
var input = {
    "data": [
        {
            "name": "Jake",
            "id": "123"
        },
        {
            "name": "Bob",
            "id": "234"
        }]
};

input.data = input.data.filter(function(v) {
    return bannedIds.indexOf(v.id) === -1;
});

console.log(input);

If you don't want to overwrite the original array, just assign the result of the .filter() call to a new variable.

If the above turns out to be too slow with your large amount of data, you can try replacing .filter() with a conventional for loop, and/or replacing .indexOf() with a lookup object created from the array of banned ids.
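As a rough illustration of that last suggestion, here is a minimal ES5 sketch that combines a plain for loop with a lookup object. The variable names (bannedLookup, kept) are made up for this example, and using Object.create(null) for the lookup is my own choice rather than something the answer prescribes.

```javascript
// Sample data mirroring the question.
var input = {
    data: [
        { name: "Jake", id: "123" },
        { name: "Bob", id: "234" }
    ]
};
var bannedIds = ["123", "423"];

// Build the lookup object once. Object.create(null) yields an object with no
// inherited keys, so an id such as "constructor" cannot match by accident.
var bannedLookup = Object.create(null);
for (var i = 0; i < bannedIds.length; i++) {
    bannedLookup[bannedIds[i]] = true;
}

// Copy every entry that is not banned into a new array.
var kept = [];
for (var j = 0; j < input.data.length; j++) {
    var entry = input.data[j];
    if (!bannedLookup[entry.id]) {
        kept.push(entry);
    }
}
input.data = kept;

console.log(JSON.stringify(input)); // {"data":[{"name":"Bob","id":"234"}]}
```

Each membership check is now a property lookup instead of a scan through the banned array, so the work grows with the number of entries plus the number of banned ids rather than with their product.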

If you can use ES6, you can do this:

const source = {
    "data": [
        {
            "name": "Jake",
            "id": "123"
        },
        {
            "name": "Bob",
            "id": "234"
        }
    ]
};
const banned = ["123", "423"];

// O(n) startup cost for constant access time later
const bannedSet = new Set(banned);

// O(n)
const result = source.data.filter(x => !bannedSet.has(x.id));

console.log(result);

As mentioned in the code comments, there's a startup cost for creating the Set. However, this lets you then call Set.prototype.has, which runs in constant time.

Then, it's just a matter of iterating over every element and filtering out the ones that are in the banned set.

If you can't use ES6, you could replace Set with a plain JS object. If you have to support IE<9, use a polyfill for Array.prototype.filter (thanks @nnnnnn).

UPDATE

@SpencerWieczorek points out that the ES6 spec seems to indicate that Set.prototype.has iterates. I spoke too soon about the lookup being constant (I was carrying over my experience from other languages). The spec describes the algorithm as an iteration, but it also requires implementations to provide access times that are, on average, sublinear in the number of elements, so in practice sets will do better than O(n), e.g. constant or O(log n) depending on the underlying implementation. Your mileage may vary, so nnnnnn's answer may be faster in some cases.

Try a few of the solutions here with large amounts of data to confirm.
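One possible way to run that comparison is sketched below. The data sizes, the helper names (makeEntries, makeBanned, timeIt) and the use of Date.now() for timing are all assumptions made for illustration; timings from a single run are noisy, so treat the numbers as a rough indication only.

```javascript
// Synthetic data; the sizes used further down are arbitrary.
function makeEntries(count) {
    const entries = [];
    for (let i = 0; i < count; i++) {
        entries.push({ name: "user" + i, id: String(i) });
    }
    return entries;
}

function makeBanned(count) {
    const ids = [];
    for (let i = 0; i < count; i++) {
        ids.push(String(i * 3)); // ban 0, 3, 6, ...
    }
    return ids;
}

// The two approaches from the answers above.
function filterWithIndexOf(entries, banned) {
    return entries.filter(e => banned.indexOf(e.id) === -1);
}

function filterWithSet(entries, banned) {
    const bannedSet = new Set(banned);
    return entries.filter(e => !bannedSet.has(e.id));
}

// Runs fn once and returns its result plus the elapsed wall-clock time in ms.
function timeIt(fn, entries, banned) {
    const start = Date.now();
    const result = fn(entries, banned);
    return { result: result, ms: Date.now() - start };
}

const entries = makeEntries(5000);
const bannedList = makeBanned(2000);

const viaIndexOf = timeIt(filterWithIndexOf, entries, bannedList);
const viaSet = timeIt(filterWithSet, entries, bannedList);

console.log("indexOf:", viaIndexOf.ms, "ms,", viaIndexOf.result.length, "kept");
console.log("Set:    ", viaSet.ms, "ms,", viaSet.result.length, "kept");
```

Both functions should keep the same number of entries; if they don't, the comparison is measuring different work.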

EDIT

I shied away from using filter or the like because that involves creating a new array. That's actually probably fine for the data sizes we're talking about, but the approach I have below avoids the extra allocation by modifying the array in place.


On my laptop, this whole program runs in about 0.2 seconds. (It uses 10,000 entries and 100 banned IDs.)

var o = {
    data: []
};

for (var i = 0; i < 10000; i++) {
    o.data.push({
        name: i % 2 === 0 ? 'Jake' : 'Bob', // couldn't think of more names :-)
        id: ''+i // convert to string
    });
}

var banned = {};

for (var i = 0; i < 100; i++) {
    banned[''+(i * 3)] = true; // ban 0, 3, 6, 9, 12, ...
}

for (var i = o.data.length - 1; i >= 0; i--) {
    if (banned[o.data[i].id]) {
        o.data.splice(i, 1);
    }
}

console.log(o);

// { data:
//    [ { name: 'Bob', id: '1' },
//      { name: 'Jake', id: '2' },
//      { name: 'Jake', id: '4' },
//      { name: 'Bob', id: '5' },
//      { name: 'Bob', id: '7' },
//      { name: 'Jake', id: '8' },
//      { name: 'Jake', id: '10' },
//      ...

I am assuming that you have already parsed the JSON data and that you have a variable pointing to the array you want to filter. You also have an array with the "banned" IDs.

var data = [{
        "name": "Jake",
        "id": "123"
    }, {
        "name": "Bob",
        "id": "234"
    }, {
        "name": "Joe",
        "id": "345"
    }];

var banned = ["123", "345"];

The following function will probably do about as well as can be done in terms of performance:

// Modifies the data array "in place", removing all elements
// whose IDs are found in the "banned" array
function removeBanned(data, banned) {
    // Index the "banned" IDs by writing them as the properties
    // of a JS object for really quick read access later on
    var bannedObj = {};
    banned.forEach(function(b) { bannedObj[b] = true; });

    var index = data.length - 1;

    while (index >= 0) {
        if (bannedObj[data[index].id]) {
            data.splice(index, 1);
        }
        --index;
    }
}
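For completeness, here is a sketch of the whole round trip, starting from JSON text rather than an already parsed array. Note that it swaps in a different in-place technique from the function above: instead of calling splice once per removed element, it compacts the array in a single pass and truncates it at the end. The names jsonText, bannedIdList and compactBanned are invented for this example.

```javascript
// Hypothetical input: the JSON text as it might be read from a file.
var jsonText = '{"data":[{"name":"Jake","id":"123"},{"name":"Bob","id":"234"},{"name":"Joe","id":"345"}]}';
var bannedIdList = ["123", "345"];

// Removes banned entries in place with a single pass: kept entries are
// moved towards the front, then the array is truncated once at the end.
function compactBanned(list, bannedIds) {
    var lookup = Object.create(null);
    bannedIds.forEach(function (id) { lookup[id] = true; });

    var write = 0;
    for (var read = 0; read < list.length; read++) {
        if (!lookup[list[read].id]) {
            list[write] = list[read];
            write++;
        }
    }
    list.length = write; // drop the leftover tail
}

var parsed = JSON.parse(jsonText);
compactBanned(parsed.data, bannedIdList);
console.log(JSON.stringify(parsed)); // {"data":[{"name":"Bob","id":"234"}]}
```

Because each splice has to shift the elements that follow the removed one, the compaction loop may scale better when many entries are banned, though for a few thousand entries either version should be well within the stated time budget.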

This one seems fast enough, but I'd suggest you make a fresh, clean copy instead of modifying the existing array; it may be faster.

function filterout(o, p, f) {
    var i = 0;
    while (i < o.length) {
        // compare whole values, so that "23" does not match "123"
        if (f.indexOf(o[i][p]) !== -1) {
            // the next element moves into slot i, so do not advance
            o.splice(i, 1);
        } else {
            i++;
        }
    }
}

var filter = ["123", "423"];
var object = {
    "data": [
        {
            "name": "John",
            "id": "723"
        },
        {
            "name": "Jake",
            "id": "123"
        },
        {
            "name": "Bob",
            "id": "234"
        }]
};

filterout(object.data, "id", filter);
console.log(JSON.stringify(object));
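The suggestion to build a fresh copy rather than modify the existing array could be sketched as follows. This is only one way to do it: withoutBanned is a hypothetical helper name, and it assumes an ES6 environment where Set is available.

```javascript
// Returns a new array without the items whose `prop` value is listed in
// `bannedValues`; the input array is left untouched.
function withoutBanned(items, prop, bannedValues) {
    const bannedSet = new Set(bannedValues);
    return items.filter(item => !bannedSet.has(item[prop]));
}

const original = {
    data: [
        { name: "John", id: "723" },
        { name: "Jake", id: "123" },
        { name: "Bob", id: "234" }
    ]
};

const cleaned = { data: withoutBanned(original.data, "id", ["123", "423"]) };

console.log(JSON.stringify(cleaned));
// {"data":[{"name":"John","id":"723"},{"name":"Bob","id":"234"}]}
console.log(original.data.length); // 3
```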
