简体   繁体   中英

Filter Array of Arrays to Unique based off 2 values?

I have an array of arrays and need to filter them to have unique based off the first 2 items in the arrays.

[
    ["18", "-66"],
    ["26", "-80"],
    ["26", "-80"],
    ["26", "-80"],
    ["27", "-80"],
    ["27", "-80"],
    ["28", "-97"],
    ["29", "-99"],
    ["30", "-81"],
    ["30", "-87"],
    ["30", "-99"],
    ["31", "-110"],
    ["31", "-110"],
    ["31", "-111"],
    ["31", "-84"],
    ["31", "-87"],
    ["31", "-95"],
    ["31", "-95"],
    ["31", "-95"]
]

While a function such as this does a great job with a single unique, I fail to see how to modify it to search the first two items:

function filterArray(incomingArray){
  var unique = {};
  var distinct = [];
  for( var i in incomingArray ){
   if( typeof(unique[incomingArray[i][0]]) == "undefined"){
    distinct.push([incomingArray[i][0],{}]);
   }
   unique[incomingArray[i][0]] = 0;
  }
  return distinct;
}

The entire array doesn't need to be unique, just the first two items, so for example the following would match:

[
    ["26", "-80", 2],
    ["26", "-80", 3]
]

## Update ##

I tried each suggested method and found some interesting things. First let me say thank you for each of your ideas / ways of solving the problem. In performance testing, I supplied 4000 records and did a performance test on each method.

Mark's method of using Set() with 4000 records finished in 19ms, when supplied only about 50 records it finished in 2ms.

D. Seah's method with with 4000 records finished in 149ms, but when supplied only 50 records finished in under 1ms.

vol7ron's method with 4000 records finished in 30ms, but oddly took 52ms with 50 records

Ben's method I am still working through how it works and so far haven't had it return what I expected, but with it returning a 3 dimensional array, I am interested to see what possibilities there are in other applications, but it took around 38 seconds and I ended up with a very large array of undefined . It is possibly something I am doing wrong, but at least in this setting, it may be more capable than I need, and since the other solutions are fast enough then I may table it till next time.

Using Set() looks like the best way to go for a performance dependent solution where the dataset may keep growing, but in a smaller dataset vol7tron wins.

If you can find a character you know won't be in your data, you can use that to join your first two elements into a key. For example 18_-66 . Then you can keep a Set of known values and use it to filter. Key lookups in Sets are constant time, so this should run linear time in n. For example:

 let arr = [["18", "-66"],["26", "-80"],["26", "-80"],["26", "-80"],["27", "-80"],["27", "-80"],["28", "-97"],["29", "-99"],["30", "-81"],["30", "-87"],["30", "-99"],["31", "-110"],["31", "-110"],["31", "-111"],["31", "-84"],["31", "-87"],["31", "-95"],["31", "-95"],["31", "-95"]] let s = new Set() let f = arr.filter(item => { let key = item[0] + '_' +item[1] // join elements 1 and 2 with _ return !s.has(key) && s.add(key) // add to set and return true if it's not already there }) console.log(f) 

The following function will iterate through the array twice, and if the first two items match, it will push an array containing the two items to an output array.

function filterArray(arr) {
  var matching = []

  for (i in arr) {
    for (j in arr) {
      if (arr[i][0] === arr[j][0] && arr[i][1] === arr[j][1]) {
        matching.push([arr[i], arr[j]])
      }
    }
  }

  for (k in matching) {
    matching[k] = [matching[k][0][0], matching[k][0][1], matching[k].count]
  }

  return matching
}

The easy way to manage is to keep a cache variable of your previously processed list. If while evaluating your array it is already in the cache list, don't store it. If it is not in the cache, then keep it and make an entry in the cache.

 let cache = {}; let data = getData(); let unique = data.reduce((acc,curr)=>{ let key = [curr[0],curr[1]].join(',') if (!cache[key]){ cache[key] = true; acc.push(curr) } return acc; },[]) console.log('unique: ', prettyPrint(unique)) console.log('cache: ', prettyPrint(cache)) function getData(){ return [ ["18", "-66"], ["26", "-80"], ["26", "-80"], ["26", "-80"], ["27", "-80"], ["27", "-80"], ["28", "-97"], ["29", "-99"], ["30", "-81"], ["30", "-87"], ["30", "-99"], ["31", "-110"], ["31", "-110"], ["31", "-111"], ["31", "-84"], ["31", "-87"], ["31", "-95"], ["31", "-95"], ["31", "-95"] ]} function prettyPrint(data){ if(Array.isArray(data)){ return '\\n' + JSON.stringify(data,null,2) .replace(/,\\n\\s+[^[\\s]/g, ',') .replace(/(\\[)\\s+([^[\\s])/g, '$1$2') .replace(/"\\s+/g,'"') } else if (typeof data === 'object') return '\\n' + JSON.stringify(data,null,2) else return data; } 

  • let key = [curr[0],curr[1]].join(',') is part of the magic. It joins the two values by a delimiter so only one lookup is performed. This value is easier to see in the cache output

you can use a reduce function

var oDistinct = oData.reduce(function (acc, cur) {
  if (!acc.some(function(i) {
    return i[0] === cur[0] && i[1] === cur[1];
  })) {
    acc.push(cur);
  }
  return acc;
}, []);

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM