How to filter a JavaScript object array by grouping the objects based on a specific field value?

Below is my JS object array.

const objArray = [
    {
      file: 'file_1',
      start_time: '2021-08-12 14:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-12 14:00:00',
      status: 'completed'
    },
    {
      file: 'file_3',
      start_time: '2021-08-14 15:00:00',
      status: 'pending'
    },
    {
      file: 'file_1',
      start_time: '2021-08-14 03:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-14 03:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-11-11 11:11:00',
      status: 'pending'
    }
]

From the above array, I need to filter the objects based on the start_time field. If the start times are the same, the objects should be grouped into a sub-array. Also, two sub-arrays must not contain objects with the same file names. For example, in the above array, objects 1 and 2 and objects 4 and 5 have different start times, but their file names are the same. Therefore I need only one of those sets, namely 1 and 2, which has the lowest timestamp. So the final output array should be as below:

[
  [
    {
      file: 'file_1',
      start_time: '2021-08-12 14:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-12 14:00:00',
      status: 'completed'
    }
  ],
  [
    {
      file: 'file_3',
      start_time: '2021-08-14 15:00:00',
      status: 'pending'
    }
  ],
  [
    {
      file: 'file_2',
      start_time: '2021-11-11 11:11:00',
      status: 'pending'
    }
  ]
]

I tried to implement it by looping through every object in the initial array. But what is the quickest way to achieve this?

Using XSLT 3 as provided by Saxon-JS 2 (https://www.saxonica.com/saxon-js/index.xml), you can group JSON data:

const objArray = [
  { file: 'file_1', start_time: '2021-08-12 14:00:00', status: 'pending' },
  { file: 'file_2', start_time: '2021-08-12 14:00:00', status: 'completed' },
  { file: 'file_3', start_time: '2021-08-14 15:00:00', status: 'pending' },
  { file: 'file_1', start_time: '2021-08-14 03:00:00', status: 'pending' },
  { file: 'file_2', start_time: '2021-08-14 03:00:00', status: 'pending' },
  { file: 'file_2', start_time: '2021-11-11 11:11:00', status: 'pending' }
];

const xslt = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:output method="json" indent="yes"/>

  <xsl:template match="." name="xsl:initial-template">
    <xsl:variable name="groups" as="array(*)*">
      <xsl:for-each-group select="?*" group-by="?start_time">
        <xsl:sequence select="array { current-group() }"/>
      </xsl:for-each-group>
    </xsl:variable>
    <xsl:variable name="filtered-groups" as="array(*)*">
      <xsl:for-each-group select="$groups" composite="yes" group-by="sort(?*?file)">
        <xsl:sort select="?1?start_time"/>
        <xsl:sequence select="."/>
      </xsl:for-each-group>
    </xsl:variable>
    <xsl:sequence select="array { $filtered-groups }"/>
  </xsl:template>

</xsl:stylesheet>`;

const resultArray = SaxonJS.XPath.evaluate(`transform(
    map {
      'stylesheet-text': $xslt,
      'initial-match-selection': $json,
      'delivery-format': 'raw'
    }
  )?output`,
  [],
  { params: { xslt: xslt, json: [objArray] } }
);

console.log(resultArray);
 <script src="https://martin-honnen.github.io/xslt3fiddle/js/SaxonJS2.js"></script>

Anyone looking for an equivalent of SQL GROUP BY in JavaScript will not find an answer here.

This question is about a very specific algorithm that performs two steps:

  1. Group the records by date into sub-arrays.
  2. Consider two sub-arrays equivalent if their records relate to exactly the same set of files.
    Filter equivalent sub-arrays to retain only the one with the oldest date.

So let's do that with some vanilla JavaScript:

function do_some_filtering (records) {
    // create sets containing events grouped by date and index them by date
    let sets = {};
    for (let record of records) {
        // dates are converted to milliseconds since the Epoch for comparison
        let date = Date.parse(record.start_time);
        if (!sets[date]) sets[date] = [record];
        else sets[date].push(record);
    }

    // filter "unique" sets based on the list of files present in each set
    let unique_date = {};
    for (let key in sets) {
        // object keys are strings, so convert back to a number before comparing dates
        let date = Number(key);
        let signature = sets[key]                   // this will concatenate all file names
            .map(x => x.file)                       // to create a unique signature for
            .sort()                                 // potentially deletable groups
            .reduce((acc, file) => acc + "," + file);
        // if multiple sets have the same signature, keep the one with the lowest date
        if (unique_date[signature] === undefined || unique_date[signature] > date) {
            unique_date[signature] = date;
        }
    }

    // collect "unique" sets
    let result = [];
    for (let signature in unique_date) result.push(sets[unique_date[signature]]);
    return result;
}

const objArray = [
    { file: 'file_1', start_time: '2021-08-12 14:00:00', status: 'pending' },
    { file: 'file_2', start_time: '2021-08-12 14:00:00', status: 'completed' },
    { file: 'file_3', start_time: '2021-08-14 15:00:00', status: 'pending' },
    { file: 'file_1', start_time: '2021-08-14 03:00:00', status: 'pending' },
    { file: 'file_2', start_time: '2021-08-14 03:00:00', status: 'pending' },
    { file: 'file_2', start_time: '2021-11-11 11:11:00', status: 'pending' }
];

let result = do_some_filtering (objArray);
console.log(JSON.stringify(result, null, " "));

The code relies heavily on the capability of objects to act as associative arrays (ancestors of the modern Map, with some limitations), a feature that seems to have fallen into disuse with the advent of immutability and functional programming, but still proves quite useful at times.
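For comparison, here is a minimal sketch of the same grouping step written with a Map instead of a plain object. The helper name groupByStartTime is hypothetical, and the sketch assumes the start_time string itself is an acceptable key (it skips the Date.parse conversion). Unlike a plain object, a Map keeps keys in insertion order and does not coerce them to strings.

```javascript
// Hypothetical helper: group records by their start_time string using a Map.
function groupByStartTime(records) {
    const groups = new Map();
    for (const record of records) {
        const key = record.start_time; // assumption: the raw string is used as the key
        if (!groups.has(key)) groups.set(key, []);
        groups.get(key).push(record);
    }
    return groups;
}

const sample = [
    { file: 'file_1', start_time: '2021-08-12 14:00:00', status: 'pending' },
    { file: 'file_2', start_time: '2021-08-12 14:00:00', status: 'completed' },
    { file: 'file_3', start_time: '2021-08-14 15:00:00', status: 'pending' }
];

const grouped = groupByStartTime(sample);
console.log(grouped.size); // 2
```

Whether this is preferable to the object-based version is mostly a matter of taste here; the Map mainly avoids surprises with key ordering and key types.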

Speaking of immutability, the records are not duplicated, i.e. mutating one in the input will be reflected in the output and vice versa. Since the records themselves are left untouched, this still achieves the kind of pseudo-immutability JavaScript can offer without dedicated libraries. If you absolutely want a duplication, let me know and I'll update the code.
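If duplication is wanted, one possible approach is to shallow-copy each record when building the output. This is only a sketch: the helper name copyGroups is hypothetical, and a shallow copy is assumed to be enough because the records in the question hold only strings.

```javascript
// Hypothetical helper: return groups whose records are shallow copies of the originals.
function copyGroups(groups) {
    return groups.map(group => group.map(record => ({ ...record })));
}

const original = [
    [{ file: 'file_1', start_time: '2021-08-12 14:00:00', status: 'pending' }]
];
const copied = copyGroups(original);

// Mutating the copy leaves the original record untouched.
copied[0][0].status = 'completed';
console.log(original[0][0].status); // pending
```

Records with nested objects would need a deep copy instead, which this sketch does not attempt.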

I'm not sure what you mean by the "quickest way" to do it. If you want to compare sets based on the files they contain, you'll have to pay the price of comparing two lists, which, as far as I know, is at least O(N log N) if you use a sort, or O(N²) if you do a pairwise compare. But unless you plan on using this on hundreds of files, the number of groups containing more than a few files should be fairly small and you should hardly feel the difference.
The rest is O(N), and I very much doubt you can achieve anything without looping over all your records at least once.
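To illustrate the sort-based comparison, here is a small sketch of an order-independent signature. The helper name fileSignature is hypothetical, and the sketch assumes file names never contain a comma (otherwise two different sets could produce the same signature).

```javascript
// Hypothetical helper: build an order-independent signature from a group's file names.
// Sorting costs O(N log N) per group, where N is the number of records in the group.
function fileSignature(group) {
    return group.map(record => record.file).sort().join(',');
}

const a = [{ file: 'file_2' }, { file: 'file_1' }];
const b = [{ file: 'file_1' }, { file: 'file_2' }];
const c = [{ file: 'file_2' }];

console.log(fileSignature(a) === fileSignature(b)); // true
console.log(fileSignature(a) === fileSignature(c)); // false
```

Once each group has a signature, finding equivalent groups is a single lookup in an object or Map rather than a pairwise comparison of the file lists.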

If by "quick" you mean "quick to write", I guess 15 lines of vanilla JavaScript should fit the bill?
