Optimize group by array into nested structure for large data sets

Given a "flat" array whose fields (status, type, etc.) can be dynamic (more or fewer key/value pairs), like:

$data = array(
    array(
        "status" => "new",
        "type" => "type1",
        "source" => "source1",
        "other" => "other1",
        "count" => "1",
    ),
    ...

The objective is to get a multidimensional/nested array "grouped" by a variable number of grouping fields. For example, if it needs to be grouped by 4 fields:

$groups = array("status", "type", "source", "other");

If a group has no children, its "data" key should hold all the "raw" records; if it has children, it should hold the group field and value, as in the demo and in this image (children/data).

The resulting data set should be as follows:

Array
(
    [0] => Array
        (
            [fieldName] => status
            [value] => new
            [children] => Array
                (
                    [0] => Array
                        (
                            [fieldName] => type
                            [value] => type1
                            [children] => Array
                                (
                                    [0] => Array
                                        (
                                            [fieldName] => source
                                            [value] => source1
                                            [children] => Array
                                                (
                                                    [0] => Array
                                                        (
                                                            [fieldName] => other
                                                            [value] => other1
                                                            [data] => Array
                                                                (
                                                                    [0] => Array
                                                                        (
                                                                            [status] => new
                                                                            [type] => type1
                                                                            [source] => source1
                                                                            [other] => other1
                                                                            [count] => 1
                                                                        )

I adapted a solution from "rearrange a php array into a nested hierarchical array", but it is quite messy and takes a large amount of memory and time. Could it be optimized for large data sets (10,000 or more "flat" array records), with better performance and cleaner code?

This will be used to calculate subtotals for each group (sum, count, averages, etc.).
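As a rough illustration of that use, here is a minimal sketch that walks an already nested structure and attaches subtotals to every group. The function name `addSubtotals`, the added keys `sum` and `rows`, and the choice of "count" as the field to total are assumptions made for this example; they are not part of the original code.

```php
<?php
// Hypothetical helper: adds "sum" and "rows" subtotals to every node, bottom-up.
// Assumes each node has either a "children" key or a "data" key, as described above.
function addSubtotals(array $nodes, string $field = "count"): array
{
    foreach ($nodes as &$node) {
        if (isset($node["children"])) {
            // Inner node: total the subtotals already computed for its children.
            $node["children"] = addSubtotals($node["children"], $field);
            $node["sum"]  = array_sum(array_column($node["children"], "sum"));
            $node["rows"] = array_sum(array_column($node["children"], "rows"));
        } else {
            // Leaf node: total the raw records directly.
            $node["sum"]  = array_sum(array_column($node["data"], $field));
            $node["rows"] = count($node["data"]);
        }
    }
    unset($node); // break the reference left over from the loop

    return $nodes;
}

// Small made-up tree in the target format.
$tree = [[
    "fieldName" => "status",
    "value"     => "new",
    "children"  => [
        ["fieldName" => "type", "value" => "type1", "data" => [["count" => "1"], ["count" => "2"]]],
        ["fieldName" => "type", "value" => "type2", "data" => [["count" => "4"]]],
    ],
]];

$withTotals = addSubtotals($tree);
```

An average for any group then follows as `sum / rows`, provided `rows` is not zero.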

Demo

It is a pity that you don't explain what this is going to be used for, but that's a common problem with Stack Overflow questions. The essence of the problem is often missing, so it becomes an abstract exercise.

For instance, I don't see the point of rearranging the array in this specific way. I think the resulting array could use the array keys more efficiently. There is also a lot of repeated information.

But this is what we've got, so without further complaining from my side, here is the code I came up with:

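function rearrangeItems($flatItems, $groups)
{
    $groupedItems = [];
    // Take the first grouping field; the remaining ones are handled by recursion.
    $groupName    = array_shift($groups);
    // All distinct values of that field at this level.
    $groupValues  = array_unique(array_column($flatItems, $groupName));
    foreach ($groupValues as $groupValue) {
        // Collect the items that belong to this group value.
        $children = [];
        foreach ($flatItems as $flatItem) {
            if ($flatItem[$groupName] == $groupValue) {
                $children[] = $flatItem;
            }
        }
        if (count($groups) > 0) {
            // More grouping fields remain: group the children one level deeper.
            $children = rearrangeItems($children, $groups);
            $groupKey = "children";
        }
        else {
            // Last level: keep the raw items.
            $groupKey = "data";
        }
        $groupedItems[] = ["fieldName" => $groupName,
                           "value"     => $groupValue,
                           $groupKey   => $children];
    }
    return $groupedItems;
}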

Yes, this is all that is needed. It produces the same output.

This function is recursive: it does one level of grouping and then hands the result over to the next level, until there are no more levels. The only complex bit is:

array_unique(array_column($flatItems, $groupName))

It returns all the distinct values of the grouping field at the current level.
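To make that expression concrete, here is a small sketch on made-up rows (the sample values are assumptions for illustration only). `array_column()` extracts one field from every row, and `array_unique()` removes the repeats while preserving the key of each first occurrence.

```php
<?php
// Hypothetical sample rows, not taken from the original data set.
$flatItems = [
    ["status" => "new",  "type" => "type1"],
    ["status" => "new",  "type" => "type2"],
    ["status" => "done", "type" => "type1"],
];

// Pull the "status" field out of every row.
$statuses = array_column($flatItems, "status");
// $statuses is ["new", "new", "done"]

// Drop duplicates; keys of the first occurrences are kept.
$distinct = array_unique($statuses);
// $distinct is [0 => "new", 2 => "done"]
```

Because the keys are preserved rather than renumbered, the function above only iterates over the values and never relies on the keys.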

This is not the most efficient algorithm possible, but it is understandable. If I tried to make it more efficient, readability would probably suffer, and that is never a good thing.
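For comparison, one way to trade a little of that simplicity for speed is to bucket the items by value in a single pass per level, instead of rescanning all items once for every distinct value. The following is only a sketch of that alternative, not the code above: the name `groupItems` is made up, and it assumes that `$groups` is not empty, that every row contains every grouping field, and that the field values are strings (PHP turns numeric string array keys into integers, so the value is cast back to a string here).

```php
<?php
// Alternative sketch: one pass over the items per grouping level.
function groupItems(array $flatItems, array $groups): array
{
    $groupName = array_shift($groups);

    // Single pass: collect the rows under their value for the current field.
    $buckets = [];
    foreach ($flatItems as $flatItem) {
        $buckets[$flatItem[$groupName]][] = $flatItem;
    }

    // Inner levels get "children", the last level keeps the raw rows in "data".
    $childKey = count($groups) > 0 ? "children" : "data";

    $groupedItems = [];
    foreach ($buckets as $groupValue => $children) {
        if ($childKey === "children") {
            $children = groupItems($children, $groups);
        }
        $groupedItems[] = [
            "fieldName" => $groupName,
            "value"     => (string) $groupValue, // assumption: values are strings
            $childKey   => $children,
        ];
    }

    return $groupedItems;
}

// Made-up sample data with two grouping fields.
$data = [
    ["status" => "new", "type" => "type1", "count" => "1"],
    ["status" => "new", "type" => "type2", "count" => "2"],
    ["status" => "old", "type" => "type1", "count" => "3"],
];

$result = groupItems($data, ["status", "type"]);
```

Groups come out in order of first appearance, as with `array_unique()`. Whether the difference matters for 10,000 records depends on how many distinct values each field has, so it is worth measuring both versions on real data.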
