
Optimize group by array into nested structure for large data sets

Given is a "flat" array whose fields (status, type, etc.) can be dynamic (more or fewer key/value pairs), like:

$data = array(
    array(
        "status" => "new",
        "type" => "type1",
        "source" => "source1",
        "other" => "other1",
        "count" => "1",
    ),
    ...

The objective is to get a multidimensional/nested array "grouped" by a variable number of grouping fields. For example, if it needs to be grouped by 4 fields:

$groups = array("status", "type", "source", "other");

If a node has no children, its "data" key should hold all the "raw" data; if it has children, it should hold the group field and value, as in the demo and in this image (children/data).

Resulting data set should be as follows:

Array
(
    [0] => Array
        (
            [fieldName] => status
            [value] => new
            [children] => Array
                (
                    [0] => Array
                        (
                            [fieldName] => type
                            [value] => type1
                            [children] => Array
                                (
                                    [0] => Array
                                        (
                                            [fieldName] => source
                                            [value] => source1
                                            [children] => Array
                                                (
                                                    [0] => Array
                                                        (
                                                            [fieldName] => other
                                                            [value] => other1
                                                            [data] => Array
                                                                (
                                                                    [0] => Array
                                                                        (
                                                                            [status] => new
                                                                            [type] => type1
                                                                            [source] => source1
                                                                            [other] => other1
                                                                            [count] => 1
                                                                        )

I adapted a solution from (rearrange a php array into a nested hierarchical array), but it's quite messy and it takes a large amount of memory and time. Could it be optimized for large data sets (10,000 or more "flat" array records), with better performance and cleaner code?

This will be used to calculate subtotals for each group (sum, count, averages, etc.).
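As a rough illustration of that use, a subtotal could be computed by walking the nested structure. This is only a sketch: the helper name sumCounts is hypothetical, and it assumes every node has either a "data" key (raw rows) or a "children" key, and that "count" holds integer-like strings.

```php
<?php
// Hypothetical helper: sums the "count" field of all raw rows below one node.
// Assumes each node carries either "data" (raw rows) or "children" (sub-nodes).
function sumCounts(array $node): int
{
    if (isset($node["data"])) {
        // Leaf level: add up the "count" values of the raw rows.
        return array_sum(array_map('intval', array_column($node["data"], "count")));
    }

    // Inner level: add up the subtotals of all children.
    $total = 0;
    foreach ($node["children"] as $child) {
        $total += sumCounts($child);
    }
    return $total;
}
```

Counts and averages could be collected the same way by returning more than one number per node.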

Demo

It is a pity that you don't explain what this is going to be used for, but that's a common problem with Stack Overflow questions. The essence of the problem is often missing, so it becomes an abstract exercise.

For instance, I don't see the point of rearranging the array in this specific way. I think the resulting array could use the array keys more efficiently. There's also a lot of repetition of information.

But this is what we got, so without further complaining from my side, here is the code I came up with:

function rearrangeItems($flatItems, $groups)
{
    $groupedItems = [];
    // take the field for the current level; the rest is left for deeper levels
    $groupName    = array_shift($groups);
    // all distinct values of that field
    $groupValues  = array_unique(array_column($flatItems, $groupName));
    foreach ($groupValues as $groupValue) {
        // collect the items that belong to this value
        $children = [];
        foreach ($flatItems as $flatItem) {
            if ($flatItem[$groupName] == $groupValue) {
                $children[] = $flatItem;
            }
        }
        if (count($groups) > 0) {
            // more levels to go: group the collected items again
            $children = rearrangeItems($children, $groups);
            $groupKey = "children";
        }
        else {
            // last level: keep the raw items
            $groupKey = "data";
        }
        $groupedItems[] = ["fieldName" => $groupName,
                           "value"     => $groupValue,
                           $groupKey   => $children];
    }
    return $groupedItems;
}

Yes, this is all that is needed. It results in the same output.

This function is recursive: it does one level of grouping and then hands the result over to the next level, until there are no more levels. The complex bit is:

array_unique(array_column($flatItems, $groupName))

It returns all the different values at the current level of grouping.
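To make that concrete, here is a small standalone example. The three rows are made up for illustration and are not the data from the question.

```php
<?php
// Hypothetical sample rows, only for illustration.
$flatItems = [
    ["status" => "new",  "type" => "type1"],
    ["status" => "new",  "type" => "type2"],
    ["status" => "done", "type" => "type1"],
];

// array_column() extracts one field from every row.
$statuses = array_column($flatItems, "status");
// $statuses is ["new", "new", "done"]

// array_unique() drops the duplicates but keeps the keys of the first occurrences.
$groupValues = array_unique($statuses);
// $groupValues is [0 => "new", 2 => "done"]
```

The gaps in the keys don't matter here, because the function only loops over the values with foreach.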

This is not the absolute most efficient algorithm, but it is understandable. If I tried to make it more efficient, readability would probably suffer, and that is never a good thing.
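For what it's worth, the function above walks through all of $flatItems once for every distinct value at each level. One possible way to avoid that, sketched here and not benchmarked, is to put the items into buckets in a single pass and then recurse per bucket. The name groupItemsSinglePass is made up, and the sketch assumes at least one group field and group values that are non-numeric strings (PHP turns array keys like "1" into integers, so such values would come back as integers).

```php
<?php
// Sketch of a single-pass variant; groupItemsSinglePass is a hypothetical name.
// Assumes $groups is not empty and group values are non-numeric strings.
function groupItemsSinglePass(array $flatItems, array $groups): array
{
    $groupName = array_shift($groups);

    // Bucket every item under its value for the current field, in one pass.
    $buckets = [];
    foreach ($flatItems as $flatItem) {
        $buckets[$flatItem[$groupName]][] = $flatItem;
    }

    $hasChildren  = count($groups) > 0;
    $groupedItems = [];
    foreach ($buckets as $groupValue => $items) {
        $node = ["fieldName" => $groupName, "value" => $groupValue];
        if ($hasChildren) {
            // More levels to go: group this bucket by the remaining fields.
            $node["children"] = groupItemsSinglePass($items, $groups);
        } else {
            // Last level: keep the raw items.
            $node["data"] = $items;
        }
        $groupedItems[] = $node;
    }
    return $groupedItems;
}
```

It is called the same way, for example groupItemsSinglePass($data, $groups), and the groups come out in order of first appearance, like before.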
