
How to sort, group, and aggregate values in a list of nested dictionaries?

Given the list of dictionaries below, I want to do the following:

1: Sort the data by the top-level key 'name'
2: Sort by the nested key 'name' under the key 'items'
3: Group the values under 'items' by an aggregation interval, for example "1d"
4: From the groups in step 3, compute min, max, and avg again
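Steps 1 and 2 on their own need no pandas: Python's built-in sorted can order the records by the top-level name and each record's items by the nested name. A minimal sketch, assuming every record has an "items" list whose entries all carry a "name" key; sort_records and the trimmed sample are hypothetical:

```python
def sort_records(records):
    """Return records sorted by top-level 'name', each with 'items' sorted by nested 'name'.

    Builds new outer dicts and new 'items' lists instead of mutating the input.
    """
    return [
        {**record, "items": sorted(record["items"], key=lambda item: item["name"])}
        for record in sorted(records, key=lambda record: record["name"])
    ]


# Hypothetical sample trimmed to the keys the sort uses
sample = [
    {"_id": 2, "name": "b", "items": [{"name": "item_3"}, {"name": "item_2"}]},
    {"_id": 1, "name": "a", "items": [{"name": "item_1"}]},
]
print(sort_records(sample))
# [{'_id': 1, 'name': 'a', 'items': [{'name': 'item_1'}]}, {'_id': 2, 'name': 'b', 'items': [{'name': 'item_2'}, {'name': 'item_3'}]}]
```

Names compare as plain strings, so "item_10" would sort before "item_2"; natural ordering would need a custom key.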

At the moment, I solve this by iterating down to the values, grouping them with pandas, and then aggregating min, max, and avg again from the result. This approach feels really convoluted, and the performance is not good.

Can someone help me out?

[
    {
        '_id': 2,
        'name': 'b',
        'device': 'b',
        'items': [
            {
                'item_id': 'item_id_2', 'name': 'item_2', 'unit': 'b/s',
                'values': [
                    {'time': datetime.datetime(2022, 9, 5, 15, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 16, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 17, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 18, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 19, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 20, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                ]
            }
        ]
    },
    {
        '_id': 1,
        'name': 'a',
        'device': 'a',
        'items': [
            {
                'item_id': 'item_id_1', 'name': 'item_1', 'unit': 'b/s',
                'values': [
                    {'time': datetime.datetime(2022, 9, 5, 15, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 16, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 17, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 18, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 19, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                    {'time': datetime.datetime(2022, 9, 5, 20, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                ]
            }
        ]
    }
]

As for the result, I would expect something like this:

[
    {
        '_id': 1,
        'name': 'a',
        'device': 'a',
        'items': [
            {
                'item_id': 'item_id_1', 'name': 'item_1', 'unit': 'b/s',
                'values': [
                    {'time': datetime.datetime(2022, 9, 5, 0, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                ]
            }
        ]
    },
    {
        '_id': 2,
        'name': 'b',
        'device': 'b',
        'items': [
            {
                'item_id': 'item_id_2', 'name': 'item_2', 'unit': 'b/s',
                'values': [
                    {'time': datetime.datetime(2022, 9, 5, 0, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5},
                ]
            }
        ]
    }
]
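The expected output collapses each item's hourly values into one entry per day. For steps 3 and 4, a plain-Python alternative to the pandas round trip is to bucket each values list by day in a dict. This is a sketch, not the asker's code: it assumes "1d" means a calendar day (midnight to midnight, in whatever timezone the datetimes carry) and averages avg without weighting by sample count; aggregate_daily is a hypothetical helper name.

```python
import datetime
from collections import defaultdict


def aggregate_daily(values):
    """Group one item's 'values' list by calendar day and re-aggregate it.

    Assumption: '1d' means a calendar day, and 'avg' is averaged
    without weighting by the number of samples behind each entry.
    """
    buckets = defaultdict(list)
    for entry in values:
        # Truncate the timestamp to midnight of its day
        day = entry["time"].replace(hour=0, minute=0, second=0, microsecond=0)
        buckets[day].append(entry)

    # One output entry per day, in chronological order
    return [
        {
            "time": day,
            "min": min(e["min"] for e in group),
            "max": max(e["max"] for e in group),
            "avg": sum(e["avg"] for e in group) / len(group),
        }
        for day, group in sorted(buckets.items())
    ]


# The hourly values of one item from the question
values = [
    {"time": datetime.datetime(2022, 9, 5, hour, 0), "min": 0.0, "max": 1.0, "avg": 0.5}
    for hour in range(15, 21)
]
print(aggregate_daily(values))
# [{'time': datetime.datetime(2022, 9, 5, 0, 0), 'min': 0.0, 'max': 1.0, 'avg': 0.5}]
```

Each item could then be updated with item["values"] = aggregate_daily(item["values"]), which is a single pass over that item's values.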

With the initial list of dicts that you provided, which I call data, here is one way to do it:

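import datetime  # the sample data uses datetime.datetime
import pandas as pd

df = pd.DataFrame(data)

# First, sort rows by the top-level "name", then by the nested item "name"
df = df.assign(temp=df["items"].apply(lambda x: x[0]["name"])).pipe(
    lambda df_: df_.sort_values(by=["name", "temp"]).drop(columns="temp").reset_index(drop=True)
)

# One DataFrame of values per row (first item only); pop() also removes "values" from data
dfs = df["items"].apply(lambda x: pd.DataFrame(x[0].pop("values", None)))

# Resample each to daily buckets (min of min, max of max, mean of avg)
# and keep the first day as a dict in the new column 'temp'
df["temp"] = pd.Series(
    [
        {
            k: v[0]
            for k, v in values_df.set_index("time")
            .resample("D")
            .agg({"min": "min", "max": "max", "avg": "mean"})
            .reset_index()
            .to_dict(orient="list")
            .items()
        }
        for values_df in dfs
    ]
)
df["items"] = df["items"].apply(lambda x: x[0])

# Merge intermediate dictionaries (the dict union operator | needs Python 3.9+)
df["items"] = df.apply(lambda x: x["items"] | {"values": [x["temp"]]}, axis=1)
df = df.drop(columns="temp")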
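The code above only looks at the first entry of each "items" list, keeps only the first daily bucket (v[0]), and its pop() call strips "values" out of the original data as a side effect. If records can hold several items or span several days, a more general sketch could loop over every item and keep every bucket. This is an illustration under assumptions (pandas DataFrame.resample with its on= parameter, calendar-day buckets, min/max/mean aggregation), and aggregate_records is a hypothetical name:

```python
import copy
import datetime

import pandas as pd


def aggregate_records(data, freq="D"):
    """Sort records and items by 'name', then resample every item's values to `freq`."""
    # Work on a deep copy so the caller's list of dicts is left untouched
    records = sorted(copy.deepcopy(data), key=lambda r: r["name"])
    for record in records:
        record["items"].sort(key=lambda item: item["name"])
        for item in record["items"]:
            buckets = (
                pd.DataFrame(item["values"])
                .resample(freq, on="time")
                .agg({"min": "min", "max": "max", "avg": "mean"})
                .dropna()  # drop empty buckets if there are gaps between days
                .reset_index()
            )
            # Convert pandas Timestamps back to datetime.datetime objects
            item["values"] = [
                {**row, "time": row["time"].to_pydatetime()}
                for row in buckets.to_dict(orient="records")
            ]
    return records


def hourly_values():
    # The six hourly entries used for both items in the question
    return [
        {"time": datetime.datetime(2022, 9, 5, hour, 0), "min": 0.0, "max": 1.0, "avg": 0.5}
        for hour in range(15, 21)
    ]


data = [
    {"_id": 2, "name": "b", "device": "b",
     "items": [{"item_id": "item_id_2", "name": "item_2", "unit": "b/s", "values": hourly_values()}]},
    {"_id": 1, "name": "a", "device": "a",
     "items": [{"item_id": "item_id_1", "name": "item_1", "unit": "b/s", "values": hourly_values()}]},
]
result = aggregate_records(data)
print(result[0]["items"][0]["values"])
```

Working on a deep copy costs some memory but keeps data reusable for other aggregation intervals, for example freq="12h".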

And so:

print(df.to_json(orient="records"))
# Output (pretty-printed for readability)
[
    {
        "_id": 1,
        "name": "a",
        "device": "a",
        "items": {
            "item_id": "item_id_1",
            "name": "item_1",
            "unit": "b\/s",
            "values": [{"time": 1662336000000, "min": 0.0, "max": 1.0, "avg": 0.5}]
        }
    },
    {
        "_id": 2,
        "name": "b",
        "device": "b",
        "items": {
            "item_id": "item_id_2",
            "name": "item_2",
            "unit": "b\/s",
            "values": [{"time": 1662336000000, "min": 0.0, "max": 1.0, "avg": 0.5}]
        }
    }
]
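In the printed JSON, time appears as 1662336000000, which is 2022-09-05 00:00 UTC in epoch milliseconds: DataFrame.to_json writes datetimes as epoch values by default for orient="records". Passing date_format="iso" produces ISO 8601 strings instead. A minimal sketch on a flat, hypothetical one-row frame (the nested values in the answer are not exercised here):

```python
import json

import pandas as pd

frame = pd.DataFrame({"time": [pd.Timestamp("2022-09-05")], "avg": [0.5]})

# Default for orient="records": datetimes become epoch milliseconds
epoch_json = frame.to_json(orient="records")

# ISO 8601 strings instead of epoch numbers
iso_json = frame.to_json(orient="records", date_format="iso")

print(json.loads(epoch_json)[0]["time"])
print(json.loads(iso_json)[0]["time"])
```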

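Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.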

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM