Applying an aggregate function after groupby and agg

Question

I'm trying to aggregate my dataset more than once, but I can't seem to find the right way to do it with pandas. Given a dataset like this:

donations = [
  {
    "amount": 100,
    "organization": {
      "name": "Org 1",
      "total_budget": 8000,
      "states": [
        {
          "name": "Maine",
          "code": "ME"
        },
        {
          "name": "Massachusetts",
          "code": "MA"
        }
      ]
    }
  },
  {
    "amount": 5000,
    "organization": {
      "name": "Org 2",
      "total_budget": 10000,
      "states": [
        {
          "name": "Massachusetts",
          "code": "MA"
        }
      ]
    }
  },
  {
    "amount": 5000,
    "organization": {
      "name": "Org 1",
      "total_budget": 8000,
      "states": [
        {
          "name": "Maine",
          "code": "ME"
        },
        {
          "name": "Massachusetts",
          "code": "MA"
        }
      ]
    }
  }
]

The output I want is a single aggregation per state of the total_budget and amount columns. I've gotten pretty close with the following:

import pandas as pd

# json_normalize already returns a DataFrame: one row per (donation, state) pair
df = pd.json_normalize(donations, record_path=['organization', 'states'], meta=['amount', ['organization', 'total_budget'], ['organization', 'name']], record_prefix='states.')
grouped_df = df.groupby(['states.code', 'states.name', 'organization.name', 'organization.total_budget']).sum()

This does give me a breakdown by state, but it still includes the organization name:

MA          Massachusetts Org 1             8000                         5100
                          Org 2             10000                        5000
ME          Maine         Org 1             8000                         5100

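I know I need to keep my initial aggregation as it is to produce correct results, but I'm not sure what the last step is to then group those results by state and get my expected output: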

MA          Massachusetts     18000              10100
ME          Maine             8000               5100

Answer 1

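I don't know whether this will work for your actual data. This approach, built around the sample data, splits the DataFrame by the value you want to aggregate and removes duplicate rows from the budget half. It then groups and aggregates each of the two DataFrames and merges the results.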

# Amounts: every donation row counts, so sum them directly per state
df_a = df[['states.code', 'states.name', 'organization.name', 'amount']]
amount_by_state = df_a.groupby(['states.code', 'states.name'])['amount'].sum().reset_index()

# Budgets: keep one row per (state, organization) so a budget is not counted once per donation
df_o = df[['states.code', 'states.name', 'organization.name', 'organization.total_budget']].drop_duplicates()
budget_by_state = df_o.groupby(['states.code', 'states.name'])['organization.total_budget'].sum().reset_index()

result = budget_by_state.merge(amount_by_state, on=['states.code', 'states.name'], how='inner')
print(result)

  states.code    states.name  organization.total_budget  amount
0          MA  Massachusetts                      18000   10100
1          ME          Maine                       8000    5100
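
As an alternative to splitting and merging, the same result can probably be reached by grouping twice, which is closer to what the question title asks for. The sketch below is self-contained and assumes that each organization reports a single total_budget, so taking the first value per (state, organization) group is safe. The names per_org and per_state and the output column total_budget are illustrative choices, not part of the original post.

```python
import pandas as pd

donations = [
    {"amount": 100, "organization": {"name": "Org 1", "total_budget": 8000,
     "states": [{"name": "Maine", "code": "ME"}, {"name": "Massachusetts", "code": "MA"}]}},
    {"amount": 5000, "organization": {"name": "Org 2", "total_budget": 10000,
     "states": [{"name": "Massachusetts", "code": "MA"}]}},
    {"amount": 5000, "organization": {"name": "Org 1", "total_budget": 8000,
     "states": [{"name": "Maine", "code": "ME"}, {"name": "Massachusetts", "code": "MA"}]}},
]

# One row per (donation, state) pair, as in the question
df = pd.json_normalize(
    donations,
    record_path=["organization", "states"],
    meta=["amount", ["organization", "total_budget"], ["organization", "name"]],
    record_prefix="states.",
)

# Meta columns come back as object dtype; cast them so sums are numeric
df = df.astype({"amount": "int64", "organization.total_budget": "int64"})

# Step 1: per state and organization, sum the donations but take the budget once
# (assumption: the budget is constant within an organization)
per_org = df.groupby(
    ["states.code", "states.name", "organization.name"], as_index=False
).agg(
    amount=("amount", "sum"),
    total_budget=("organization.total_budget", "first"),
)

# Step 2: collapse the organizations, summing both columns per state
per_state = per_org.groupby(
    ["states.code", "states.name"], as_index=False
)[["total_budget", "amount"]].sum()

print(per_state)
```

For the sample data this should give 18000 / 10100 for MA and 8000 / 5100 for ME, matching the expected output in the question. If an organization could appear with different budgets, "first" would silently pick one of them, so that case would need a different rule.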

Solution 1
0 · accepted · 2020-09-22 03:51:57
