
Applying aggregate function after using groupby and agg

I am trying to aggregate my dataset multiple times and I can't seem to figure out the right way to do so with pandas. Given a dataset like so:

donations = [
  {
    "amount": 100,
    "organization": {
      "name": "Org 1",
      "total_budget": 8000,
      "states": [
        {
          "name": "Maine",
          "code": "ME"
        },
        {
          "name": "Massachusetts",
          "code": "MA"
        }
      ]
    }
  },
  {
    "amount": 5000,
    "organization": {
      "name": "Org 2",
      "total_budget": 10000,
      "states": [
        {
          "name": "Massachusetts",
          "code": "MA"
        }
      ]
    }
  },
  {
    "amount": 5000,
    "organization": {
      "name": "Org 1",
      "total_budget": 8000,
      "states": [
        {
          "name": "Maine",
          "code": "ME"
        },
        {
          "name": "Massachusetts",
          "code": "MA"
        }
      ]
    }
  }
]

My desired output is a single aggregation by state of the total_budget and amount columns. I have gotten pretty close with the following:

import pandas as pd

# Flatten to one row per (donation, state), carrying the amount and organization fields along
n = pd.json_normalize(donations, record_path=['organization', 'states'], meta=['amount', ['organization', 'total_budget'], ['organization', 'name']], record_prefix='states.')
df = pd.DataFrame(n)
grouped_df = df.groupby(['states.code', 'states.name', 'organization.name', 'organization.total_budget']).sum()

Though what this gives me is a breakdown by state, with the organization names still included:

states.code states.name   organization.name organization.total_budget    amount
MA          Massachusetts Org 1             8000                         5100
                          Org 2             10000                        5000
ME          Maine         Org 1             8000                         5100

I know that I need to keep my initial aggregation as it is in order to produce the correct results, but I am not sure what the final step is to then group these results by state and get my expected output:

states.code states.name   organization.total_budget amount
MA          Massachusetts 18000                     10100
ME          Maine         8000                      5100

I don't know if this applies to your actual data or not, since it is based only on the sample data you posted. The approach splits the data frame into two, one for each value you want to aggregate, and removes the duplicate rows from the budget frame. The duplicates have to go because an organization's total_budget is repeated on every one of its donation rows, so summing it directly would count the same budget more than once. It then groups and aggregates each frame by state and merges the two results together.

# Amounts: every donation row counts, so nothing is de-duplicated here
df_a = df[['states.code', 'states.name', 'organization.name', 'amount']]
# Budgets: keep one row per (state, organization) so a budget is not counted twice
df_o = df[['states.code', 'states.name', 'organization.name', 'organization.total_budget']].drop_duplicates()
amount = df_a.groupby(['states.code', 'states.name'])['amount'].sum().reset_index()
budget = df_o.groupby(['states.code', 'states.name'])['organization.total_budget'].sum().reset_index()
budget.merge(amount, on=['states.code', 'states.name'], how='inner')

  states.code    states.name organization.total_budget amount
0          MA  Massachusetts                     18000  10100
1          ME          Maine                      8000   5100
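
The same result can probably also be reached without a merge, by aggregating twice, which is closer to what the question title asks for. The following is only a sketch under a few assumptions: each organization's total_budget is identical on all of its rows (so taking the first value is safe), and the names state_keys, per_org and per_state are made up for illustration. The sample data is repeated in compact form so the snippet runs on its own.

```python
import pandas as pd

donations = [
    {"amount": 100, "organization": {"name": "Org 1", "total_budget": 8000,
        "states": [{"name": "Maine", "code": "ME"},
                   {"name": "Massachusetts", "code": "MA"}]}},
    {"amount": 5000, "organization": {"name": "Org 2", "total_budget": 10000,
        "states": [{"name": "Massachusetts", "code": "MA"}]}},
    {"amount": 5000, "organization": {"name": "Org 1", "total_budget": 8000,
        "states": [{"name": "Maine", "code": "ME"},
                   {"name": "Massachusetts", "code": "MA"}]}},
]

df = pd.json_normalize(
    donations,
    record_path=["organization", "states"],
    meta=["amount", ["organization", "total_budget"], ["organization", "name"]],
    record_prefix="states.",
)
# The meta columns may come back as object dtype; cast them so the sums
# are ordinary integer sums.
df = df.astype({"amount": "int64", "organization.total_budget": "int64"})

state_keys = ["states.code", "states.name"]  # illustrative name

# Step 1: one row per (state, organization). Donations are summed; the budget
# is taken once, assuming it is the same on every row of that organization.
per_org = df.groupby(state_keys + ["organization.name"], as_index=False).agg(
    amount=("amount", "sum"),
    total_budget=("organization.total_budget", "first"),
)

# Step 2: collapse the organizations within each state.
per_state = per_org.groupby(state_keys, as_index=False)[["total_budget", "amount"]].sum()
print(per_state)
```

On the sample data this gives a total_budget of 18000 and an amount of 10100 for MA, and 8000 and 5100 for ME, which matches the expected output in the question. If an organization's budget can differ between rows in the real data, "first" would silently pick one of the values, so that assumption is worth checking before relying on this.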

 