简体   繁体   English

嵌套 for 循环基于 pandas 中的 2 级多索引创建多个 pivot 表

[英]Nested for loops to create muliple pivot table based on 2 level multiindex in pandas

Started getting confused with this one.开始对这个感到困惑。 I have a large Fact Invoice Header table.我有一个大的 Fact Invoice Header 表。 I took the original dataframe, used a groupby to split the df up based upon one column.我拿了原来的 dataframe,使用 groupby 根据一列拆分 df。 The output was a list of dataframes: output 是数据帧列表:

list_of_dfs = []
for _, g in df.groupby(df['Project State Name']):
    list_of_dfs.append(g)
list_of_dfs

Then I used a another for loop to loop through the list of dataframes and perform one pivot table aggregation.然后我使用另一个 for 循环遍历数据帧列表并执行一个 pivot 表聚合。

for each_state_df in list_of_dfs:
    columns_to_index_by = ['Project Issue', 'Project Secondary Issue', 'Project Client Name']
# Aggregating to the Project Level
    table_for_pivots = pd.pivot_table(df, index=['FY Year', 'Project Issue'], values=["Project Key", 'Total Net Amount', "Project Total Resolution Amount", 'Project Budgeted Amount'],
                             aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,  "Project Total Resolution Amount": np.mean,
                                       'Project Budgeted Amount': np.mean},
                             fill_value=np.mean)
print(table_for_pivots)

My question is, how can I use another for loop replace the second element in the pivot table index with each value in the variable columns_to_index_by?我的问题是,如何使用另一个 for 循环将 pivot 表索引中的第二个元素替换为变量 columns_to_index_by 中的每个值? The output would be 3 pivot tables where index=['FY Year', 'Project Issue'], index=['FY Year', 'Project Secondary Issue', and index=['FY Year', 'Project Client Name']. output 将是 3 个 pivot 表,其中 index=['FY Year', 'Project Issue'], index=['FY Year', 'Project Secondary Issue', index=['FY Year', 'Project Client Name' ]. Thanks all!谢谢大家!

Link to download a sample df data is here:下载样本 df 数据的链接在这里:

https://ufile.io/iufv9nma https://ufile.io/iufv9nma

Use list comprehension and iterate through a zip of the index you want to set for each group:使用列表理解并遍历要为每个组设置的索引的zip

from pandas import Timestamp
from numpy import nan
d = {'Total Net Amount': {2: 672.0, 41: 1277.9, 17: 270.0, 32: 845.3, 26: 828.62, 11: 733.5, 23: 1741.8, 35: 254.14655, 29: 245.0, 59: 215.0, 38: 617.4, 0: 1061.5}, 'Project Total Resolution Amount': {2: 35000, 41: 27000, 17: 40000, 32: 27000, 26: 27000, 11: 40000, 23: 27000, 35: 27000, 29: 27000, 59: 27000, 38: 27000, 0: 30000}, 'Invoice Header Key': {2: 1229422, 41: 984803, 17: 1270731, 32: 938069, 26: 911535, 11: 1247443, 23: 902150, 35: 943737, 29: 918888, 59: 1071541, 38: 965091, 0: 1279581}, 'Project Key': {2: 259661, 41: 194517, 17: 259188, 32: 194517, 26: 194517, 11: 259188, 23: 194517, 35: 194517, 29: 194517, 59: 194517, 38: 194517, 0: 263736}, 'Project Secondary Issue': {2: 2, 41: 4, 17: 0, 32: 3, 26: 3, 11: 0, 23: 4, 35: 4, 29: 4, 59: 4, 38: 3, 0: 4}, 'Organization Key': {2: 16029, 41: 22638, 17: 24230, 32: 22638, 26: 22638, 11: 24230, 23: 22638, 35: 22638, 29: 22638, 59: 22638, 38: 22638, 0: 4532}, 'Project Budgeted Amount': {2: 42735.0, 41: 32500.0, 17: 26000.0, 32: 32500.0, 26: 32500.0, 11: 26000.0, 23: 32500.0, 35: 32500.0, 29: 32500.0, 59: 32500.0, 38: 32500.0, 0: nan}, 'Project State Name': {2: 0, 41: 1, 17: 2, 32: 1, 26: 1, 11: 2, 23: 1, 35: 1, 29: 1, 59: 1, 38: 1, 0: 1}, 'Project Issue': {2: 0, 41: 2, 17: 1, 32: 2, 26: 2, 11: 1, 23: 2, 35: 2, 29: 2, 59: 2, 38: 2, 0: 1}, 'Project Number': {2: 2, 41: 0, 17: 1, 32: 0, 26: 0, 11: 1, 23: 0, 35: 0, 29: 0, 59: 0, 38: 0, 0: 3}, 'Project Client Name': {2: 1, 41: 0, 17: 0, 32: 0, 26: 0, 11: 0, 23: 0, 35: 0, 29: 0, 59: 0, 38: 0, 0: 1}, 'Paid Date Year Month': {2: 13, 41: 7, 17: 15, 32: 4, 26: 2, 11: 14, 23: 1, 35: 5, 29: 3, 59: 12, 38: 6, 0: 16}, 'FY Year': {2: 2, 41: 0, 17: 2, 32: 0, 26: 0, 11: 2, 23: 0, 35: 0, 29: 0, 59: 1, 38: 0, 0: 2}, 'Invoice Paid Date': {2: Timestamp('2019-09-10 00:00:00'), 41: Timestamp('2017-12-20 00:00:00'), 17: Timestamp('2019-11-25 00:00:00'), 32: Timestamp('2017-08-31 00:00:00'), 26: Timestamp('2017-06-14 00:00:00'), 11: Timestamp('2019-10-08 00:00:00'), 23: Timestamp('2017-05-30 00:00:00'), 35: Timestamp('2017-09-07 00:00:00'), 29: Timestamp('2017-07-10 00:00:00'), 59: Timestamp('2018-10-03 00:00:00'), 38: Timestamp('2017-11-03 00:00:00'), 0: Timestamp('2019-12-12 00:00:00')}, 'Invoice Paid Date Key': {2: 20190910, 41: 20171220, 17: 20191125, 32: 20170831, 26: 20170614, 11: 20191008, 23: 20170530, 35: 20170907, 29: 20170710, 59: 20181003, 38: 20171103, 0: 20191212}, 'Count Project Secondary Issue': {2: 3, 41: 3, 17: 3, 32: 3, 26: 3, 11: 3, 23: 3, 35: 3, 29: 3, 59: 3, 38: 3, 0: 2}, 'Total Net Amount By Count Project Secondary Issue': {2: 224.0, 41: 425.9666666666667, 17: 90.0, 32: 281.7666666666667, 26: 276.2066666666666, 11: 244.5, 23: 580.6, 35: 84.71551666666666, 29: 81.66666666666667, 59: 71.66666666666667, 38: 205.8, 0: 530.75}, 'Total Net Invoice Amount': {2: 672.0, 41: 1277.9, 17: 270.0, 32: 845.3, 26: 828.62, 11: 733.5, 23: 1741.8, 35: 254.14655, 29: 245.0, 59: 215.0, 38: 617.4, 0: 1061.5}, 'Total Project Invoice Amount': {2: 7176.52, 41: 10110.98655, 17: 1678.5, 32: 10110.98655, 26: 10110.98655, 11: 1678.5, 23: 10110.98655, 35: 10110.98655, 29: 10110.98655, 59: 10110.98655, 38: 10110.98655, 0: 1061.5}, 'Invoice Dollar Percent of Project': {2: 0.09363869953682286, 41: 0.1263872712796755, 17: 0.160857908847185, 32: 0.08360212881501655, 26: 0.08195243816242638, 11: 0.4369973190348526, 23: 0.1722680562758735, 35: 0.02513568272919916, 29: 0.02423106773888449, 59: 0.02126399821983741, 38: 0.06106229070198891, 0: 1.0}}
df = pd.DataFrame(d)

# list comprehension with groupby
group = [g for _, g in df.groupby('Project State Name')]

#create a list of indices you want to use in pivot
idx = [['FY Year', 'Project Issue'],
       ['FY Year', 'Project Secondary Issue'],
       ['FY Year', 'Project Client Name']]

# create a list of columns to add to the value param in pivot
values = ["Project Key", 'Total Net Amount',
          "Project Total Resolution Amount", 'Project Budgeted Amount']

# use your current pivot and iterate through zip(idx, group)
dfs = [pd.pivot_table(df, index=i, values=values,
                      aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,
                                "Project Total Resolution Amount": np.mean,
                                'Project Budgeted Amount': np.mean},
                                 fill_value=np.mean) for i,df in zip(idx, group)]

dict comprehension听写理解

I did not know what you wanted the key to be so I just selected the second value from idx.我不知道你想要的密钥是什么,所以我只是从 idx 中选择了第二个值。 You will call each dataframe from the dict by dfs['Project Issue']您将通过dfs['Project Issue']从字典中调用每个 dataframe

dfs = {i[1]: pd.pivot_table(df, index=i, values=values,
                      aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,
                                "Project Total Resolution Amount": np.mean,
                                'Project Budgeted Amount': np.mean},
                                 fill_value=np.mean) for i,df in zip(idx, group)}

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM