Preserving MultiIndex column structure after performing a groupby summation
Preserving columns in output after performing sum on groupby
Given a sample df:
import pandas as pd
df = pd.DataFrame([['William', 1, 0, 'T', 0, 1], ['James', 0, 1, 'R', 1, 1], ['James', 1, 0, 'S', 0, 1], ['Dean', 1, 0, 'R', 1, 0], ['William', 0, 1, 'S', 0, 0], ['James', 0, 0, 'S', 0, 1]], columns=['Name', 'x1', 'x2', 'x3', 'x4', 'x5'])
      Name  x1  x2 x3  x4  x5
0  William   1   0  T   0   1
1    James   0   1  R   1   1
2    James   1   0  S   0   1
3     Dean   1   0  R   1   0
4  William   0   1  S   0   0
5    James   0   0  S   0   1
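I previously asked how to apply various filters to this df and output the results of a series of functions applied to each group of a groupby object, and I arrived at the following solution: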
variables = {'x1': 'sum', 'x2': 'sum', 'x4': 'sum', 'x5': 'sum'}
filters = {'Option1': df['x3']=='S', 'Option2': df['x3']=='R', 'Option3': (df['x2']==1) | (df['x4']==1) | (df['x5']==1), 'Option4': df['x2']==1, 'Option5': df['x2']==0, 'Option6': df['x5']==1}
# Aggregate each filtered subset per name, keyed by the filter label
results = {key: df[f].groupby('Name').agg(variables) for key, f in filters.items()}
out = pd.concat(results)
After concatenating the results, I am left with the following:
                 x1  x2  x4  x5
        Name
Option1 James     1   0   0   2
        William   0   1   0   0
Option2 Dean      1   0   1   0
        James     0   1   1   1
Option3 Dean      1   0   1   0
        James     1   1   1   3
        William   1   1   0   1
Option4 James     0   1   1   1
        William   0   1   0   0
Option5 Dean      1   0   1   0
        James     1   0   0   2
        William   1   0   0   1
Option6 James     1   1   1   3
        William   1   0   0   1
I would like to groupby('Name') again, which gives me:
                 x1  x2  x4  x5
        Name
Option2 Dean      1   0   1   0
Option3 Dean      1   0   1   0
Option5 Dean      1   0   1   0

                 x1  x2  x4  x5
        Name
Option1 James     1   0   0   2
Option2 James     0   1   1   1
Option3 James     1   1   1   3
Option4 James     0   1   1   1
Option5 James     1   0   0   2
Option6 James     1   1   1   3

                 x1  x2  x4  x5
        Name
Option1 William   0   1   0   0
Option3 William   1   1   0   1
Option4 William   0   1   0   0
Option5 William   1   0   0   1
Option6 William   1   0   0   1
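A minimal sketch of this per-name grouping step, run on a shortened stand-in for out rather than the full result. The stand-in values and the name groups are illustrative only; grouping by the index level named Name yields one sub-frame per name, and each sub-frame contains only the options whose filter actually matched that name.

```python
import pandas as pd

# Shortened stand-in for the concatenated result (illustrative subset only)
idx = pd.MultiIndex.from_tuples(
    [('Option1', 'James'), ('Option1', 'William'),
     ('Option2', 'Dean'), ('Option2', 'James')],
    names=[None, 'Name'])
out = pd.DataFrame({'x1': [1, 0, 1, 0], 'x5': [2, 0, 0, 1]}, index=idx)

# Group on the 'Name' index level; each group keeps only the rows that exist
groups = {name: grp for name, grp in out.groupby(level='Name')}

for name, grp in groups.items():
    print(grp)
    print()
```

Because groupby only sees rows that are present, a name that a filter excluded entirely simply has no row for that option in its group.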
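However, some rows (or columns, depending on how you look at it) are missing from the result. For example, the filter df['x3']=='S' leaves no 'Dean' rows in the Name column, so 'Dean' has no Option1 entry. I feel like I am close, but this is my desired output (the ordering of the names is irrelevant):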
                 x1  x2  x4  x5
Name
James   Option1   1   0   0   2
        Option2   0   1   1   1
        Option3   1   1   1   3
        Option4   0   1   1   1
        Option5   1   0   0   2
        Option6   1   1   1   3
Dean    Option1   0   0   0   0
        Option2   1   0   1   0
        Option3   1   0   1   0
        Option4   0   0   0   0
        Option5   1   0   1   0
        Option6   0   0   0   0
William Option1   0   1   0   0
        Option2   0   0   0   0
        Option3   1   1   0   1
        Option4   0   1   0   0
        Option5   1   0   0   1
        Option6   1   0   0   1
Thanks for any pointers.
You can get the desired result by swapping the index levels of the out DataFrame and reindexing it. Starting from the concatenated result:
from itertools import product
# Swap the index levels
out = out.swaplevel(0,1)
# Form the product of the two index levels
ids = list(product(out.index.get_level_values(0).unique(),
                   out.index.get_level_values(1).unique()))
# Reindex out, filling missing with 0 and sorting the index
out = out.reindex(ids).fillna(0).sort_index().astype('int')
out
This gives:
                 x1  x2  x4  x5
Name
Dean    Option1   0   0   0   0
        Option2   1   0   1   0
        Option3   1   0   1   0
        Option4   0   0   0   0
        Option5   1   0   1   0
        Option6   0   0   0   0
James   Option1   1   0   0   2
        Option2   0   1   1   1
        Option3   1   1   1   3
        Option4   0   1   1   1
        Option5   1   0   0   2
        Option6   1   1   1   3
William Option1   0   1   0   0
        Option2   0   0   0   0
        Option3   1   1   0   1
        Option4   0   1   0   0
        Option5   1   0   0   1
        Option6   1   0   0   1
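As a possible variation on the same idea, the full set of (Name, Option) pairs could be built with pd.MultiIndex.from_product and passed to reindex together with fill_value=0. This is only a sketch under a few assumptions: the intermediate dict is called results, the index levels are given the names Option and Name through the names argument of pd.concat, and the final frame is stored in a new variable called full. None of these names come from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [['William', 1, 0, 'T', 0, 1],
     ['James', 0, 1, 'R', 1, 1],
     ['James', 1, 0, 'S', 0, 1],
     ['Dean', 1, 0, 'R', 1, 0],
     ['William', 0, 1, 'S', 0, 0],
     ['James', 0, 0, 'S', 0, 1]],
    columns=['Name', 'x1', 'x2', 'x3', 'x4', 'x5'])

variables = {'x1': 'sum', 'x2': 'sum', 'x4': 'sum', 'x5': 'sum'}
filters = {
    'Option1': df['x3'] == 'S',
    'Option2': df['x3'] == 'R',
    'Option3': (df['x2'] == 1) | (df['x4'] == 1) | (df['x5'] == 1),
    'Option4': df['x2'] == 1,
    'Option5': df['x2'] == 0,
    'Option6': df['x5'] == 1,
}

# One aggregated frame per filter, stacked under the filter label
results = {key: df[f].groupby('Name').agg(variables) for key, f in filters.items()}
out = pd.concat(results, names=['Option', 'Name'])  # level names are an assumption

# Every (Name, Option) combination, including those a filter excluded
full_index = pd.MultiIndex.from_product(
    [df['Name'].unique(), list(filters)], names=['Name', 'Option'])

# Put Name first, then add the missing pairs as rows of zeros
full = out.swaplevel(0, 1).reindex(full_index, fill_value=0).sort_index()
print(full)
```

Taking the names from df rather than from out means a name that no filter matched at all would still appear, with zeros for every option.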