Pandas percentage of total row within MultiIndex
I have a dataframe that looks like this:
import pandas as pd
# Each row has five fields, so five column names are needed (Value2 was missing).
df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])
print(df)
Default Letter Color Value Value2
0 Foo A Green 10 20
1 Foo A Red 20 30
2 Foo A Total 50 60
3 Foo B Blue 5 10
4 Foo B Red 15 25
5 Foo B Total 40 100
6 Foo C Orange 25 8
7 Foo C Total 50 10
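I need to find each color's percentage of the Total row within its group.
My first thought was to split the data into two separately indexed frames and use .div, but in this case I have a MultiIndex (the first level is always Foo in my example, but that is not what the real data looks like, so just roll with it), and I get a NotImplementedError.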
df_color = df[df['Color']!='Total'].set_index(['Default','Letter','Color'])
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])
df_out = df_color.div(df_tot)
NotImplementedError Traceback (most recent call last)
<ipython-input-119-0caf0e2959a6> in <module>()
4 df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])
5
----> 6 df_out = df_color.div(df_tot)
7 #df.set_index(['Default','Letter','Color'],inplace = True)...
This is my desired output:
df_out = pd.DataFrame([['Foo','A','Green',.2,.333],['Foo','A','Red',.4,.5],['Foo','B','Blue',.125,.1],['Foo','B','Red',.375,.25],['Foo','C','Orange',.5,.8]],columns = ['Default','Letter','Color','Value','Value2'])
print(df_out)
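Edit: note that there are actually many value columns. Only two are shown here for simplicity, but the solution needs to handle 50-100 numeric columns.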
You can do this with groupby. See the tutorial on using groupby.
Note: this implementation assumes that the Total entry of each group is the last row of that group (as in the example), but that is easy to change.
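cols = [x for x in df.columns if x not in ['Default', 'Letter', 'Color']] # or df.columns[3:]
# Divide every row of a group by that group's last row (assumed to be the Total row).
# Grouping on Default as well keeps groups apart when Default is not constant.
# Assigning through df[cols] replaces the columns, so integer columns can become floats.
df[cols] = df.groupby(['Default', 'Letter'], group_keys=False).apply(lambda g: g[cols] / g[cols].iloc[-1])
df[df['Color'] != 'Total']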
Returns:
Default Letter Color Value Value2
0 Foo A Green 0.200 0.333333
1 Foo A Red 0.400 0.500000
3 Foo B Blue 0.125 0.100000
4 Foo B Red 0.375 0.250000
6 Foo C Orange 0.500 0.800000
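The note above says the "Total is last" assumption is easy to change. One way to do that, as a minimal sketch, is to pick the Total row by its label inside each group instead of by position. The helper name `share_of_total` and the shuffled sample data are made up for illustration; they are not part of the original answer.

```python
import pandas as pd

# Same kind of data as the example, but the Total rows are deliberately
# not the last row of their group.
df = pd.DataFrame(
    [['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'A', 'Green', 10, 20],
     ['Foo', 'A', 'Red', 20, 30],
     ['Foo', 'B', 'Blue', 5, 10],
     ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'B', 'Red', 15, 25]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

keys = ['Default', 'Letter']
cols = [c for c in df.columns if c not in keys + ['Color']]

def share_of_total(group):
    # Hypothetical helper: find the Total row by label, not by position.
    # Assumes every group has at least one row with Color == 'Total'.
    total = group.loc[group['Color'] == 'Total', cols].iloc[0]
    return group[cols] / total

# Selecting the needed columns explicitly keeps the key columns out of apply().
pct = df.groupby(keys, group_keys=False)[cols + ['Color']].apply(share_of_total)

# Re-attach the labels by index and drop the Total rows.
out = df[keys + ['Color']].join(pct)
out = out[out['Color'] != 'Total']
print(out)
```

Because the result is joined back on the original index, the row order of the input is preserved regardless of where the Total rows sit.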
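I ended up using the melt function to reshape the dataframe so that the column names become another column in the data. Then I could simply merge and divide, and finally reshape back.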
import pandas as pd
df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])
# Split the color rows from the Total rows.
df_color = df[df['Color']!='Total']
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1)
# Melt both frames to long format; var_name must be a plain string, not a list.
df_melt = pd.melt(df_color,id_vars = ['Default','Letter', 'Color'],var_name = 'value_field')
df_tot_melt = pd.melt(df_tot,id_vars = ['Default','Letter'],var_name = 'value_field', value_name = 'Total')
# Attach each group's total to its color rows, then divide.
df_melt_pct = pd.merge(df_melt, df_tot_melt, how = 'outer', on = ['Default','Letter','value_field'])
df_melt_pct['Pct'] = df_melt_pct['value'] /df_melt_pct['Total']
# Pivot back to wide format and drop the extra 'Pct' column level.
df_melt_pct = df_melt_pct.drop(['value','Total'],axis = 1).set_index(['Default','Letter','Color','value_field']).unstack()
df_melt_pct.columns = df_melt_pct.columns.droplevel(0)
print(df_melt_pct)
value_field Value Value2
Default Letter Color
Foo A Green 0.200 0.333333
Red 0.400 0.500000
B Blue 0.125 0.100000
Red 0.375 0.250000
C Orange 0.500 0.800000
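The same merge-and-divide idea can probably be done without melting, by merging the Total rows onto the color rows in wide format and dividing the two blocks of columns. This is only a sketch under the assumption that each (Default, Letter) group has exactly one Total row; the `_total` suffix and the variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30],
     ['Foo', 'A', 'Total', 50, 60], ['Foo', 'B', 'Blue', 5, 10],
     ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

keys = ['Default', 'Letter']
cols = [c for c in df.columns if c not in keys + ['Color']]

is_total = df['Color'] == 'Total'
colors = df[~is_total]
totals = df[is_total].drop(columns='Color')

# Each color row picks up its group's totals as Value_total, Value2_total, ...
# validate='many_to_one' raises if a group has more than one Total row.
merged = colors.merge(totals, on=keys, how='left',
                      suffixes=('', '_total'), validate='many_to_one')

# Divide the value block by the matching total block, column by column.
total_cols = [c + '_total' for c in cols]
pct = pd.DataFrame(merged[cols].to_numpy() / merged[total_cols].to_numpy(),
                   columns=cols, index=merged.index)

result = pd.concat([merged[keys + ['Color']], pct], axis=1)
print(result)
```

Since `cols` is computed from the frame, the sketch does not care whether there are 2 or 100 value columns. A group with no Total row would end up with NaN percentages rather than an error.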