Pandas: percentage of the total row within a MultiIndex
I have a DataFrame that looks as follows:
import pandas as pd
df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])
print(df)
Default Letter Color Value Value2
0 Foo A Green 10 20
1 Foo A Red 20 30
2 Foo A Total 50 60
3 Foo B Blue 5 10
4 Foo B Red 15 25
5 Foo B Total 40 100
6 Foo C Orange 25 8
7 Foo C Total 50 10
I need to find the percentage of the Total row that each color makes up within each group.
My first thought was to split them into separate indexes and use .div, but in this case I have a MultiIndex (I know that in my example the first level is always Foo, but that's not how the real data looks; roll with it), and I get a NotImplementedError:
df_color = df[df['Color']!='Total'].set_index(['Default','Letter','Color'])
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])
df_out = df_color.div(df_tot)
NotImplementedError Traceback (most recent call last)
<ipython-input-119-0caf0e2959a6> in <module>()
4 df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])
5
----> 6 df_out = df_color.div(df_tot)
7 #df.set_index(['Default','Letter','Color'],inplace = True)...
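The error appears to come from pandas declining to align two MultiIndexes that share only some of their levels: df_color is indexed by (Default, Letter, Color) while df_tot is indexed by (Default, Letter). A minimal sketch of one workaround (not from the original thread) keeps the .div idea but first expands the totals to one row per color, by reindexing df_tot on the (Default, Letter) part of df_color's index. The name totals is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30], ['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

df_color = df[df['Color'] != 'Total'].set_index(['Default', 'Letter', 'Color'])
df_tot = df[df['Color'] == 'Total'].drop(columns='Color').set_index(['Default', 'Letter'])

# Repeat each group's Total row once per color row: look the totals up by the
# (Default, Letter) part of df_color's index, then give them df_color's full index.
totals = df_tot.reindex(df_color.index.droplevel('Color'))
totals.index = df_color.index

# Both frames now share the same index and the same value columns,
# so plain division works column by column, however many value columns there are.
df_out = df_color / totals
print(df_out)
```

This assumes exactly one Total row per (Default, Letter) pair; reindex raises if df_tot's index contains duplicates.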
Here is my desired output:
df_out = pd.DataFrame([['Foo','A','Green',.2,.333],['Foo','A','Red',.4,.5],['Foo','B','Blue',.125,.1],['Foo','B','Red',.375,.25],['Foo','C','Orange',.5,.8]],columns = ['Default','Letter','Color','Value','Value2'])
print(df_out)
EDIT: Note that there are actually many value columns. For simplicity I only show two here, but the solution needs to handle 50-100 numeric value columns.
You can do this with a groupby. Check out the tutorial on using groupby.
Note: this implementation assumes that the Total entry for each group is the last row of that group (as in the example), but this is easily modified.
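cols = [x for x in df.columns if x not in ['Default', 'Letter', 'Color']]  # or df.columns[3:]
# Group on both key columns so that the same Letter under different Default values stays separate;
# within each group, divide every value column by that group's last (Total) row
df[cols] = df.groupby(['Default', 'Letter'], group_keys=False)[cols].apply(lambda g: g / g.iloc[-1])
df[df['Color'] != 'Total']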
returns
Default Letter Color Value Value2
0 Foo A Green 0.200 0.333333
1 Foo A Red 0.400 0.500000
3 Foo B Blue 0.125 0.100000
4 Foo B Red 0.375 0.250000
6 Foo C Orange 0.500 0.800000
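The note above says the last-row assumption is easy to lift. A minimal sketch of one way to do that (not the original answer's code) avoids groupby.apply entirely: zero out every non-Total row, then use a grouped transform('sum') to broadcast each group's Total to all of its rows. It still assumes exactly one Total row per (Default, Letter) group, but that row may sit anywhere in the group. The names keys, is_total and totals are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30], ['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

keys = ['Default', 'Letter']
cols = [c for c in df.columns if c not in keys + ['Color']]  # every value column, however many
is_total = df['Color'].eq('Total')

# Multiplying by the boolean mask keeps only the Total rows' values (others become 0);
# summing within each group then spreads the Total to every row of that group.
totals = df[cols].mul(is_total, axis=0).groupby([df[k] for k in keys]).transform('sum')

# Divide, reattach the key columns, and drop the Total rows themselves
result = df[keys + ['Color']].join(df[cols] / totals)[~is_total]
print(result)
```

If a group has no Total row, its totals are 0 and the division yields inf rather than an error, so it may be worth checking is_total per group on real data.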
I ended up reshaping the DataFrames with the melt function so that the column names became another column in the data. Then I could simply merge, divide, and reshape back at the end:
import pandas as pd
df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])
df_color = df[df['Color']!='Total']
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1)
# Long format: one row per (Default, Letter, Color, value_field); var_name must be a scalar
df_melt = pd.melt(df_color,id_vars = ['Default','Letter', 'Color'],var_name = 'value_field')
df_tot_melt = pd.melt(df_tot,id_vars = ['Default','Letter'],var_name = 'value_field', value_name = 'Total')
# Attach each group's total to every color row, then divide
df_melt_pct = pd.merge(df_melt, df_tot_melt, how = 'outer', on = ['Default','Letter','value_field'])
df_melt_pct['Pct'] = df_melt_pct['value'] / df_melt_pct['Total']
# Back to wide format: one column per original value column
df_melt_pct = df_melt_pct.drop(['value','Total'],axis = 1).set_index(['Default','Letter','Color','value_field']).unstack()
df_melt_pct.columns = df_melt_pct.columns.droplevel(0)
print(df_melt_pct)
value_field Value Value2
Default Letter Color
Foo A Green 0.200 0.333333
Red 0.400 0.500000
B Blue 0.125 0.100000
Red 0.375 0.250000
C Orange 0.500 0.800000
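A possible variant of this melt-and-merge approach (a sketch, not the poster's code): DataFrame.merge accepts validate='many_to_one', which makes pandas raise a MergeError if any (Default, Letter, value_field) key has more than one Total, and DataFrame.pivot can replace the set_index/unstack/droplevel steps. The names id_cols, merged and df_pct are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30], ['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

id_cols = ['Default', 'Letter']
df_color = df[df['Color'] != 'Total']
df_tot = df[df['Color'] == 'Total'].drop(columns='Color')

df_melt = df_color.melt(id_vars=id_cols + ['Color'], var_name='value_field')
df_tot_melt = df_tot.melt(id_vars=id_cols, var_name='value_field', value_name='Total')

# Left merge keeps every color row; validate raises MergeError on duplicated Total keys
merged = df_melt.merge(df_tot_melt, how='left', on=id_cols + ['value_field'],
                       validate='many_to_one')
merged['Pct'] = merged['value'] / merged['Total']

# Wide format again: one column per original value column
df_pct = merged.pivot(index=id_cols + ['Color'], columns='value_field', values='Pct')
print(df_pct)
```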