Pandas percentage of total row within multiindex

Question

I have a dataframe that looks as follows:

import pandas as pd

df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])
print(df)

  Default Letter   Color  Value  Value2
0     Foo      A   Green     10      20
1     Foo      A     Red     20      30
2     Foo      A   Total     50      60
3     Foo      B    Blue      5      10
4     Foo      B     Red     15      25
5     Foo      B   Total     40     100
6     Foo      C  Orange     25       8
7     Foo      C   Total     50      10

I need to find the percentage of the Total row that each color makes up within each group.

My first thought was to split the data into two frames with separate indexes and use .div, but in this case I have a MultiIndex and I get a NotImplementedError. (I know that in my example the first level always says Foo, but that's not how the real data looks, so roll with it.)

df_color = df[df['Color']!='Total'].set_index(['Default','Letter','Color'])
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])

df_out = df_color.div(df_tot)

NotImplementedError                       Traceback (most recent call last)
<ipython-input-119-0caf0e2959a6> in <module>()
      4 df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])
      5 
----> 6 df_out = df_color.div(df_tot)
      7 #df.set_index(['Default','Letter','Color'],inplace = True)...

Here is my desired output:

df_out = pd.DataFrame([['Foo','A','Green',.2,.333],['Foo','A','Red',.4,.5],['Foo','B','Blue',.125,.1],['Foo','B','Red',.375,.25],['Foo','C','Orange',.5,.8]],columns = ['Default','Letter','Color','Value','Value2'])

print(df_out)

EDIT: note that the real data has many value columns. For simplicity I show only two here, but the solution needs to handle 50-100 numerical value columns.

Answer 1

You can do this with a groupby. Check out the tutorial on using groupby.

Note: this implementation assumes that the Total row for each Letter group is the last row of that group (as in the example), but this is easily modifiable.

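cols = [x for x in df.columns if x not in ['Default', 'Letter', 'Color']]  # or df.columns[3:]
# Divide every row of a group by that group's last row (assumed to be the Total row).
# Grouping on both keys covers real data where Default is not always 'Foo'.
df[cols] = df.groupby(['Default', 'Letter'], group_keys=False)[cols].apply(lambda g: g / g.iloc[-1])
df[df['Color'] != 'Total']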

returns

  Default Letter   Color  Value    Value2
0     Foo      A   Green  0.200  0.333333
1     Foo      A     Red  0.400  0.500000
3     Foo      B    Blue  0.125  0.100000
4     Foo      B     Red  0.375  0.250000
6     Foo      C  Orange  0.500  0.800000
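
If the Total row is not guaranteed to come last, one possible modification is to pick it out by label instead of by position. This is only a sketch: it assumes exactly one row with Color == 'Total' per (Default, Letter) group, and the names is_total, totals and out are just illustrative. The sample rows are deliberately reordered to show that position no longer matters.

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Total', 50, 60],    # Total row deliberately not last
     ['Foo', 'A', 'Green', 10, 20],
     ['Foo', 'A', 'Red', 20, 30],
     ['Foo', 'B', 'Blue', 5, 10],
     ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'B', 'Red', 15, 25],
     ['Foo', 'C', 'Orange', 25, 8],
     ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

cols = [c for c in df.columns if c not in ['Default', 'Letter', 'Color']]
is_total = df['Color'].eq('Total')

# Keep values only on the Total rows (NaN everywhere else), then broadcast
# each group's Total to every row of that group; 'first' skips NaN.
totals = (df.loc[is_total, cols]
          .reindex(df.index)
          .groupby([df['Default'], df['Letter']])
          .transform('first'))

out = df.copy()
out[cols] = df[cols] / totals   # row-by-row division, aligned on the index
out = out[~is_total]            # drop the Total rows themselves
print(out)
```

Because the division is done on whole frames, it should scale to many value columns without any per-column code.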

Answer 2

I ended up reshaping the dataframes with the melt function, so that the column name became another column in the data. Then I could simply merge and divide, and reshape back at the end.

df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])

df_color = df[df['Color']!='Total']
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1)

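# var_name is passed as a plain string; recent pandas versions reject a list here
# unless the columns are a MultiIndex
df_melt = pd.melt(df_color,id_vars = ['Default','Letter', 'Color'],var_name = 'value_field')
df_tot_melt = pd.melt(df_tot,id_vars = ['Default','Letter'],var_name = 'value_field', value_name = 'Total')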


df_melt_pct = pd.merge(df_melt, df_tot_melt, how = 'outer', on = ['Default','Letter','value_field'])
df_melt_pct['Pct'] = df_melt_pct['value'] /df_melt_pct['Total']
df_melt_pct = df_melt_pct.drop(['value','Total'],axis = 1).set_index(['Default','Letter','Color','value_field']).unstack()
df_melt_pct.columns = df_melt_pct.columns.droplevel(0)

print(df_melt_pct)

value_field            Value    Value2
Default Letter Color                  
Foo     A      Green   0.200  0.333333
               Red     0.400  0.500000
        B      Blue    0.125  0.100000
               Red     0.375  0.250000
        C      Orange  0.500  0.800000
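
For comparison, the same merge-and-divide idea can probably be done without melting, by merging the totals in wide form. This is a rough sketch rather than a tested recipe for the real data: it assumes one Total row per (Default, Letter) group and that no existing column name already ends in '_total' (that suffix, and the names parts, totals, merged and df_pct, are made up for the example).

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30],
     ['Foo', 'A', 'Total', 50, 60], ['Foo', 'B', 'Blue', 5, 10],
     ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

keys = ['Default', 'Letter']
cols = [c for c in df.columns if c not in keys + ['Color']]

parts = df[df['Color'] != 'Total']
totals = df[df['Color'] == 'Total'].drop(columns='Color')

# Attach each group's totals to its color rows. validate='many_to_one'
# raises if any group has more than one Total row.
merged = parts.merge(totals, on=keys, suffixes=('', '_total'),
                     validate='many_to_one')
total_cols = [c + '_total' for c in cols]

# Divide positionally, since the value and total columns have different names.
merged[cols] = merged[cols].to_numpy() / merged[total_cols].to_numpy()
df_pct = merged.drop(columns=total_cols)
print(df_pct)
```

The trade-off is that the frame temporarily doubles in width, which may or may not matter with 50-100 value columns.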
