How to operate on multilevel indexing in Pandas?

Question

These are the top 5 rows of my DataFrame, which has multilevel columns:

           column1    column2
             count        sum       max
column1
2516491004       2   0.232758  0.232758
2510581003       1   0.405012  0.405012
2591381007       6   3.535806  0.932517
2595381003      31  15.421238  0.757979
2594481008       4   1.129524  0.389363

I want column2[sum]/column1[count] and column2[max]/column1[count] populated against every entry in column1 as my new dataframe. For example, the first row of my new dataframe should be:

 column1    sum_value  max_value
2516491004  0.116379    0.116379

I am new to Python and have searched a lot but could not find the correct way to iterate. Any help is much appreciated.

Answer 1

If you use a DataFrame with a MultiIndex on columns, the way you refer to a column is a tuple with values from each level of the (column) MultiIndex.
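
As a minimal sketch of that tuple addressing, the snippet below rebuilds the first three rows of the question's table and selects columns both by full tuple and by top-level label. Building the frame from a dict with tuple keys is an assumption made for illustration; the question does not show how the original frame was produced.

```python
import pandas as pd

# Rebuild the first three rows of the question's table (illustrative only).
# Dict keys that are tuples become a two-level column MultiIndex.
index = pd.Index([2516491004, 2510581003, 2591381007], name="column1")
df = pd.DataFrame(
    {
        ("column1", "count"): [2, 1, 6],
        ("column2", "sum"): [0.232758, 0.405012, 3.535806],
        ("column2", "max"): [0.232758, 0.405012, 0.932517],
    },
    index=index,
)

# A full tuple selects exactly one column and returns a Series.
counts = df[("column1", "count")]

# A top-level label alone returns a DataFrame with every sub-column
# under it; that level is dropped from the resulting column labels.
column2_block = df["column2"]

print(counts)
print(column2_block)
```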

So one possible solution is to define the following function:

import pandas as pd

def fn(row):
    # row is one row of df; its labels are (level 0, level 1) tuples
    count = row[('column1', 'count')]
    return pd.Series([
        row[('column2', 'sum')] / count,
        row[('column2', 'max')] / count],
        index=['sum_value', 'max_value'])

and then to apply it:

df.apply(fn, axis=1)

The result is a new DataFrame with the index as before and 2 columns:

            sum_value  max_value
column1                         
2516491004   0.116379   0.116379
2510581003   0.405012   0.405012
2591381007   0.589301   0.155420
2595381003   0.497459   0.024451
2594481008   0.282381   0.097341

If you want column1 as a regular column, append .reset_index() to the expression above.
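
A small end-to-end sketch of the row-wise approach followed by reset_index() might look like this. The three-row frame is rebuilt from the question's data purely for illustration, and the variable name result is hypothetical.

```python
import pandas as pd

# First three rows of the question's table, rebuilt for illustration.
index = pd.Index([2516491004, 2510581003, 2591381007], name="column1")
df = pd.DataFrame(
    {
        ("column1", "count"): [2, 1, 6],
        ("column2", "sum"): [0.232758, 0.405012, 3.535806],
        ("column2", "max"): [0.232758, 0.405012, 0.932517],
    },
    index=index,
)

def fn(row):
    # Each row is a Series whose labels are (level 0, level 1) tuples.
    count = row[("column1", "count")]
    return pd.Series(
        [row[("column2", "sum")] / count, row[("column2", "max")] / count],
        index=["sum_value", "max_value"],
    )

# reset_index() moves the index (named column1) into an ordinary column
# and replaces it with a default integer index.
result = df.apply(fn, axis=1).reset_index()
print(result)
```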

Another, faster solution skips the row-by-row apply and divides whole columns at once:

pd.DataFrame({ 'sum_value': df[('column2', 'sum')] / df[('column1', 'count')],
    'max_value': df[('column2', 'max')] / df[('column1', 'count')]})
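
The same column-wise division can be written without naming each sub-column. The sketch below is an alternative and not part of the original answer: it assumes every sub-column under column2 should be divided by the count, uses DataFrame.div with axis=0 for the division, and uses add_suffix to produce the sum_value and max_value names. The three-row frame is rebuilt from the question's data for illustration.

```python
import pandas as pd

# First three rows of the question's table, rebuilt for illustration.
index = pd.Index([2516491004, 2510581003, 2591381007], name="column1")
df = pd.DataFrame(
    {
        ("column1", "count"): [2, 1, 6],
        ("column2", "sum"): [0.232758, 0.405012, 3.535806],
        ("column2", "max"): [0.232758, 0.405012, 0.932517],
    },
    index=index,
)

# df["column2"] is a DataFrame with columns "sum" and "max".
# div(..., axis=0) aligns the count Series on the row index and
# divides every column by it in one vectorized step.
ratios = df["column2"].div(df[("column1", "count")], axis=0).add_suffix("_value")

# The explicit form from the answer, for comparison.
explicit = pd.DataFrame({
    "sum_value": df[("column2", "sum")] / df[("column1", "count")],
    "max_value": df[("column2", "max")] / df[("column1", "count")],
})

print(ratios)
```

Both forms keep column1 as the index, so .reset_index() applies to either in the same way.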

solution1
1 ACCEPTED 2019-12-14 19:22:32