These are the top 5 rows of my DataFrame, which has multilevel columns:
              column1    column2
                count        sum       max
column1
2516491004          2   0.232758  0.232758
2510581003          1   0.405012  0.405012
2591381007          6   3.535806  0.932517
2595381003         31  15.421238  0.757979
2594481008          4   1.129524  0.389363
I want column2[sum]/column1[count] and column2[max]/column1[count] populated against every entry in column1 as my new DataFrame. For example, the first row of my new DataFrame should be:
column1     sum_value  max_value
2516491004   0.116379   0.116379
I am new to Python and have searched a lot but could not find the correct way to iterate. Any help is much appreciated.
If you use a DataFrame with a MultiIndex on columns, you refer to a column by a tuple holding one value from each level of the (column) MultiIndex.
So one possible solution is to define the following function:
import pandas as pd

def fn(row):
    # Each column is addressed by a (level 0, level 1) tuple.
    return pd.Series([
        row[('column2', 'sum')] / row[('column1', 'count')],
        row[('column2', 'max')] / row[('column1', 'count')]],
        index=['sum_value', 'max_value'])
and then to apply it:
df.apply(fn, axis=1)
The result is a new DataFrame with the index as before and 2 columns:
            sum_value  max_value
column1
2516491004   0.116379   0.116379
2510581003   0.405012   0.405012
2591381007   0.589301   0.155420
2595381003   0.497459   0.024451
2594481008   0.282381   0.097341
If you want to have column1 as a regular column, append .reset_index() to the above instruction.
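To try the row-wise approach end to end, here is a minimal sketch that rebuilds the five sample rows from the question and applies the function, followed by .reset_index(). The way the DataFrame is constructed here is an assumption, since the question does not show how the original frame was built (it most likely came from a groupby aggregation).

```python
import pandas as pd

# Sample data copied from the question; the construction itself is assumed.
index = pd.Index(
    [2516491004, 2510581003, 2591381007, 2595381003, 2594481008],
    name="column1",
)
columns = pd.MultiIndex.from_tuples(
    [("column1", "count"), ("column2", "sum"), ("column2", "max")]
)
df = pd.DataFrame(
    [
        [2, 0.232758, 0.232758],
        [1, 0.405012, 0.405012],
        [6, 3.535806, 0.932517],
        [31, 15.421238, 0.757979],
        [4, 1.129524, 0.389363],
    ],
    index=index,
    columns=columns,
)


def fn(row):
    # Columns of a MultiIndex frame are addressed by tuples.
    count = row[("column1", "count")]
    return pd.Series(
        [row[("column2", "sum")] / count, row[("column2", "max")] / count],
        index=["sum_value", "max_value"],
    )


# apply(..., axis=1) calls fn once per row; reset_index() turns the
# index named column1 into an ordinary column.
result = df.apply(fn, axis=1).reset_index()
print(result)
```

The result has three flat columns: column1, sum_value and max_value.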
Another, actually quicker solution divides whole columns at once instead of going row by row:
pd.DataFrame({
    'sum_value': df[('column2', 'sum')] / df[('column1', 'count')],
    'max_value': df[('column2', 'max')] / df[('column1', 'count')]})
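The column-wise version can be checked on the sample data with a short sketch like the one below. The DataFrame construction is again an assumption. The variant using .div() and .add_suffix() is not part of the answer above; it is shown only as one possible way to avoid repeating the count column.

```python
import pandas as pd

# Sample rows from the question; how the real frame was built is assumed.
df = pd.DataFrame(
    {
        ("column1", "count"): [2, 1, 6, 31, 4],
        ("column2", "sum"): [0.232758, 0.405012, 3.535806, 15.421238, 1.129524],
        ("column2", "max"): [0.232758, 0.405012, 0.932517, 0.757979, 0.389363],
    },
    index=pd.Index(
        [2516491004, 2510581003, 2591381007, 2595381003, 2594481008],
        name="column1",
    ),
)

count = df[("column1", "count")]

# Whole-column division: pandas aligns the Series on the shared index.
quick = pd.DataFrame({
    "sum_value": df[("column2", "sum")] / count,
    "max_value": df[("column2", "max")] / count,
})

# Variant (an addition, not from the answer): select every sub-column of
# column2 at once and divide each of them by the count, row by row.
alt = df["column2"].div(count, axis=0).add_suffix("_value")

print(quick.round(6))
```

Both frames keep column1 as the index, so .reset_index() can be appended here as well.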