
Find max and min values for several numeric columns and return a dataframe with the corresponding row value

I have the following dataset

[image: enter image description here]

For each year column, I would like to find the max and min values and return both the 'max' and 'min' values together with the corresponding 'Geo' value for each.

For instance, for '1950', '1951', and so on, I would like to produce a dataframe like this one:

[image: enter image description here]

This is a similar thread, but the suggested approaches there don't seem to work because my columns have numeric headers, plus my desired result is slightly different.

Any advice would be helpful. Thanks.

This should work, though there is surely a better solution. I assume your initial dataframe is a pandas dataframe named df.

import pandas as pd

dff = pd.DataFrame({'row_labels': ['Max_value', 'Max_geo', 'Min_value', 'Min_geo']})

for col in df.columns[2:]:  # start at column 1950
    col_max = df[col].max()
    col_min = df[col].min()
    # Keep the same order as row_labels: max first, then min
    dff[col] = [
        col_max,
        df.loc[df[col] == col_max, 'Geo'].values[0],  # first 'Geo' with the max
        col_min,
        df.loc[df[col] == col_min, 'Geo'].values[0],  # first 'Geo' with the min
    ]

dff.set_index('row_labels', inplace=True, drop=True)
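
As a sketch of the same idea without the repeated boolean comparisons, the loop can look up the row label of each extreme with idxmax and idxmin. The helper name extremes_by_loop, its id_cols parameter, and the four-row sample frame (mirroring the small example given later in this thread) are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample data for illustration only (four countries, two year columns).
df = pd.DataFrame({
    'Geo': ['Afghanistan', 'Albania', 'Algeria', 'Angola'],
    'Geo code': [4, 8, 12, 24],
    '1950': [27.638, 54.191, 42.087, 35.524],
    '1951': [27.878, 54.399, 42.282, 35.599],
})


def extremes_by_loop(frame, id_cols=('Geo', 'Geo code')):
    """Hypothetical helper: per-year max/min plus the matching 'Geo'."""
    year_cols = [c for c in frame.columns if c not in id_cols]
    out = {}
    for col in year_cols:
        hi_row = frame[col].idxmax()  # row label of the first maximum
        lo_row = frame[col].idxmin()  # row label of the first minimum
        out[col] = [
            frame.loc[hi_row, col], frame.loc[hi_row, 'Geo'],
            frame.loc[lo_row, col], frame.loc[lo_row, 'Geo'],
        ]
    return pd.DataFrame(out, index=['Max_value', 'Max_geo', 'Min_value', 'Min_geo'])


dff = extremes_by_loop(df)
print(dff)
```

Selecting the year columns by name rather than by position means the result does not depend on where 'Geo' and 'Geo code' sit in the frame.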

    

You can do this without looping or comparing values to find the max, using max, min, idxmax and idxmin as follows (assuming your dataframe is df):

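(df.melt(id_vars=['Geo', 'Geo code'], var_name='year')  # keep 'Geo code' out of the melted values
   .set_index('Geo')  # idxmax/idxmin will then return 'Geo' labels
   .groupby('year')
   .agg({'value': ['max', 'idxmax', 'min', 'idxmin']})
   .T)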
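
The reason for indexing by 'Geo' before aggregating is that idxmax and idxmin return an index label, not a position. A minimal sketch on a single year, with sample figures assumed purely for illustration:

```python
import pandas as pd

# Values for one year, indexed by country (sample figures, for illustration only).
s = pd.Series(
    [27.638, 54.191, 42.087, 35.524],
    index=['Afghanistan', 'Albania', 'Algeria', 'Angola'],
)

largest = s.max()            # the largest value itself
largest_label = s.idxmax()   # index label where the largest value occurs
smallest_label = s.idxmin()  # index label where the smallest value occurs
print(largest, largest_label, smallest_label)
```

If several rows tie for the extreme, idxmax and idxmin return the label of the first occurrence.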

You can use df.set_index with stack and GroupBy.agg:

In [1914]: df = pd.DataFrame({'Geo':['Afghanistan', 'Albania', 'Algeria', 'Angola'], 'Geo code':[4,8,12,24], '1950':[27.638, 54.191, 42.087, 35.524], '1951':[27.878, 54.399, 42.282, 35.599]})

In [1915]: df
Out[1915]: 
           Geo  Geo code    1950    1951
0  Afghanistan         4  27.638  27.878
1      Albania         8  54.191  54.399
2      Algeria        12  42.087  42.282
3       Angola        24  35.524  35.599

In [1916]: x = df.set_index('Geo').stack().reset_index(level=1, name='value').query('level_1 != "Geo code"')

In [1917]: res = x.groupby('level_1').agg({'value': ('max', 'idxmax', 'min', 'idxmin')}).T

In [1918]: res
Out[1918]: 
level_1              1950         1951
value max          54.191       54.399
      idxmax      Albania      Albania
      min          27.638       27.878
      idxmin  Afghanistan  Afghanistan
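
For comparison, a sketch that stays in wide form and skips the reshape entirely: index by 'Geo', drop 'Geo code', and call the four reducers on the year columns directly. The variable names years and res_wide are illustrative, and the sample frame repeats the one above so the block runs on its own.

```python
import pandas as pd

# Same sample frame as in the session above, repeated so this block is self-contained.
df = pd.DataFrame({
    'Geo': ['Afghanistan', 'Albania', 'Algeria', 'Angola'],
    'Geo code': [4, 8, 12, 24],
    '1950': [27.638, 54.191, 42.087, 35.524],
    '1951': [27.878, 54.399, 42.282, 35.599],
})

# Keep only the year columns, with 'Geo' as the index so idxmax/idxmin return country names.
years = df.set_index('Geo').drop(columns='Geo code')

# Each reducer yields one value per year column; transpose so years become columns.
res_wide = pd.DataFrame({
    'max': years.max(),
    'idxmax': years.idxmax(),
    'min': years.min(),
    'idxmin': years.idxmin(),
}).T
print(res_wide)
```

The rows come out in the order max, idxmax, min, idxmin, with one column per year, but without the extra 'value' level in the row index.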


 