Find max and min values for several numeric columns and return a dataframe with the corresponding row value
I have the following dataset
For each year column, I would like to find the max and min values and return both the 'max' and 'min' values together with the corresponding 'Geo' value for each.
For instance, for '1950', '1951', and so on, I would like to produce a dataframe like this one:
This is a similar thread, but the suggested approaches there don't seem to work because my columns have numeric headers, plus my desired result is slightly different.
Any advice would be helpful. Thanks.
This should work, though there is surely a better solution. I assumed your initial dataframe is a pandas dataframe named df.
import pandas as pd

dff = pd.DataFrame({'row_labels': ['Max_value', 'Max_geo', 'Min_value', 'Min_geo']})
for col in df.columns[2:]:  # start at column 1950 (skips the first two columns)
    col_list = []
    # Append in the same order as row_labels: max first, then min
    col_list.append(df[col].max())
    col_list.append(df.loc[df[col] == df[col].max(), 'Geo'].values[0])
    col_list.append(df[col].min())
    col_list.append(df.loc[df[col] == df[col].min(), 'Geo'].values[0])
    dff[col] = col_list
dff.set_index('row_labels', inplace=True, drop=True)
You can do this without having to loop or do any value comparisons to find the max, using max, min, idxmax and idxmin as follows (assuming your dataframe is df):
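(df.melt(id_vars='Geo', var_name='year')  # if df also has a 'Geo code' column, use id_vars=['Geo', 'Geo code']
   .set_index('Geo')  # index by 'Geo' so idxmax/idxmin return the Geo name
   .groupby('year')
   .agg({'value': ('max', 'idxmax', 'min', 'idxmin')})
   .T)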
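For anyone who wants to try the melt approach end to end, here is a minimal sketch. It assumes the frame has a 'Geo' column, a 'Geo code' column, and one column per year; the four-row sample is only illustrative, and the names long_df and summary are made up for this example. Passing both identifier columns to id_vars keeps 'Geo code' from being melted as if it were a year.

```python
import pandas as pd

# Illustrative sample (assumed layout: 'Geo', 'Geo code', then one column per year)
df = pd.DataFrame({
    'Geo': ['Afghanistan', 'Albania', 'Algeria', 'Angola'],
    'Geo code': [4, 8, 12, 24],
    '1950': [27.638, 54.191, 42.087, 35.524],
    '1951': [27.878, 54.399, 42.282, 35.599],
})

# Long format: one row per (Geo, year) pair, with the number in 'value'
long_df = df.melt(id_vars=['Geo', 'Geo code'], var_name='year')

# Index by 'Geo' so that idxmax/idxmin report the Geo name, not a row number
summary = (
    long_df.set_index('Geo')
    .groupby('year')['value']
    .agg(['max', 'idxmax', 'min', 'idxmin'])
    .T
)
print(summary)
```

The transposed result has one column per year and the rows max, idxmax, min and idxmin.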
You can use df.set_index with stack and Groupby.agg:
In [1915]: df = pd.DataFrame({'Geo':['Afghanistan', 'Albania', 'Algeria', 'Angola'], 'Geo code':[4,8,12,24], '1950':[27.638, 54.191, 42.087, 35.524], '1951':[27.878, 54.399, 42.282, 35.599]})
In [1914]: df
Out[1914]:
           Geo  Geo code    1950    1951
0  Afghanistan         4  27.638  27.878
1      Albania         8  54.191  54.399
2      Algeria        12  42.087  42.282
3       Angola        24  35.524  35.599
In [1916]: x = df.set_index('Geo').stack().reset_index(level=1, name='value').query('level_1 != "Geo code"')
In [1917]: res = x.groupby('level_1').agg({'value': ('max', 'idxmax', 'min', 'idxmin')}).T
In [1918]: res
Out[1918]:
level_1              1950         1951
value max          54.191       54.399
      idxmax      Albania      Albania
      min          27.638       27.878
      idxmin  Afghanistan  Afghanistan
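If reshaping feels like overkill, the same four rows can probably be built directly from the wide frame. This is only a sketch under the same assumptions as the sample above (a 'Geo' column, a 'Geo code' column to discard, and year columns); the variable names years and res are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Geo': ['Afghanistan', 'Albania', 'Algeria', 'Angola'],
    'Geo code': [4, 8, 12, 24],
    '1950': [27.638, 54.191, 42.087, 35.524],
    '1951': [27.878, 54.399, 42.282, 35.599],
})

# Keep only the year columns, indexed by 'Geo'
years = df.set_index('Geo').drop(columns='Geo code')

# Each reduction returns a Series indexed by year; idxmax/idxmin return index labels (Geo names)
res = pd.DataFrame({
    'max': years.max(),
    'idxmax': years.idxmax(),
    'min': years.min(),
    'idxmin': years.idxmin(),
}).T
print(res)
```

Because 'Geo' is the index, idxmax and idxmin give the Geo name for each year without any stacking or grouping.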