I have a pandas DataFrame df with six columns and five rows. I would like to get the row and column names of the three largest values.
Example:
df =
A B C D E F
1 00 01 02 03 04 05
2 06 07 08 09 10 11
3 12 13 14 15 16 17
4 18 19 20 21 22 23
5 24 25 26 27 28 29
The output should say something like [5,F],[5,E],[5,D]
You could use unstack before sorting:
>>> df
A B C D E F
1 0 1 2 3 4 5
2 6 7 8 9 10 11
3 12 13 14 15 16 17
4 18 19 20 21 22 23
5 24 25 26 27 28 29
>>> df.unstack()
A 1 0
2 6
3 12
4 18
5 24
B 1 1
2 7
3 13
4 19
5 25
[...]
F 1 5
2 11
3 17
4 23
5 29
and so, after sorting (current pandas uses sort_values() in place of the old in-place sort()):
>>> df2 = df.unstack().sort_values()
>>> df2[-3:]
D 5 27
E 5 28
F 5 29
>>> df2[-3:].index
MultiIndex
[(D, 5.0), (E, 5.0), (F, 5.0)]
or even
>>> df.unstack().iloc[df.unstack().argsort().to_numpy()].index[-3:]
MultiIndex
[(D, 5.0), (E, 5.0), (F, 5.0)]
[I didn't bother reversing the order: sticking [::-1] at the end should do it.]
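For anyone who wants to run this end to end, here is a minimal sketch of the same unstack idea on a current pandas version. It assumes the example frame can be rebuilt with NumPy, and it swaps the sort for Series.nlargest, which is not what the transcript above uses. The names top3 and pairs are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: rows 1..5, columns A..F, values 0..29.
df = pd.DataFrame(
    np.arange(30).reshape(5, 6),
    index=range(1, 6),
    columns=list("ABCDEF"),
)

# unstack() gives a Series indexed by (column, row);
# nlargest(3) keeps the three biggest values, largest first.
top3 = df.unstack().nlargest(3)

# Flip each (column, row) label into [row, column] as the question asks.
pairs = [[int(row), col] for col, row in top3.index]
print(pairs)  # [[5, 'F'], [5, 'E'], [5, 'D']]
```

Because nlargest already returns the values in descending order, no [::-1] is needed here.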
I am not going to pretend these are the most efficient ways of dealing with this problem, but I thought they were worth mentioning:
df
A B C D E F
1 0 1 2 3 4 5
2 6 7 8 9 10 11
3 12 13 14 15 16 17
4 18 19 20 21 22 23
5 24 25 26 27 28 29
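Use df.max() to get the maximum value of each column, sort those maxima, and keep the three largest. Then mask them against the original df so that only the matching cells survive. Finally, a list comprehension collects the indices. Note that this looks at only one value per column, so it finds the answer only when the three largest values sit in different columns, as they do here: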
df_2 = df[df.eq(df.max().sort_values(ascending=True).tail(3))]
[(i, df_2[i].first_valid_index()) for i in df_2.columns if df_2[i].first_valid_index() is not None]
Output:
[('D', 5), ('E', 5), ('F', 5)]
or
s = df_2.apply(pd.Series.first_valid_index).dropna()
list(zip(s.index, s.astype(int)))
Output:
[('D', 5), ('E', 5), ('F', 5)]
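The column-maximum approach above considers only one value per column, so it can miss entries when several of the largest values share a column. A rough alternative, which is not from the answers above, ranks the flattened values with NumPy and maps the positions back to labels. The small frame below is hypothetical and was chosen so that all three of the largest values fall in column F.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the three largest values (60, 50, 7) are all in column F.
df = pd.DataFrame(
    [[1, 2, 50], [3, 4, 60], [5, 6, 7]],
    index=[1, 2, 3],
    columns=["D", "E", "F"],
)

values = df.to_numpy()
# Flat positions of the three largest entries, largest first.
flat = np.argsort(values, axis=None)[::-1][:3]
# Convert flat positions back to (row position, column position).
rows, cols = np.unravel_index(flat, values.shape)

# Report (column, row) pairs, matching the output style used above.
top3 = [(df.columns[c], int(df.index[r])) for r, c in zip(rows, cols)]
print(top3)  # [('F', 2), ('F', 1), ('F', 3)]
```

This sorts every cell, so it is fine for small frames but probably not the cheapest choice for large ones.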