Getting top 3 rows that have biggest sum of columns in `pandas.DataFrame`?

Question

Here is my pandas.DataFrame:

        day1   day2   day3
Apple     40     13     98
Orange    32     45     56
Banana    56     76     87
Pineapple 12     19     12
Grape     89     45     67

I want to create a new DataFrame that contains the top 3 fruits with the biggest sum over the three days.

The sums for the three days are: Apple 151, Orange 133, Banana 219, Pineapple 43, Grape 201.
So the top 3 fruits are: 1) Banana; 2) Grape; 3) Apple.

Here is an expected output:

        day1   day2   day3
Banana    56     76     87
Grape     89     45     67
Apple     40     13     98

How can I do that with pandas.DataFrame?

Thank you!

Answer 1

Here's how you get the index labels of the top 3 rows by sum:

In [1]: df.sum(axis=1).order(ascending=False).head(3)
Out[1]:
Banana    219
Grape     201
Apple     151

And you can use that index to select the rows from your original dataset:

In [2]: idx = df.sum(axis=1).order(ascending=False).head(3).index

In [3]: df.ix[idx]
Out[3]:
        day1  day2  day3
Banana    56    76    87
Grape     89    45    67
Apple     40    13    98

[EDIT]

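order() is now deprecated (and has been removed in later pandas releases). sort_values() can be used here instead:

df.sum(axis=1).sort_values(ascending=False).head(3)

.ix has likewise been removed in later releases; df.loc[idx] selects the same rows.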
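For reference, here is a minimal self-contained sketch of the whole approach using sort_values(), assuming a recent pandas version. The DataFrame is rebuilt from the question's data; the names row_totals, top3 and top3_alt are illustrative only, and the nlargest() variant is an alternative that was not part of the original answer.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame(
    {
        "day1": [40, 32, 56, 12, 89],
        "day2": [13, 45, 76, 19, 45],
        "day3": [98, 56, 87, 12, 67],
    },
    index=["Apple", "Orange", "Banana", "Pineapple", "Grape"],
)

# Sum across the columns of each row (axis=1).
row_totals = df.sum(axis=1)

# Sort the totals in descending order and keep the labels of the first 3.
idx = row_totals.sort_values(ascending=False).head(3).index

# Select those rows from the original frame, in ranked order.
top3 = df.loc[idx]

# Alternative (not in the original answer): Series.nlargest does the
# sort-and-take-first-n step in a single call.
top3_alt = df.loc[row_totals.nlargest(3).index]

print(top3)
```

Selecting with .loc on the sorted index keeps the rows in ranked order (Banana, Grape, Apple), which matches the expected output in the question.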

solution1
15 ACCEPTED 2013-12-09 20:50:33