How to sum columns and then sort them in pandas (Python)
Below is a sample DataFrame; the data comes from a .csv file.
EPISODE_Number EPISODE_TITLE object1 object2 object3 object4 object5
0 S01E01 A 1 1 0 0 0
1 S01E02 B 0 0 0 1 0
2 S01E03 C 1 1 0 0 1
3 S01E04 D 0 1 1 1 0
4 S01E05 E 1 0 0 1 0
5 S01E06 F 1 1 0 1 1
6 S01E07 G 0 0 0 1 1
7 S01E08 H 0 1 0 0 0
8 S01E09 I 1 1 0 1 1
9 S01E10 J 0 1 1 0 0
I want to get the total for each object column, then sort the objects from largest to smallest and keep only the top 10.
Here is my code so far:
import pandas as pd
data = pd.read_csv("TV_show.csv")
sume_s = data[data.sum(0).sort_values(ascending=False)[2:6].index]
The output should look like this:
object2: 7
object4: 6
object1: 5
object5: 4
object3: 2
But I get the following error:
indexer = non_nan_idx[non_nans.argsort(kind=kind)]
TypeError: '>' not supported between instances of 'numpy.ndarray' and 'str'
Select the object columns by position with DataFrame.iloc, call sum on them, and add Series.nlargest to keep the top 10:
sume_a = data.iloc[:, 2:7].sum().nlargest(10)
print(sume_a)
object2 7
object4 6
object1 5
object5 4
object3 2
dtype: int64
This gives the same result as the solution suggested in the comments:
sume_a = data.iloc[:, 2:7].sum().sort_values(ascending=False).head(10)
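If the column positions might change, a possible variation is to pick the numeric columns by dtype instead of by position. The sketch below is not from the original answer: it assumes the object columns are the only numeric columns in the file, and it uses an inline copy of the sample rows (the `csv_text` variable is just a stand-in for `TV_show.csv`, with commas assumed as the separator) so that it runs on its own.

```python
import io

import pandas as pd

# Inline copy of the sample rows from the question, standing in for TV_show.csv.
# The comma separator is an assumption; the question does not show the raw file.
csv_text = """EPISODE_Number,EPISODE_TITLE,object1,object2,object3,object4,object5
S01E01,A,1,1,0,0,0
S01E02,B,0,0,0,1,0
S01E03,C,1,1,0,0,1
S01E04,D,0,1,1,1,0
S01E05,E,1,0,0,1,0
S01E06,F,1,1,0,1,1
S01E07,G,0,0,0,1,1
S01E08,H,0,1,0,0,0
S01E09,I,1,1,0,1,1
S01E10,J,0,1,1,0,0
"""
data = pd.read_csv(io.StringIO(csv_text))

# Keep only numeric columns, so the two text columns never reach sum().
totals = data.select_dtypes(include="number").sum()

# nlargest returns the values in descending order; with fewer than
# 10 columns it simply returns all of them.
top = totals.nlargest(10)
print(top)
```

On the sample data this should produce the same ranking as the `iloc` version above, since the five object columns are the only numeric ones.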