Sort dataframe by value returns "For a multi-index, the label must be a tuple with elements corresponding to each level."
Objective: Starting from a dataframe with 5 columns, return a dataframe with 3 columns, one of which is a count, sorted from largest count to smallest.
What I have tried:
df = df[['Country', 'Year','NumInstances']].groupby(['Country', 'Year']).agg(['count'])
df = df.sort_values(by='NumInstances', ascending=False)
print(df)
Error: ValueError: The column label 'NumInstances' is not unique. For a multi-index, the label must be a tuple with elements corresponding to each level.
Before this gets marked as a duplicate: I have gone through all the other suggested duplicates, and they all seem to suggest the same code as I have above.
Is there something small that I am doing incorrectly?
Thanks!
I guess you need to remove the multi-index.
Try this:
df = df[['Country', 'Year','NumInstances']].groupby(['Country', 'Year']).agg(['count']).reset_index()
or:
df = df[['Country', 'Year','NumInstances']].groupby(['Country', 'Year'], as_index=False).agg(['count'])
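If the goal is a plain three-column frame, one possible alternative is to avoid the two-level column labels altogether. Passing a list such as ['count'] to agg is what creates the (column, aggregation) label pairs, and as far as I know those labels stay two-level even after reset_index. The sketch below selects the single column and calls count() directly, which leaves flat column names. The sample data is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Country": ["US", "US", "US", "FR", "FR", "DE"],
    "Year": [2020, 2020, 2021, 2020, 2020, 2021],
    "NumInstances": [1, 2, 3, 4, 5, 6],
})

# Selecting one column and calling count() gives a Series, so no
# second column level is created. reset_index() then turns the
# group keys back into ordinary columns.
counts = (
    df.groupby(["Country", "Year"])["NumInstances"]
      .count()
      .reset_index()
      .sort_values(by="NumInstances", ascending=False)
)

print(counts)
```

With flat column names, sort_values(by='NumInstances') works as originally written.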
Found the issue. Passing a list to agg turns the NumInstances column label into a tuple of the column name and the aggregation name, for example ('NumInstances', 'sum') when aggregating with 'sum' (with agg(['count']) as in the question, the label would be ('NumInstances', 'count')). Therefore I just updated the sort code to use the tuple:
df = df.sort_values(by=('NumInstances', 'sum'), ascending=False)
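To find the exact tuple to pass to sort_values, it can help to print the column labels after aggregating. The following is a minimal sketch using the agg(['count']) call from the question, so the label here is ('NumInstances', 'count') rather than the 'sum' variant; the sample data is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Country": ["US", "US", "US", "FR", "FR", "DE"],
    "Year": [2020, 2020, 2021, 2020, 2020, 2021],
    "NumInstances": [1, 2, 3, 4, 5, 6],
})

grouped = (
    df[["Country", "Year", "NumInstances"]]
    .groupby(["Country", "Year"])
    .agg(["count"])
)

# Each column label is now a (column, aggregation) tuple.
print(grouped.columns.tolist())  # [('NumInstances', 'count')]

# Sort by passing the full tuple as the label.
result = grouped.sort_values(by=("NumInstances", "count"), ascending=False)
print(result)
```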