
Python: Pandas: Groupby & Pivot Tables are missing rows

I have a dataframe composed of individuals (identified by their IDs), activities, and corresponding scores. I'm trying to get the sum of the scores when grouping by student and activity type. I can do this with either of the following:

data_detail.pivot_table(["total_scored", "total_scored_omitted"], index=["id", "activity"], aggfunc="sum")

data_detail.groupby(["id","activity"]).sum()

However, when I check the results by looking at a typical student:

data_detail[data_detail["id"] == 41824840].sort_values("activity")

I see that some activities listed for that student are missing from the groupby/pivot table. How can I ensure the final groupby/pivot table is complete and isn't missing any values?
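One way to make that spot check systematic is to compare every distinct (id, activity) pair in the raw rows against the index of the aggregated result. The snippet below is only a sketch: the sample rows are made up, and the names keys, grouped, expected, and missing are illustrative, not from the original post.

```python
import pandas as pd

# Hypothetical sample data standing in for data_detail.
data_detail = pd.DataFrame({
    "id": [41824840, 41824840, 41824840, 7],
    "activity": ["quiz", "homework", "quiz", "quiz"],
    "total_scored": [1.0, 2.0, 3.0, 4.0],
    "total_scored_omitted": [0.0, 1.0, 0.0, 2.0],
})

keys = ["id", "activity"]
grouped = data_detail.groupby(keys).sum()

# Every distinct (id, activity) pair that occurs in the raw rows.
expected = pd.MultiIndex.from_frame(data_detail[keys].drop_duplicates())

# Pairs present in the raw data but absent from the aggregated result.
missing = expected.difference(grouped.index)
print(missing.tolist())
```

If missing is not empty, the listed pairs are the ones to inspect in the raw data.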

The problem was that the scores did not have a consistent data type: they should all have been floats, but some of them were strings. After I converted all of the scores to floats, the missing activities showed up.

As an added benefit, having uniform data types made the calculation much faster!
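The conversion described above might look roughly like the following. This is a minimal sketch, not the exact code that was used: the sample rows are invented, score_cols and totals are illustrative names, and pd.to_numeric with errors="coerce" is one possible way to do the conversion (it turns unparseable values into NaN, which may or may not be what you want).

```python
import pandas as pd

# Hypothetical sample data: scores stored as a mix of floats and strings.
data_detail = pd.DataFrame({
    "id": [41824840, 41824840, 41824840, 7],
    "activity": ["quiz", "quiz", "homework", "quiz"],
    "total_scored": [1.0, "2.5", "3", 4.0],
    "total_scored_omitted": ["0", 1.0, 0.5, "2"],
})

score_cols = ["total_scored", "total_scored_omitted"]

# Columns holding mixed types typically show up with dtype "object".
print(data_detail[score_cols].dtypes)

# Convert every score to a number; values that cannot be parsed become NaN.
data_detail[score_cols] = data_detail[score_cols].apply(pd.to_numeric, errors="coerce")

# With uniform numeric columns, aggregate per student and activity.
totals = data_detail.groupby(["id", "activity"])[score_cols].sum()
print(totals)
```

Checking dtypes before aggregating is a cheap way to catch this kind of problem early.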


 