
Python: Pandas: Groupby & Pivot Tables are missing rows

I have a dataframe of individuals (identified by their IDs), activities, and the corresponding scores. I'm trying to get the sum of the scores when grouping by student and activity type. I can do this with either of the following:

data_detail.pivot_table(values=["total_scored", "total_scored_omitted"], index=["id", "activity"], aggfunc="sum")

data_detail.groupby(["id","activity"]).sum()
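A minimal, self-contained version of the two calls above, run on made-up data. Only the column names come from the question; the IDs, activities and score values are hypothetical. With clean float columns, both approaches return the same per-(id, activity) sums:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data_detail = pd.DataFrame({
    "id": [41824840, 41824840, 41824840, 50000001],
    "activity": ["quiz", "quiz", "homework", "quiz"],
    "total_scored": [3.0, 4.0, 10.0, 5.0],
    "total_scored_omitted": [1.0, 0.0, 2.0, 0.0],
})

score_cols = ["total_scored", "total_scored_omitted"]

# Sum each score column per (id, activity) pair with groupby...
by_groupby = data_detail.groupby(["id", "activity"])[score_cols].sum()

# ...and with pivot_table; select the columns in a fixed order so the two line up.
by_pivot = data_detail.pivot_table(
    values=score_cols, index=["id", "activity"], aggfunc="sum"
)[score_cols]

print(by_groupby)
print(by_pivot)
```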

However, when I check the results by looking at a typical student:

data_detail[data_detail["id"]== 41824840].sort_values("activity")

I see that there are some activities listed for that given student which are missing from the groupby/pivot table. How can I ensure the final groupby/pivot table is complete and isn't missing any values?
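One way to check whether the table is complete, rather than spot-checking a single student, is to compare the distinct (id, activity) pairs in the raw data with the index of the aggregated result. The helper `missing_groups` below is hypothetical (not a pandas function), the data is made up, and the missing row is simulated by dropping one group by hand:

```python
import pandas as pd


def missing_groups(df: pd.DataFrame, keys: list, result: pd.DataFrame) -> pd.MultiIndex:
    """Hypothetical helper: key combinations present in df but absent from result.index."""
    # Every distinct key combination that actually occurs in the raw data.
    expected = pd.MultiIndex.from_frame(df[keys].drop_duplicates())
    # Whatever is left after removing the aggregated index was dropped somewhere.
    return expected.difference(result.index)


# Hypothetical sample data with the question's column names.
data_detail = pd.DataFrame({
    "id": [41824840, 41824840, 41824840, 50000001],
    "activity": ["quiz", "quiz", "homework", "quiz"],
    "total_scored": [3.0, 4.0, 10.0, 5.0],
})
keys = ["id", "activity"]

full = data_detail.groupby(keys)[["total_scored"]].sum()

# Simulate the symptom from the question by dropping the last group by hand.
truncated = full.iloc[:-1]

print(missing_groups(data_detail, keys, full))       # expected to be empty
print(missing_groups(data_detail, keys, truncated))
```

An empty result means every pair in the raw data appears in the aggregated table; the same check works on the `pivot_table` output, since it is indexed by the same (id, activity) pairs.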

The problem was that the data type of the scores wasn't consistent: they were meant to be floats, but some of them were strings.

After I converted all of the scores to floats, the missing activities showed up.
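A sketch of that fix on hypothetical data in which some scores are stored as strings (the question does not say where the strings came from). It first counts the string entries in each score column, then converts every score column to float with `pd.to_numeric` before grouping:

```python
import pandas as pd

# Hypothetical data: some scores are floats, others are strings.
data_detail = pd.DataFrame({
    "id": [41824840, 41824840, 41824840],
    "activity": ["quiz", "quiz", "homework"],
    "total_scored": [3.0, "4", "10.5"],
    "total_scored_omitted": [1.0, 0.0, "2"],
})
score_cols = ["total_scored", "total_scored_omitted"]

# Mixed columns show up with dtype "object"; count how many entries are str.
print(data_detail.dtypes)
string_counts = {
    col: int(data_detail[col].map(lambda v: isinstance(v, str)).sum())
    for col in score_cols
}
print(string_counts)

# Convert each score column to float64. errors="coerce" turns text that cannot
# be parsed into NaN instead of raising, so count NaNs after converting.
for col in score_cols:
    data_detail[col] = pd.to_numeric(data_detail[col], errors="coerce").astype("float64")
nan_counts = data_detail[score_cols].isna().sum()

totals = data_detail.groupby(["id", "activity"])[score_cols].sum()
print(totals)
```

Leaving `errors` at its default (`"raise"`) would instead stop at the first value that cannot be parsed, which may be preferable if every score is expected to be numeric.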

As an added benefit, making the data types uniform made the calculation much faster!
