Working through this: https://towardsdatascience.com/exploratory-statistical-data-analysis-with-a-real-dataset-using-pandas-208007798b92
A little shy of halfway through, the author calculates the number of unique medal winners with this line of code:
medal_winners = len(df[df.Medal.fillna('None') != 'None'].Name.unique())
This seems rather unnecessarily complicated, so I am trying to simplify it.
Ultimately, I believe that line of code is saying: first check for non-null values in the 'Medal' column, then get the number of unique names who have won medals.
To me this means: keep the rows where 'Medal' is non-null, then group by name and count the unique names. The type of medal does not matter, so if John Doe won three different medals, I only count him once. All I want is the total number of unique medal winners.
I came up with this:
medal_winners = df['Medal'].notnull().groupby['Name'].nunique()
But I get this error: TypeError: 'method' object is not subscriptable
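For reference, here is a minimal sketch of where that TypeError seems to come from. The toy DataFrame and the names `mask`, `error_message`, and `winners` are made up for illustration and are not from the article. Square brackets after `groupby` try to index the method object itself rather than call it, and `df['Medal'].notnull()` returns a boolean Series that has no 'Name' column to group by in any case.

```python
import pandas as pd

# Hypothetical toy data standing in for the Olympics DataFrame.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Cy"],
    "Medal": ["Gold", None, None, "Silver"],
})

# notnull() returns a boolean Series; the 'Name' column is not part of it.
mask = df["Medal"].notnull()

try:
    # Square brackets subscript the bound method instead of calling it.
    mask.groupby["Name"]
except TypeError as exc:
    error_message = str(exc)

# Using the mask to filter the DataFrame keeps 'Name' available afterwards.
winners = df[mask]
print(error_message)
print(winners)
```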
I have tried other variations that I thought should work, but every time I get an error.
I just figured it out, but even with groupby() the solution is still longer than I expected; I did not get the simplification I was hoping for:
medal_winners = df[df['Medal'].notnull()].groupby('Name')['Name'].nunique().sum()
Both my groupby()-based solution and the author's yield the same answer: 28202
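One possibly shorter variant, sketched here on a made-up toy DataFrame rather than the real dataset, skips groupby() entirely: select the 'Name' values of the rows with a medal and call nunique() on them. The variable names below are hypothetical. One caveat is that nunique() ignores missing names by default, while len(... .unique()) would count NaN as a value, so the two could differ if 'Name' had missing entries.

```python
import pandas as pd

# Hypothetical toy data: Ann has two medals, Cy has one, Bob has none.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Cy", "Cy"],
    "Medal": ["Gold", "Bronze", None, "Silver", None],
})

# The article's approach: replace missing medals with a sentinel, then filter.
author_style = len(df[df.Medal.fillna("None") != "None"].Name.unique())

# The groupby() approach: each name's group contributes 1 to the sum.
groupby_style = df[df["Medal"].notnull()].groupby("Name")["Name"].nunique().sum()

# A shorter alternative: filter rows and pick the column in one .loc call.
direct = df.loc[df["Medal"].notna(), "Name"].nunique()

print(author_style, groupby_style, direct)
```

On this toy data all three expressions count Ann and Cy once each.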