
Calculate unique values in one column based upon non-null values in another

Working through this: https://towardsdatascience.com/exploratory-statistical-data-analysis-with-a-real-dataset-using-pandas-208007798b92

A little less than halfway through, the author calculates the number of unique medal winners with this line of code:

medal_winners = len(df[df.Medal.fillna('None') != 'None'].Name.unique())

This seems unnecessarily complicated, so I am trying to simplify it.

Ultimately, I believe that line of code is saying: first check for non-null values in the 'Medal' column, then get the number of unique names who have won medals.

To me this is: check 'Medal' for a non-null value, then groupby name and get the number of unique names who have won a medal. The type of medal does not matter, so if John Doe won three different medals, I only count him once. All I want is the total number of unique medal winners.

I came up with this:

medal_winners = df['Medal'].notnull().groupby['Name'].nunique()

But I get this error: TypeError: 'method' object is not subscriptable

I have tried other variations that I thought should work, but every one of them raises an error.
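For reference, the TypeError can be reproduced on a small frame. The sketch below uses made-up rows (not the article's dataset) and only assumes the 'Name' and 'Medal' column names from the question. It suggests two separate problems in the attempt: groupby is a method, so it needs parentheses rather than square brackets, and notnull() returns a boolean Series that no longer contains the 'Name' column to group on.

```python
import pandas as pd

# Made-up toy data for illustration; not the article's dataset.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Cy", "Dee"],
    "Medal": ["Gold", "Silver", None, "Bronze", None],
})

# Problem 1: groupby is a method, so subscripting it with [...] fails.
try:
    df["Medal"].notnull().groupby["Name"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Problem 2: notnull() yields a boolean Series (one True/False per row),
# so there is no 'Name' column left on it to group by.
mask = df["Medal"].notnull()

# Using the mask to filter the original frame keeps 'Name' available.
winners = df[mask]
```

Filtering the frame with the mask first, and only then selecting 'Name', avoids both problems.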

I just figured it out, but even with groupby() the solution is still longer than I expected, so it is not the simplification I was hoping for:

medal_winners = df[df['Medal'].notnull()].groupby('Name')['Name'].nunique().sum()

Both my groupby()-based solution and the author's yield the same answer: 28202
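If only the count of distinct names is needed, the groupby step may not be necessary at all. Here is a minimal sketch on made-up data (the column names match the question, the rows are invented) that compares the author's line, the groupby version, and two shorter forms built on nunique():

```python
import pandas as pd

# Made-up toy data for illustration; not the article's dataset.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Cy", "Dee"],
    "Medal": ["Gold", "Silver", None, "Bronze", None],
})

# The author's version: replace missing medals with 'None', filter, count unique names.
author_count = len(df[df.Medal.fillna("None") != "None"].Name.unique())

# The groupby version: each name group contributes exactly 1, so the sum is the group count.
groupby_count = df[df["Medal"].notnull()].groupby("Name")["Name"].nunique().sum()

# Shorter: select 'Name' for rows with a medal, then count distinct values.
short_count = df.loc[df["Medal"].notna(), "Name"].nunique()

# Equivalent idea using dropna on the 'Medal' column.
dropna_count = df.dropna(subset=["Medal"])["Name"].nunique()
```

On this toy frame all four give 2 (Ann and Cy). On data where 'Name' itself has missing values the results could differ, since nunique() skips NaN by default while len(...unique()) counts it once.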
